Case Study Office Supplies
  • AI Chat
  • Code
  • Report
  • Beta
    Spinner

    Case Study Project - Office Supplies

    Company Background

    Pens & Printers is a national office supplies chain. At the moment, they send office supplies out of warehouses in four regions: East, West, South, and Central, and all four warehouses stock the same products.
    The Head of Sales thinks this leads to large amounts of unsold products in some locations.

    Customer Question

    The management would like you to answer the following:
    ● Are there products that do not sell as well in some locations?
    ● Are there any other patterns over time in each region that you can find in the data

    Dataset

    Column nameDetails
    Order IDCharacter. Unique identifier for the individual order.
    Order DateCharacter. Date of the order, in format YYYY-MM-DD.
    Ship ModeCharacter. The method used to send out the order.
    RegionCharacter. The region the order was sent from.
    Product IDCharacter. Unique identifier of the product ordered.
    CategoryCharacter. Category of the product, one of ‘Office Supplies’,‘Furniture’, or ‘Technology’.
    Sub-CategoryCharacter. Subcategory of the product (e.g. Binders, Paper, etc.)
    Product NameCharacter. The name of the product.
    SalesNumeric. Total value of the products sold in the order.
    QuantityNumeric. Quantity of the products in the order.
    DiscountNumeric. Discount of the order in decimal form. (e.g. 0.30 indicates the order has a 30% discount, etc.)
    ProfitNumeric. Profit of the order.

    Getting the data

    # Load packages
    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    import numpy as np
    %matplotlib inline 
    
    # Set the style to use for remaining plots
    sns.set_style("dark")
    
    # Read in the data
    df_raw = pd.read_csv("office_supplies.csv", parse_dates=['Order Date'])

    Inspecting the data

    df_raw.shape
    df_raw.describe()
    df_raw['Order Date'].min()
    df_raw['Order Date'].max()
    df_raw.head(3)

    Exploring data types

    df_raw.info()

    Exploring categorical features

    def plot_cat_count(df, title):    
        fig, ax = plt.subplots(nrows=1, ncols=4, figsize=(15, 5))
        fig.suptitle(title)
    
        g1 = sns.countplot(data=df, y='Ship Mode', ax=ax[0], order=df['Ship Mode'].value_counts().index)
        g1.set(title='Ship Mode', xlabel=None, ylabel=None)
     
        g2 = sns.countplot(data=df, y='Region', ax=ax[1], order=df['Region'].value_counts().index)
        g2.set(title='Region', xlabel=None, ylabel=None)
    
        g3 = sns.countplot(data=df, y='Category', ax=ax[2], order=df['Category'].value_counts().index)
        g3.set(title='Category', xlabel=None, ylabel=None)
    
        g4 = sns.countplot(data=df, y='Sub-Category', ax=ax[3], order=df['Sub-Category'].value_counts().index)
        g4.set(title='Sub Category', xlabel=None, ylabel=None)
        plt.subplots_adjust(wspace=0.5)
    plot_cat_count(df_raw, 'Counts of observations in categorical features')
    df_columns = df_raw.columns
    df_raw.columns