Competition - motorcycle parts
  • AI Chat
  • Code
  • Report
  • Beta
    Spinner

    Reporting on sales data

    Now let's now move on to the competition and challenge.

    📖 Background

    You work in the accounting department of a company that sells motorcycle parts. The company operates three warehouses in a large metropolitan area.

    You’ve recently learned data manipulation and plotting, and suggest helping your colleague analyze past sales data. Your colleague wants to capture sales by payment method. She also needs to know the average unit price for each product line.

    💾 The data

    The sales data has the following fields:
    • "date" - The date, from June to August 2021.
    • "warehouse" - The company operates three warehouses: North, Central, and West.
    • "client_type" - There are two types of customers: Retail and Wholesale.
    • "product_line" - Type of products purchased.
    • "quantity" - How many items were purchased.
    • "unit_price" - Price per item sold.
    • "total" - Total sale = quantity * unit_price.
    • "payment" - How the client paid: Cash, Credit card, Transfer.

    💪 Challenge

    Create a report to answer your colleague's questions. Include:

    1. What are the total sales for each payment method?
    2. What is the average unit price for each product line?
    3. Create plots to visualize findings for questions 1 and 2.
    4. [Optional] Investigate further (e.g., average purchase value by client type, total purchase value by product line, etc.)
    5. Summarize your findings.

    ✅ Checklist before publishing

    • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
    • Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
    • Check that all the cells run without error.

    ⌛️ Time is ticking. Good luck!

    
    # Importing the pandas module
    import pandas as pd
    
    # Reading in the sales data
    df = pd.read_csv('data/sales_data.csv', parse_dates=['date'])
    
    # Take a look at the first datapoints
    df.head()
    # What are the total sales for each payment method?
    df.groupby('payment')[['total']].sum()
    # What is the average unit price for each product line?
    df.groupby('product_line')[['unit_price']].mean()
    # Create plot to visualize findings for questions 1 and 2.
    
    import matplotlib.pyplot as plt
    import numpy as np
    fig = plt.figure()
    
    ## Show the total sales for each payment method graph
    total_sales = df.groupby('payment')[['total']].sum()
    total_sales.plot(kind = 'barh')
    plt.xlabel('Total Sale')
    plt.ylabel('Payment')
    plt.title("Payment by Total Sales")
    plt.show()
    ## Show the average unit price for each product line
    avg_unit_price = df.groupby('product_line')[['unit_price']].mean()
    avg_unit_price.plot(kind = 'barh')
    plt.xlabel('Unit Price')
    plt.ylabel('Product Line')
    plt.title("Product Line by Average Unit Price")
    plt.show()
    # [optional] Investigate further (e.g., average purchase value by client type, total purchase value by product line, etc.)
    ## Show the average purchase value by client type
    df.groupby('client_type')[['total']].mean()
    avg_purchase_value = df.groupby('client_type')[['total']].mean()
    avg_purchase_value.plot(kind = 'barh')
    plt.xlabel('Purchase Value')
    plt.ylabel('Client Type')
    plt.title("Client Type by Average Purchase Value")
    plt.show
    ## Show the total purchase value by product line
    df.groupby('product_line')[['total']].sum()
    total_purchase_value = df.groupby('product_line')[['total']].sum()
    total_purchase_value.plot(kind = 'barh')
    plt.xlabel('Purchase Value')
    plt.ylabel('Product Line')
    plt.title("Product Line by Total Purchase Value")
    plt.show()
    ## Show the total quantity for each product line
    df.groupby('product_line')[['quantity']].sum()