Revving Up Success: A Sales Report for a Motorcycle Parts Company

    📕 Overview

    The motorcycle parts industry is an increasingly saturated market, owing to the rising popularity of motorcycles as a primary mode of transportation and of riding as a leisure activity. To remain competitive, businesses must use data analysis to gain insights and drive strategic decision-making. In this research report we therefore explore the sales data of a motorcycle parts company operating three warehouses in a large metropolitan area. The goal is to use data manipulation and visualization methods to provide valuable insights to a colleague who seeks to analyze sales patterns and customer preferences across the company's product lines and payment methods.

    The research objectives are the following:

    1. Investigate the sales performance of each product line.
    2. Explore the purchasing behavior and preferences of different client types.
    3. Evaluate sales performance across the three warehouses.
    4. Determine the most used payment method among customers.
    5. Analyze the sales data over time to identify patterns, trends, and seasonality.

    The sales data used for the analysis contains the following fields:

    • "date" - The date, from June to August 2021.
    • "warehouse" - The company operates three warehouses: North, Central, and West.
    • "client_type" - There are two types of customers: Retail and Wholesale.
    • "product_line" - Type of products purchased.
    • "quantity" - How many items were purchased.
    • "unit_price" - Price per item sold.
    • "total" - Total sale = quantity * unit_price.
    • "payment" - How the client paid: Cash, Credit card, Transfer.
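
As an illustration of the schema above, the following minimal sketch parses two rows and checks that "total" equals quantity * unit_price. The sample rows, the ISO date format, the product line spellings, and the helper names `parse_row` and `total_is_consistent` are all assumptions made for this example; they are not taken from the actual dataset or from the report's hidden code.

```python
import csv
import io
from datetime import date

# Made-up sample rows that only illustrate the schema; NOT real data from the report.
SAMPLE_CSV = """date,warehouse,client_type,product_line,quantity,unit_price,total,payment
2021-06-01,Central,Retail,Braking system,2,25.50,51.00,Credit card
2021-06-02,North,Wholesale,Suspension & traction,10,30.00,300.00,Transfer
"""


def parse_row(row):
    """Convert the string fields of one CSV row into typed values (hypothetical helper)."""
    return {
        "date": date.fromisoformat(row["date"]),  # assumes ISO-formatted dates
        "warehouse": row["warehouse"],
        "client_type": row["client_type"],
        "product_line": row["product_line"],
        "quantity": int(row["quantity"]),
        "unit_price": float(row["unit_price"]),
        "total": float(row["total"]),
        "payment": row["payment"],
    }


def total_is_consistent(sale, tolerance=0.01):
    """Return True if total matches quantity * unit_price within a rounding tolerance."""
    return abs(sale["quantity"] * sale["unit_price"] - sale["total"]) <= tolerance


sales = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
inconsistent = [sale for sale in sales if not total_is_consistent(sale)]
print(f"{len(sales)} rows parsed, {len(inconsistent)} with an inconsistent total")
```

A tolerance is used rather than strict equality because prices stored as floating-point numbers can differ from the recorded total by a rounding error.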

    The report is structured into three distinct sections: an exploratory data analysis section, a main analysis section, and a final section for conclusions and recommendations.

    📗 Exploratory Data Analysis

    The purpose of this section is to gain familiarity with the motorcycle parts sales data and acquire a preliminary understanding of its characteristics. Below we can see the first five rows of the dataset, with the headers adapted for greater readability:

    The first step is to check the dataset for any dirty/missing data. From the table below, we can see that the dataset doesn't contain any null values and all data types are assigned correctly. There are five categorical variables — date, warehouse, client type, product line, and payment type — and only three numerical variables: quantity, unit price, and total revenue (quantity * unit price).
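
A missing-value check of this kind can be sketched as follows. This is only an illustration using the standard library, not the report's actual code; the helper name `count_missing` and the sample rows are made up, and one cell is left blank on purpose to show what a non-zero result looks like.

```python
import csv
import io

# Made-up rows; the blank quantity in the second row is deliberate.
SAMPLE_CSV = """date,warehouse,quantity,payment
2021-06-01,Central,2,Credit card
2021-06-02,North,,Cash
"""


def count_missing(rows):
    """Count empty or absent cells per column (hypothetical helper)."""
    columns = list(rows[0].keys()) if rows else []
    return {
        column: sum(1 for row in rows if (row[column] or "").strip() == "")
        for column in columns
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing = count_missing(rows)
print(missing)
```

On a clean dataset such as the one described in this report, every count would be zero.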

    Analyzing categorical variables

    Using descriptive statistics, we can understand the composition and distribution of different groups or categories within the sales dataset.
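
For a categorical column, such descriptive statistics usually boil down to the number of observations, the number of distinct categories, the most frequent category, and how often it occurs. The sketch below shows one way to compute them with the standard library. The helper name `summarize_category` and the ten sample payments are made up for illustration and do not reproduce the dataset.

```python
from collections import Counter


def summarize_category(values):
    """Summarize a categorical column: size, distinct values, mode, and its share."""
    counts = Counter(values)
    top, freq = counts.most_common(1)[0]
    return {
        "count": len(values),
        "unique": len(counts),
        "top": top,
        "freq": freq,
        "share_pct": 100 * freq / len(values),
    }


# Made-up sample, NOT the real payment column.
payments = ["Credit card"] * 6 + ["Cash"] * 3 + ["Transfer"] * 1
summary = summarize_category(payments)
print(summary)
```

The "share_pct" value corresponds to the kind of percentage quoted in this section for each column's most common category.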

    From the date column we can see that the dataset is incomplete: it contains motorcycle part sales only for the summer months, from June 1st to August 28th. Any insights drawn may thus be subject to seasonality, and generalizing to the rest of the year would require data collected in the preceding and following months. Although this is out of scope, it is important to keep this limitation in mind when reading the report.

    Moving on, a total of 1,000 sales were completed over these three months. The Central warehouse accounts for the largest share of them (48%), and Retail customers purchase most frequently (78% of transactions). Most transactions are paid by credit card (66%), and the most popular product line is the braking system (23%). Given that payment type, warehouse location, and product line each have more than two unique categories, it is best to visualize them using countplots to see whether the differences between categories are pronounced:

    When it comes to payment type, the countplot confirms a large discrepancy between categories: an overwhelming majority of clients prefer to pay by card. The same cannot be said for warehouse location and product line. The Central warehouse has the most sales, but its lead over the North warehouse is 14 percentage points (p.p.), and among product lines the number of braking system sales (230) is only slightly greater than that of suspension and traction systems (228).
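
The gaps quoted above can be reproduced with a small calculation. In the sketch below, the product line counts (230 and 228) and the total of 1,000 sales come from the report. The North warehouse count of 340 is not stated directly; it is inferred here from the Central share of 48% and the 14 p.p. gap, so treat it as an assumption. The helper name `share_gap_pp` is hypothetical.

```python
def share_gap_pp(count_a, count_b, total):
    """Difference between two categories' shares of the total, in percentage points."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * (count_a - count_b) / total


TOTAL_SALES = 1000  # number of transactions reported for June to August

# Central = 48% of 1,000 sales; North = 340 is inferred from the 14 p.p. gap (assumption).
warehouse_gap = share_gap_pp(480, 340, TOTAL_SALES)

# Braking systems (230) vs suspension and traction (228), as reported.
product_gap = share_gap_pp(230, 228, TOTAL_SALES)

print(f"Central vs North: {warehouse_gap:.1f} p.p.")
print(f"Braking vs suspension and traction: {product_gap:.1f} p.p.")
```

Expressing both gaps in percentage points makes it clear that the two leading product lines are practically tied.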

    Analyzing numerical variables

    Having analyzed the categorical variables, we can now move on to using summary statistics to analyze quantity, unit price, and total revenue.

    The largest transaction registered in the dataset is for 40 motorcycle parts, while the smallest is for one. The average quantity sold is nine parts, with significant variability (std = 10). Meanwhile, the cheapest motorcycle part costs 10 dollars and the most expensive 67 dollars, with a moderate level of variability (std = 12). Finally, when it comes to total revenue, the average transaction amounts to 289 dollars, with the smallest being 10 dollars and the largest 2,546 dollars. This variable also shows high variability, with a std of 345 dollars, most likely influenced by the variability of quantity and unit price.

    To determine which of the two variables has the greater influence, we can examine, as an initial assessment, their respective coefficients of variation (CV), defined as the standard deviation divided by the mean. Quantity has a greater variability relative to its mean (111.11%) than unit price does (40%), suggesting that quantity is the more likely driver of the high variability in total revenue. Nevertheless, it is important to identify whether the high variability in these variables is due to the presence of extreme values, via a boxplot analysis:
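
Before turning to the boxplots, the coefficient of variation figures quoted above can be reproduced with a minimal sketch. The quantity mean (9) and the two standard deviations (10 and 12) come from the summary statistics in this report. The mean unit price of 30 dollars is not stated in the text; it is inferred from the reported CV of 40% and should be read as an assumption. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std, mean):
    """Return the coefficient of variation (std / mean) as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return 100 * std / mean


# Rounded summary statistics from the report.
quantity_cv = coefficient_of_variation(std=10, mean=9)

# Mean unit price of 30 is inferred from the reported 40% CV (assumption).
unit_price_cv = coefficient_of_variation(std=12, mean=30)

print(f"Quantity CV: {quantity_cv:.2f}%")
print(f"Unit price CV: {unit_price_cv:.2f}%")
```

Because the CV is unitless, it allows the spread of a count (quantity) to be compared with the spread of a price on the same scale.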