Improving Retail Business

date: The date in a "yyyy-mm-dd" format
wm_yr_wk: The id of the week the date belongs to
weekday: The type of the day (Saturday, Sunday, ... , Friday)
wday: The id of the weekday, starting from Saturday
month: The month of the date
year: The year of the date
d: The number of date from 2011-01-29 to 2016-06-19
event_name_1: If the date includes an event, the name of this event
event_type_1: If the date includes an event, the type of this event
event_name_2: If the date includes an event, the name of this event
event_type_2: If the date includes an event, the type of this event
snap_CA: A binary variable (0 or 1) indicating the stores of CA allow snap purchases on the examined date
snap_TX: A binary variable (0 or 1) indicating the stores of TX allow snap purchases on the examined date
snap_WI: A binary variable (0 or 1) indicating the stores of WI allow snap purchases on the examined date SNAP stands for Supplemental Nutrition Assistance Program

sell_prices: contains information about the price of the products sold per store and date

store_id: The id of the store where the product is sold
item_id: The id of the product
wm_yr_wk: The id of the week the date belong to
sell_price: The price of the products

sales: contains the historical daily unit sales data per product and store

id: Indicate the item_id and store_id are evaluated
item_id: The id of the product
dept_id: The id of the department the product belong to
cat_id: The id of the category the product belongs to
store_id: The id of the store where the product is sold
state_id: The state where the store is located
d_1,d_2,...,d_1941: The number of units sold at day ..., starting from 2011-01-29

Step 1: Check the data

I have checked the dataset by using Excel and Power BI as the lists below.

Typos/data entry errors
Outliers
Data types for all columns So in the Python notebook, I check the number of rows and columns and some missing values.


import numpy as np
import pandas as pd
#Import the sales.csv, sell_prices.csv, and calendar.csv file
sales_table = pd.read_csv("sales.csv")
price_table = pd.read_csv("sell_prices.csv")
calendar_table = pd.read_csv("calendar.csv")
print(price_table)
print(calendar_table)
print(sales_table)

print(sales_table.head(4))
print(sales_table.info()) 
#Many Nulls in event_name_1, event_name_2, event_type_1, and event_type_2 columns
print(calendar_table.info())
print(price_table.info())

Step 2: Explore Data

I explore data by questioning the dataset to find insight.

2.1 What is the store's maximum number of sales by sum every day (from 2011-01-29 to 2016-05-22)?

The CA_3 store has the maximum number of sales by sum every day at 11,363,540 units, and this also affects the sum of sales in CA state, which is higher than in other states.

import matplotlib.pyplot as plt
import pandas as pd

sales = pd.read_csv("sales.csv")

# Group by "store_id" and sum the "d_1" to "d_1941" columns
grouped_sales = sales.groupby("store_id").sum()

# Indicate x and y for plotting scatter chart
x_point = grouped_sales.index
y_point = grouped_sales.loc[:, "d_1":"d_1941"].sum(axis=1)

# Plot bar chart
plt.bar(range(len(x_point)), y_point)
plt.xlabel("Store ID")
plt.ylabel("The sum of sales every day")

# Adjust x-axis labels
plt.xticks(range(len(x_point)), x_point, fontsize=6)  # Adjust fontsize to make x-axis labels smaller

plt.show()

print(grouped_sales)
print(y_point)

2.2 What is the department's maximum number of sales by sum every day (from 2011-01-29 to 2016-05-22)?

The FOOD_3 department has the maximum number of sales by sum every day at 32,937,002 units, and this also affects the sum of sales in items in the department, which is higher than in other items.

import matplotlib.pyplot as plt
import pandas as pd

sales = pd.read_csv("sales.csv")

# Group by "dept_id" and sum the "d_1" to "d_1941" columns
grouped_sales = sales.groupby("dept_id").sum()

# Indicate x and y for plotting scatter chart
x_point = grouped_sales.index
y_point = grouped_sales.loc[:, "d_1":"d_1941"].sum(axis=1)

# Plot bar chart
plt.bar(range(len(x_point)), y_point)
plt.xlabel("Department ID")
plt.ylabel("The sum of sale every day")

# Adjust x-axis labels
plt.xticks(range(len(x_point)), x_point, fontsize=6)  # Adjust fontsize to make x-axis labels smaller

plt.show()

print(grouped_sales)
print(y_point)

2.3 If the FOODS_3 department has the maximum sale unit value, when is the maximum frequency people in three states like buying food?

After I randomized the results, almost every day people like buying the food, which probably matches when the government assists the people with SNAP.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

sales = pd.read_csv("sales.csv")

def find_max_sum(grouped_sales):
    # Group by "dept_id" and sum the "d_1" to "d_1941" columns
    grouped_sales = sales.groupby("dept_id").sum()

    # Calculate the sum of each column
    sum_per_column = grouped_sales.loc[grouped_sales.index == "FOODS_3", "d_1":"d_1941"].sum()

    # Find the maximum value and its corresponding column name
    max_sum = sum_per_column.nlargest(30)
    max_column = sum_per_column.idxmax()

    return max_column, max_sum

max_column, max_sum = find_max_sum(sales)
max_column, max_sum

import pandas as pd

calendar = pd.read_csv("calendar.csv")

# Find the rows where d is equal to d_954, d_1934, d_961, and d_1136
matching_rows = calendar[calendar["d"].isin(["d_954", "d_1934", "d_961", "d_1136", "d_948", "d_960", "d_953","d_940","d_982","d_1892","d_1884","d_1864","d_1164","d_981","d_1617","d_947","d_1898","d_1010","d_1252","d_1261","d_492","d_499","d_1835","d_946","d_435","d_1079","d_1177","d_1941","d_1080","d_926"])]

# Select the d and SNAP_CA, SNAP_TX, SNAP_WI columns from the matching rows
matching_values = matching_rows[["d", "snap_CA", "snap_TX", "snap_WI"]]

matching_values

From the dataset, The dates that the government assists people in each state are

CA State 1st–10th of every month
TX State 1st–15th of every month
WI State 2nd–15th of every month

‌
‌
‌

Improving Retail Business

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Retail Store Sales

Metadata

calendar: contains information about the dates on which the products are sold