Adeel Qureshi/

E-Commerce Data


E-Commerce Data

This dataset consists of orders made in different countries from December 2010 to December 2011. The company is a UK-based online retailer that mainly sells unique all-occasion gifts. Many of its customers are wholesalers.

Not sure where to begin? Scroll to the bottom to find challenges!

import pandas as pd


Data Dictionary

InvoiceNoA 6-digit integral number uniquely assigned to each transaction. If this code starts with letter 'c' it indicates a cancellation.
StockCodeA 5-digit integral number uniquely assigned to each distinct product.
DescriptionProduct (item) name
QuantityThe quantities of each product (item) per transaction
InvoiceDateThe day and time when each transaction was generated
UnitPriceProduct price per unit in sterling (pound)
CustomerIDA 5-digit integral number uniquely assigned to each customer
CountryThe name of the country where each customer resides

Source of dataset.

Citation: Daqing Chen, Sai Liang Sain, and Kun Guo, Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197-208, 2012 (Published online before print: 27 August 2012. doi: 10.1057/dbm.2012.17).

Don't know where to start?

Challenges are brief tasks designed to help you practice specific skills:

  • 🗺️ Explore: Negative order quantities indicate returns. Which products have been returned the most?
  • 📊 Visualize: Create a plot visualizing the profits earned from UK customers weekly, monthly, and quarterly.
  • 🔎 Analyze: Are order sizes from countries outside the United Kingdom significantly larger than orders from inside the United Kingdom?

Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

You are working for an online retailer. Currently, the retailer sells over 4000 unique products. To take inventory of the items, your manager has asked you whether you can group the products into a small number of categories. The categories should be similar in terms of price and quantity sold and any other characteristics you can extract from the data.

You will need to prepare a report that is accessible to a broad audience. It should outline your motivation, steps, findings, and conclusions.

✍️ If you have an idea for an interesting Scenario or Challenge, or have feedback on our existing ones, let us know! You can submit feedback by pressing the question mark in the top right corner of the screen and selecting "Give Feedback". Include the phrase "Content Feedback" to help us flag it in our system.

  • AI Chat
  • Code