Workspace
Laura Hendrickx/

Understanding Lego Sets Popularity [Competition]

0
Beta
Spinner

Understanding Lego Sets Popularity

📖 Background

Lego barely needs any introduction. The variously colored interlocking plastic bricks accompanying an array of gears, figurines called minifigures, and various other parts have remained a big hit since they were first released in 1949.

As of July 2015, 600 billion Lego parts had been produced. This notebook focuses on exploring some data to provide some statistics regarding Lego.

Research questions

Using PostgreSQL and Jupyter Notebook, with Python for visualisation, following questions will be answered:

  1. What is the average number of Lego sets released per year over the total time span of our data?
  2. What is the average number of Lego parts over total time span and per year?
  3. What are the 5 most popular colors used in Lego parts?
  4. What proportion of Lego parts are transparent?
  5. What are the 5 rarest categories of Lego bricks?
  6. A summary of the findings.

💾 The data

Database Description

inventory_parts

  • "inventory_id" - id of the inventory the part is in (as in the inventories table)
  • "part_num" - unique id for the part (as in the parts table)
  • "color_id" - id of the color
  • "quantity" - the number of copies of the part included in the set
  • "is_spare" - whether or not it is a spare part

parts

  • "part_num" - unique id for the part (as in the inventory_parts table)
  • "name" - name of the part
  • "part_cat_id" - part category id (as in part_catagories table)

part_categories

  • "id" - part category id (as in parts table)
  • "name" - name of the category the part belongs to

colors

  • "id" - id of the color (as in inventory_parts table)
  • "name" - color name
  • "rgb" - rgb code of the color
  • "is_trans" - whether or not the part is transparent/translucent

inventories

  • "id" - id of the inventory the part is in (as in the inventory_sets and inventory_parts tables)
  • "version" - version number
  • "set_num" - set number (as in sets table)

inventory_sets

  • "inventory_id" - id of the inventory the part is in (as in the inventories table)
  • "set_num" - set number (as in sets table)
  • "quantity" - the quantity of sets included

sets

  • "set_num" - unique set id (as in inventory_sets and inventories tables)
  • "name" - the name of the set
  • "year" - the year the set was published
  • "theme_id" - the id of the theme the set belongs to (as in themes table)
  • num-parts - the number of parts in the set

themes

  • "id" - the id of the theme (as in the sets table)
  • "name" - the name of the theme
  • "parent_id" - the id of the larger theme, if there is one

Acknowledgments

The data was provided by Rebrickable.com and be accessed through LEGO Catalog Database Downloads

Database Schema

The relationships between the tables in the Lego database are shown in the schema below.

📈 Analysis

1. What is the average number of Lego sets released per year over the total time span of our data?

To calculate the average number of Lego sets released per year, we'll follow these steps :

  • determine the total amount of sets (aliased here as total_sets),
  • determine the total amount of years (aliased here as total_years),
  • calcute the average by dividing the total amount of sets by the total amount of years.
Unknown integration
DataFrameavailable as
df
variable
SELECT 
    S.total_sets/Y.total_years AS avg_sets_per_year,
    S.total_sets,
    Y.total_years
FROM
    -- calculate total sets
    (SELECT 
         COUNT(set_num) AS total_sets
         FROM sets) AS S, 
    -- calculate tot years
    (SELECT 
         COUNT(DISTINCT year) AS total_years
         FROM sets) AS Y;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

2. What is the average number of Lego parts over total time span and per year?

2.1 What is the average number of Lego parts per year over the total time span of our data?

We can calculate the average number of Lego parts as follows :

  • determine the total amount of parts (aliased here as total_parts),
  • determine the total amount of years (aliased here as total_years),
  • calculate the average by dividing the total amount of parts by the total amount of years.
Unknown integration
DataFrameavailable as
df
variable
SELECT 
    S.total_parts/Y.total_years AS average_number_of_parts,
    S.total_parts,
    Y.total_years
FROM
    -- calculate total parts
    (SELECT 
         SUM(num_parts) AS total_parts
         FROM sets) AS S,
    -- calculate total years
    (SELECT 
         COUNT(DISTINCT year) AS total_years
         FROM sets) AS Y;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
2.2 What is the average number of Lego parts per year?

We can calculate this average number of Lego parts as follows :

  • determine the average amount of parts
  • round to a full number
  • group per year and order per year to see the evolution throughout the years.
Unknown integration
DataFrameavailable as
avg_no_per_year
variable
SELECT 
    year, 
   AVG(num_parts)as avg_num_parts 
FROM sets 
GROUP BY year
ORDER BY year;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
#import plotly
import plotly.express as px

#customized and interactive bar plot
fig = px.bar(avg_no_per_year,
                 x="year",
                 y="avg_num_parts",
                 title="<b>Average number of Lego parts per year</b>",
            labels={"year": "Year of Production", "avg_num_parts": "Average Number of Parts"})
           	  

# Show the plot
fig.show()

3. What are the 5 most popular colors used in Lego parts?

To calculate the 5 most popular colors in Lego, we must join the 'inventory_parts' table with the 'colors' table.

Unknown integration
DataFrameavailable as
popular_lego_colors
variable
SELECT C.name AS color, COUNT(*)
-- join inventory parts with color
FROM inventory_parts AS IP
JOIN colors AS C
ON IP.color_id = C.id
-- displays total counts by color name
GROUP BY C.name
ORDER BY count DESC
-- select only top 5 colors
LIMIT 5;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
#customized and interactive bar plot
fig = px.bar(popular_lego_colors,
                 x="color",
                 y="count",
                 title="<b>Most popular Lego Colors</b>",
            labels={"color": "Color", "count": "Amount of Parts"})
           	  

# Show the plot
fig.show()