Beta
Understanding Lego sets popularity
📖 Background
We are going use the provided dataset to understand the popularity of different Lego sets and themes. The idea is to become familiarized with the data to be ready for an interview with a business stakeholder.
We will be utilizing PostgreSQL queries to perform data analysis and gain valuable insights from our data set. This will enable us to make informed decisions and improve our operations.
💾 The data
We got a database with the following tables. You can also see above a visualization of how the tables are related to each other. (source):
inventory_parts
- "inventory_id" - id of the inventory the part is in (as in the inventories table)
- "part_num" - unique id for the part (as in the parts table)
- "color_id" - id of the color
- "quantity" - the number of copies of the part included in the set
- "is_spare" - whether or not it is a spare part
parts
- "part_num" - unique id for the part (as in the inventory_parts table)
- "name" - name of the part
- "part_cat_id" - part category id (as in part_catagories table)
part_categories
- "id" - part category id (as in parts table)
- "name" - name of the category the part belongs to
colors
- "id" - id of the color (as in inventory_parts table)
- "name" - color name
- "rgb" - rgb code of the color
- "is_trans" - whether or not the part is transparent/translucent
inventories
- "id" - id of the inventory the part is in (as in the inventory_sets and inventory_parts tables)
- "version" - version number
- "set_num" - set number (as in sets table)
inventory_sets
- "inventory_id" - id of the inventory the part is in (as in the inventories table)
- "set_num" - set number (as in sets table)
- "quantity" - the quantity of sets included
sets
- "set_num" - unique set id (as in inventory_sets and inventories tables)
- "name" - the name of the set
- "year" - the year the set was published
- "theme_id" - the id of the theme the set belongs to (as in themes table)
- "num_parts" - the number of parts in the set
themes
- "id" - the id of the theme (as in the sets table)
- "name" - the name of the theme
- "parent_id" - the id of the larger theme, if there is one
Acknowledgments: Rebrickable.com
1. What is the average number of Lego sets released per year?
Unknown integration
DataFrameavailable as
df
variable
WITH set_count_per_year AS (
SELECT year, COUNT(set_num) AS set_count
FROM sets
GROUP BY year
)
SELECT AVG(set_count) AS avg_sets_per_year
FROM set_count_per_year;
/* This query first uses a Common Table Expression (CTE)
to group the sets by year and count the number of sets per year.
Then it uses the AVG function to calculate the average number of
sets across all the years. */
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
2. What is the average number of Lego parts per year?
Unknown integration
DataFrameavailable as
avg_num_parts
variable
SELECT
year,
AVG(num_parts) as avg_num_parts
FROM sets
GROUP BY year
ORDER BY AVG(num_parts) DESC, year;
/* The query will return the average number of Lego parts
for each year and it will be ordered by the average number of parts
in descending order and then by year. */
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
3. Create a visualization for item 2.
Current Type: Bar
Current X-axis: year
Current Y-axis: avg_num_parts
Current Color: None
The average number of Lego parts per year
4. What are the 5 most popular colors used in Lego parts?
Unknown integration
DataFrameavailable as
popular_colors
variable
WITH color_count AS
(SELECT color_id,
COUNT(color_id) AS color_count
FROM inventory_parts
GROUP BY color_id)
SELECT c.name,
cc.color_count
FROM color_count cc
JOIN colors c ON cc.color_id = c.id
ORDER BY cc.color_count DESC
LIMIT 5;
/* This query first uses a Common Table Expression (CTE) to group the parts by color_id
and count the number of parts per color. Then it joins the color_count CTE with the colors table
to retrieve the name of the color. Finally, it sorts the results by color_count in descending order
and limits the results to the top 5. This query will give us the 5 most popular colors
and the count of parts used in that color. */
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
5. What proportion of Lego parts are transparent?
Unknown integration
DataFrameavailable as
df
variable
WITH transparent_parts AS (
SELECT COUNT(DISTINCT ip.part_num)::float as transparent_parts
FROM inventory_parts ip
LEFT JOIN colors c ON ip.color_id = c.id
WHERE c.is_trans = true
),
all_parts AS (
SELECT COUNT(DISTINCT part_num)::float as all_parts
FROM inventory_parts
)
SELECT (transparent_parts/all_parts * 100)::numeric(5,1) || '%' as percentage_transparent
FROM transparent_parts, all_parts;
/* This query first uses two Common Table Expressions (CTEs) to count the number of distinct parts
that are transparent and the number of all distinct parts. By using the '::float' operator, it ensures
that the columns data type are converted to float. Then it selects the proportion of transparent parts
by dividing the number of transparent parts by the total number of parts, then it multiplies it by 100,
rounds it to 1 decimal place and concatenates the percent sign. */
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.