Workspace
Oscar Herrada/

2017 Lego's Top 10s (SQL + Tableau)

0
Beta
Spinner

2017 Lego's TOP 10s



The idea behind the project was to practice SQL + Tableau with the Lego Database and create Top 10 lists with Lego sets from 2017, accounting for only non-spare parts and non-Minifigure sets.



There is a discrepancy with the Visual results compared to the Queried results. SQL queries that are run here cannot return full results over a certain size. To create the Tableau Visuals, the data was redownloaded from the link below and run with the same queries. The updated set contained more Lego Sets than there are within the DataCamp tables introducing a difference in results.

A copy of the origial Schema used is provided early and the later Schema used for visuals is displayed at the bottom of this project.



The data used can be found here: https://rebrickable.com/downloads/



    The first Dahsboard consist of:
  • Top 10 Parts, Part-Categories, and Colors used in all sets.
  • Top 10 Sets based on total parts.
  • Top 10 Sets based on unique lego parts. (Size = Lego Count, Color = Uniqueness)



    The Second Dashboard consist of:
  • Top 10 Ranks for Part-Categories.
  • Top 10 Ranks for Colors.

(Color = Lego Count)

What was interesting among these two visualizations was how there seems to be an on-purpose distribution ensuring Lego Set's get a specified amount of categories and colors in its design.

Set Names are not shown to focus on the distribution of both variables.



The Final Dahsboard shows a Square Plot based on the Top 10 Lego Sets of 2017 ranked by 'Creativity Score'.

(Size = Creativity Score, Color = Lego Count)

Creativity Score is a value (score) based on the uniqueness, color, and categorical variation of a Set. It should have been the colorbar unfortunately Set "Our Town"'s' title would not appear in that format.

The neat thing about this table is that there is a set for every age group from 3 years of age (Our Town) and up.



Here is a breakdown to the schema designed to hold the Lego data

The idea behind the project was to gather the pieces by theme using "themes" as my main table to collect all available pieces with a theme, part number, category, and color.



Unknown integration
DataFrameavailable as
df
variable
select *
from themes
where id = 502
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df
variable
select *
from themes
where name = 'The Hobbit'
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df
variable
select *
from sets
where name = 'The Hobbit'
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
The 3 queries above show that the tables "themes" and "sets" do not have matching id to theme_id pairs as visible with the theme 'The Hobbit'. This removes the task of getting the theme name which will be replaced with the set name.



    Breakdown of SQL sections(---)

  • Table that holds all Sets, their Parts along with the Part-Categories and Colors named "sets_and_parts"
  • Table of basic stats on "sets_and_parts"
  • TOP 10s for Part Names, Part-Categories, and Colors
  • TOP 10s by Sets: Lego Count , Unique Legos, Category Count, Color Count
  • Top 10 Creative Lego Sets based on Unique Piece, Color, and Categorical Variation
  • Unknown integration
    DataFrameavailable as
    df
    variable
    /*
    -- Create a table that holds all sets, their parts along with the parts' categories and colors
    
    
    -- CTE to hold Sets and their Parts
    with parts_with_sets as (select s.name as set_name,
                             	s.year as year,
                             	s.num_parts as num_parts,
                             	i.version as version,
                             	ip.part_num as part_num,
                             	ip.color_id as color_id,
                             	ip.quantity as piece_count,
                             	ip.is_spare as is_spare
                             from sets as s
                             left join inventories as i
                             on s.set_num = i.set_num
                             left join inventory_parts as ip
                             on i.id = ip.inventory_id),
    
    
    -- CTE to hold the Names and Categories to Set Parts
    pws_part_names_and_categories as (select set_name,
                                     	year,
                                     	num_parts,
                                     	version,
                                     	color_id,
                                      	piece_count,
                                     	is_spare,
                                     	p.name as part_name,
                                     	pc.name as part_category
                                     from parts_with_sets as pws
                                     left join parts as p
                                     on pws.part_num = p.part_num
                                     left join part_categories as pc
                                     on p.part_cat_id = pc.id),
    
    
    -- CTE to hold aggregated tables and create base to query from titled "lego_sets"
    sets_and_parts as (select set_name,
                      	year,
                      	num_parts,
                      	version,
                      	c.name as color_name,
                       	piece_count,
                      	part_name,
                      	part_category
                      from pws_part_names_and_categories as pws_pnac
                      left join colors as c
                      on pws_pnac.color_id = c.id                                                    
                      where year = 2017 and is_spare = False and part_category not like '%Minifig%') --, remove comma for later CTEs
    */
                      
    ----------------------------------------------------------------------------------------------------------------------------------
    
    -- BASE STATS
    
    
    /* 
    -- Number of Sets - 287
    select count(distinct(set_name)) 
    from sets_and_parts;
    */
    
    
    /*
    -- Total Lego Count - 73,213
    select sum(piece_count)
    from sets_and_parts;
    */
    
    
    /*
    -- Total Number of Unique Legos - 22,923 (unique by the set, counts again if it appears in another set)
    select count(*)
    from sets_and_parts;
    */
    
    
    /*
    -- Total Count for Unique Part Categories -
    select count(*)
    from (select distinct part_category
    from sets_and_parts) as unique_cats;
    */
    
    
    /*
    -- Number of Colors - 66
    select count(*)
    from (select distinct color_name
    from sets_and_parts) as color_count;
    */
    
    ----------------------------------------------------------------------------------------------------------------------------------
    
    -- TOP 10s for part names, categories, and colors
    
    
    /*
    -- Top 10 Most Distributed Lego Pieces
    select part_name,
    	count(part_name) as count
    from sets_and_parts
    group by part_name
    order by count desc
    limit 10;
    */
    
    
    /*
    -- Top 10 Most Distributed Color
    select color_name,
    	count(color_name) as count
    from sets_and_parts
    group by color_name
    order by count desc
    limit 10;
    */
    
    
    /*
    -- Top 10 Most Distributed Category 
    select part_category,
    	count(part_category) as count
    from sets_and_parts
    group by part_category
    order by count desc
    limit 10;
    */
    
    
    /*
    -- Most Colorful Lego Pieces
    select part_name,
    	count(color_count) as colors
     from (select part_name,
    	count(color_name) as color_count
    from sets_and_parts
    group by part_name, color_name) as pc
    group by part_name
    order by colors desc
    limit 10;
    */
    
    
    /*
    -- Most Colorful Lego Pieces
    colorful_legos as (select part_name,
                            count(color_count) as colors
                        from (select part_name,
                            count(color_name) as color_count
                	      	from sets_and_parts
                        group by part_name, color_name) as pc
                        group by part_name
                        order by colors desc
                        limit 10)
    
    select cl.part_name,
    	sap.part_category as category,
        colors
    from colorful_legos as cl
    left join sets_and_parts as sap
    on cl.part_name = sap.part_name
    group by cl.part_name, sap.part_category, cl.colors
    order by colors desc;
    */
    
    ----------------------------------------------------------------------------------------------------------------------------------
    
    -- TOP 10s by Sets
    
    
    /*
    -- Top 10 Lego Sets by Lego Count 
    select set_name,
    	version,
    	sum(piece_count) as total_legos
    from sets_and_parts
    group by set_name, version
    order by total_legos desc
    limit 10;
    */
    
    
    /*
    -- Top 10 Lego Sets by Unique Legos  
    select set_name,
    	version,
    	sum(case when piece_count > 0 then 1 end) as unique_legos
    from sets_and_parts
    group by set_name, version
    order by unique_legos desc
    limit 10;
    */
    
    
    /*
    -- Top 10 Lego Sets by Category Count 
    select set_name,
    	version,
    	count(category_count) as category_count
    from (select distinct set_name,
    	version,
        count(part_category) as category_count
    from sets_and_parts
    group by set_name, version, part_category) as sets_and_categories
    group by set_name, version
    order by category_count desc
    limit 10;
    */
    
    
    /*
    -- Top 10 Lego Sets by Color Count
    select set_name,
    	version,
    	count(color_count) as color_count
    from (select distinct set_name,
    	version,
        count(color_name) as color_count
    from sets_and_parts
    group by set_name, version, color_name) as sets_and_colors
    group by set_name, version
    order by color_count desc
    limit 10;
    */
    
    ----------------------------------------------------------------------------------------------------------------------------------
    
    -- Top 10 Creative Lego Sets based on piece, color, and categorical variation
    
    
    /*
    -- CTE on Sets by Count
    lego_set_count as (select set_name,
                       		version,
                       		sum(piece_count) as total_legos
                       from sets_and_parts
                       group by set_name, version
                       order by total_legos desc),
    
    
    -- CTE on Sets by Unique
    lego_set_unique as (select set_name,
                        	version,
                        	sum(case when piece_count > 0 then 1 end) as unique_legos
                        from sets_and_parts
                        group by set_name, version
                        order by unique_legos desc),
    
    
    -- CTE on Sets by Category
    lego_set_categories as (select set_name,
                            	version,
                            	count(category_count) as category_count
                            from (select distinct set_name,
                                  				version,
                                  				count(part_category) as category_count
                                  from sets_and_parts
                                  group by set_name, version, part_category) as sets_and_categories
                            group by set_name, version
                            order by category_count desc),
    
    
    -- CTE on Sets by Color
    lego_set_colors as (select set_name,
                        	version,
                        	count(color_count) as color_count
                        from (select distinct set_name,
                              		version,
                              		count(color_name) as color_count
                              from sets_and_parts
                              group by set_name, version, color_name) as sets_and_colors
                        group by set_name, version
                        order by color_count desc),
    
    
    -- CTE to hold aggregated tables and create base to query from titled "lego_sets"
    lego_sets as (select lsc.set_name as set_name,
                  		lsc.version as version,
                  		total_legos,
                  		unique_legos,
                  		category_count,
                  		color_count
                  from lego_set_count as lsc
                  left join lego_set_unique as lsq
                  on lsc.set_name = lsq.set_name
                  left join lego_set_categories as lscat
                  on lsq.set_name = lscat.set_name
                  left join lego_set_colors as lscol
                  on lscat.set_name = lscol.set_name)
    */
                  
             
    /*
    -- Table to show that sets with 1-3 pieces consisted of 37 sets and no consistent variance after that
    select total_legos,
    	max(total_legos_count) as times_occured
    from (select total_legos,
          	count(total_legos) total_legos_count
          from lego_sets
          group by total_legos) as stat_collect
    group by total_legos
    order by times_occured desc
    */
    
    
    /*
    -- Basic Stats on "lego_sets" pieces to find measures of central tendency - Mean will be CT
    select max(total_legos) as max,
    	min(total_legos) as min,
    	round(avg(total_legos), 0) as mean,
    	(select total_legos as median 
         from (select distinct total_legos
               from lego_sets
               order by total_legos desc
               limit (select count(total_legos)/2 from lego_sets)) as median_index
         order by total_legos
         limit 1) as median,
         (select concat('1-3 count: ', cast(sum(times_occured) as varchar)) as set_piece_count_range
          from (select total_legos,
                	max(total_legos_count) as times_occured
                from (select total_legos,
                      	count(total_legos) total_legos_count
                      from lego_sets
                      group by total_legos) as stat_collect
                group by total_legos
                order by times_occured desc
                limit 3) as mode)
    from lego_sets
    */
    
    
    /* 
    -- Top 10 Lego Sets by Creativity Score
    select set_name,
    	version,
        total_legos,
        creativity_score 
    from (select set_name,
          	version,
          	total_legos,
          	round((part_variance + categorical_variance + color_variance)::numeric) as creativity_score
          from (select set_name,
                	version,
                	total_legos,
                	total_legos::float/unique_legos as part_variance,
                	unique_legos::float/category_count as categorical_variance,
                	unique_legos::float/color_count as color_variance
                from lego_sets
                where total_legos > 254) as creative) as scaling
    order by creativity_score desc
    limit 10;
    */
    
    ----------------------------------------------------------------------------------------------------------------------------------
    
    -- FULL TABLES WERE EXTRACTED THROUGH BIG QUERY
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.