Competition - Lego sets
  • AI Chat
  • Code
  • Report
  • Beta
    Spinner
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT * 
    FROM sets
    LIMIT 10
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    Data analysis example:

    Find the average number of parts per set.

    We can use AVG() to get the average of the column num_parts

    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT AVG(num_parts) AS average_parts
    FROM sets
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    Data science notebooks & charts

    Visualizations are very helpful to summarize data and gain insights. A well-crafted chart often conveys information much better than a table.

    It is very straightforward to include charts in a data science notebook. For example, let's take a look at the number of sets released per year.

    We will first get the information using COUNT of the column set_num and grouping by year with 'GROUP BY':

    Unknown integration
    DataFrameavailable as
    sets_year
    variable
    SELECT year, 
           COUNT(set_num) AS sets_per_year
      FROM sets
     GROUP BY year
     ORDER BY year ASC
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    Workspace has built in chart cells (create one by clicking on Add Chart). We use one to build the chart using the sets_year table we created in the cell above.

    Current Type: Bar
    Current X-axis: year
    Current Y-axis: sets_per_year
    Current Color: None

    Lego sets published by year

    Understanding Lego sets popularity

    Now let's now move on to the competition and challenge.

    📖 Background

    You recently applied to work as a data analyst intern at the famous Lego Group in Denmark. As part of the job interview process, you received the following take-home assignment:

    You are asked to use the provided dataset to understand the popularity of different Lego sets and themes. The idea is to become familiarized with the data to be ready for an interview with a business stakeholder.

    💾 The data

    You received access to a database with the following tables. You can also see above a visualization of how the tables are related to each other. (source):
    inventory_parts
    • "inventory_id" - id of the inventory the part is in (as in the inventories table)
    • "part_num" - unique id for the part (as in the parts table)
    • "color_id" - id of the color
    • "quantity" - the number of copies of the part included in the set
    • "is_spare" - whether or not it is a spare part
    parts
    • "part_num" - unique id for the part (as in the inventory_parts table)
    • "name" - name of the part
    • "part_cat_id" - part category id (as in part_catagories table)
    part_categories
    • "id" - part category id (as in parts table)
    • "name" - name of the category the part belongs to
    colors
    • "id" - id of the color (as in inventory_parts table)
    • "name" - color name
    • "rgb" - rgb code of the color
    • "is_trans" - whether or not the part is transparent/translucent
    inventories
    • "id" - id of the inventory the part is in (as in the inventory_sets and inventory_parts tables)
    • "version" - version number
    • "set_num" - set number (as in sets table)
    inventory_sets
    • "inventory_id" - id of the inventory the part is in (as in the inventories table)
    • "set_num" - set number (as in sets table)
    • "quantity" - the quantity of sets included
    sets
    • "set_num" - unique set id (as in inventory_sets and inventories tables)
    • "name" - the name of the set
    • "year" - the year the set was published
    • "theme_id" - the id of the theme the set belongs to (as in themes table)
    • num-parts - the number of parts in the set
    themes
    • "id" - the id of the theme (as in the sets table)
    • "name" - the name of the theme
    • "parent_id" - the id of the larger theme, if there is one

    Acknowledgments: Rebrickable.com

    💪 Challenge

    Create a report to summarize your findings. Include:

    1. What is the average number of Lego sets released per year?
    2. What is the average number of Lego parts per year?
    3. Create a visualization for item 2.
    4. What are the 5 most popular colors used in Lego parts?
    5. [Optional] What proportion of Lego parts are transparent?
    6. [Optional] What are the 5 rarest lego bricks?
    7. Summarize your findings.
    Unknown integration
    DataFrameavailable as
    df
    variable
    select year,count(set_num)
    from sets
    group by year
    order by year;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT year, avg(num_parts) as avg_num_part
    from sets
    group by year
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df
    variable
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    💡 Learn more

    The following DataCamp courses can help review the skills needed for this challenge: