Exploring Astronaut Activities in SQL - Webinar
  • AI Chat
  • Code
  • Report
  • Beta
    Spinner

    Exploring Astronaut Activities in SQL

    Welcome to your webinar workspace! You can follow along as we analyze the data in a SQL database and visualize the results.

    To set up your integration, create a PostgreSQL integration with the following credentials:

    • Integration Name: Astronaut Codealong
    • Hostname: workspacedemodb.datacamp.com
    • Database: astronauts
    • Username: astronauts
    • Password: astronauts

    Source of the data

    Exploring our data

    Let's start by looking at the table we will be working with.

    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT 
    * 
    FROM evas
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    Let's inspect the purpose column in greater detail.

    Unknown integration
    DataFrameavailable as
    df
    variable
     SELECT
        purpose,
        CASE WHEN purpose ILIKE '%photo%' THEN 1 ELSE 0 END AS photography,
        CASE WHEN purpose ILIKE '%collect%' THEN 1 ELSE 0 END AS collection,        
        CASE WHEN purpose ILIKE '%construct%' OR purpose ILIKE '%install%' OR purpose ILIKE '%assembl%' THEN 1 ELSE 0 END AS installation,
        CASE WHEN purpose ILIKE '%recover%' OR purpose ILIKE '%retrieve%' THEN 1 ELSE 0 END AS retrieval,
        CASE WHEN purpose ILIKE '%jettison%' THEN 1 ELSE 0 END AS jettison,
        CASE WHEN purpose ILIKE '%replace%' OR purpose ILIKE '%fix%' OR purpose ILIKE '%repair%' OR purpose ILIKE '%servic%' THEN 1 ELSE 0 END AS repair
    FROM evas
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    What are the most common types of EVAs?

    Let's start to get a rough idea of the most popular types of EVAs astronauts take by using CASE expressions.

    Unknown integration
    DataFrameavailable as
    df
    variable
    WITH purposes AS ( 
     	SELECT
            purpose,
            CASE WHEN purpose ILIKE '%photo%' THEN 1 ELSE 0 END AS photography,
            CASE WHEN purpose ILIKE '%collect%' THEN 1 ELSE 0 END AS collection,
            CASE WHEN purpose ILIKE '%construct%' OR purpose ILIKE '%install%' OR purpose ILIKE '%assembl%' THEN 1 ELSE 0 END AS installation,
            CASE WHEN purpose ILIKE '%recover%' OR purpose ILIKE '%retrieve%' THEN 1 ELSE 0 END AS retrieval,
            CASE WHEN purpose ILIKE '%jettison%' THEN 1 ELSE 0 END AS jettison,
            CASE WHEN purpose ILIKE '%replace%' OR purpose ILIKE '%fix%' OR purpose ILIKE '%repair%' OR purpose ILIKE '%servic%' THEN 1 ELSE 0 END AS repair
        FROM evas
    )
    
    SELECT 
    	SUM(photography) AS count,
        'photography' AS type
    FROM purposes
    UNION
    SELECT 
    	SUM(collection) AS count,
        'collection' AS type
    FROM purposes
    UNION
    SELECT 
    	SUM(retrieval) AS count,
        'retrieval' AS type
    FROM purposes
    UNION
    SELECT 
    	SUM(jettison) AS count,
        'jettison' AS type
    FROM purposes
    UNION
    SELECT 
    	SUM(repair) AS count,
        'repair' AS type
    FROM purposes
    UNION
    SELECT 
    	SUM(installation) AS count,
        'installation' AS type
    FROM purposes
    ORDER BY count
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    We are now ready to build this into a final query!

    Unknown integration
    DataFrameavailable as
    df
    variable
    WITH purposes AS ( 
     	SELECT
            purpose,
            CASE WHEN purpose ILIKE '%photo%' THEN 1 ELSE 0 END AS photography,
            CASE WHEN purpose ILIKE '%collect%' THEN 1 ELSE 0 END AS collection,
            CASE WHEN purpose ILIKE '%construct%' OR purpose ILIKE '%install%' OR purpose ILIKE '%assembl%' THEN 1 ELSE 0 END AS installation,
            CASE WHEN purpose ILIKE '%recover%' OR purpose ILIKE '%retrieve%' THEN 1 ELSE 0 END AS retrieval,
            CASE WHEN purpose ILIKE '%jettison%' THEN 1 ELSE 0 END AS jettison,
            CASE WHEN purpose ILIKE '%replace%' OR purpose ILIKE '%fix%' OR purpose ILIKE '%repair%' OR purpose ILIKE '%servic%' THEN 1 ELSE 0 END AS repair
        FROM evas
    )
    
    SELECT 
    	SUM(photography) AS count,
        'photography' AS type
    FROM purposes
    UNION
    SELECT 
    	SUM(collection) AS count,
        'collection' AS type
    FROM purposes
    UNION
    SELECT 
    	SUM(retrieval) AS count,
        'retrieval' AS type
    FROM purposes
    UNION
    SELECT 
    	SUM(jettison) AS count,
        'jettison' AS type
    FROM purposes
    UNION
    SELECT 
    	SUM(repair) AS count,
        'repair' AS type
    FROM purposes
    UNION
    SELECT 
    	SUM(installation) AS count,
        'installation' AS type
    FROM purposes
    ORDER BY count
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Current Type: Bar
    Current X-axis: count
    Current Y-axis: type
    Current Color: None

    EVAs by purpose

    How much material has been extracted from EVAs?

    Skimming through the purpose column, we also saw numerous references to extracting rock/dust or geological material. In this case, it will be difficult to extract the total quantity across the columns. Regular expressions to the rescue!

    We will define a pattern to extract the total pounds extracted per EVA, and then sum them up. Let's first do a sense check of the data.

    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT
    	purpose
    FROM evas
    WHERE purpose ILIKE '%geologic%' OR purpose ILIKE '%rock%'
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    Okay, we now know that the format of the pounds extracted is always number lbs of rock/geologic. We can construct a pattern to detect this and extract the number!

    To do so, we will make use of:

    • \d+ to match one or more digits.
    • \.? to match zero or more periods.\
    • * to match zero or more digits following the optional decimal place.
    • () to specify we only want the digits.
    • \s to match the whitespace (i.e., spaces).
    • [] and | to specify we either want to match "geologic" or "rock".

    Let's put this into action, using SUBSTRING() to extract our pattern!

    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT
    	purpose,
    	SUBSTRING(purpose, '(\d+\.?\d*)\slb\sof\s((rock)|(geologic))') AS weight
    FROM evas
    WHERE purpose ILIKE '%rock%' OR purpose ILIKE '%geologic%'
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    Now we can use a CTE to calculate the total amount!