How Likely is an Athlete to Win a Medal in Judo?
  • AI Chat
  • Code
  • Report
  • Beta
    Spinner

    How Likely is an Athlete to Win a Medal in Judo?

    Welcome to your workspace! In this walkthrough, you will learn the basics of Workspace as you begin to analyze an Olympics dataset and prepare a shareable Workspace report!

    Your goal will be to use the data available to you to help estimate how likely an athlete is to win a medal in Judo.

    Keep an eye out for 💪  icons throughout the notebook. These will indicate opportunities for you to try out Workspace for yourself!

    🏆  Load in the Olympics Data

    If you click on the file browser icon, you can see that you have access to olympics_data.csv, a file with different information on Olympics events. The cell below uses pandas to import the data and preview it.

    Go ahead and try to run the cell now to import and inspect the data!

    To run a cell, click inside it and click "Run" or the ► icon. You can also use Shift-Enter to run a selected cell and automatically switch to the next cell.

    # Import libraries
    import pandas as pd
    import numpy as np
    
    # # Import the data
    olympics_data = pd.read_csv("olympics_data.csv")
    
    # Preview the DataFrame
    olympics_data

    💪  Browse through the interactive table to see if you can already learn anything from the data!

    🌐  Bring in additional data

    Depending on your analysis, you may find using additional world data helpful. In Workspace, you can add and run SQL cells to access relational databases. The query below retrieves data from a database containing information on nations of the world. It stores the query results as a pandas DataFrame with a name of your choosing.

    You can click the "Browse tables" button to view the available tables and columns or continue with the existing query.

    Unknown integration
    DataFrameavailable as
    countries_data
    variable
    -- Select country data based on most recent year
    WITH world_countries AS (
        SELECT 
            countries.name AS country, 
            continents.name AS continent,  
    		year,
        	population,
        	gdp,
    		ROW_NUMBER() OVER(PARTITION BY country ORDER BY year DESC) AS year_index
    	FROM countries
        INNER JOIN country_stats USING(country_id)
        INNER JOIN regions USING(region_id)
        INNER JOIN continents USING(continent_id)
        GROUP BY 1, 2, 3, 4, 5)
    
    SELECT
    	country,
        continent,
        year,
        population,
        gdp
    FROM world_countries
    
    WHERE year_index = 1
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    🔬  Zooming in on judo

    We now have country data that we can combine with the Olympics data. We will use the .merge() method to combine the two DataFrames using the team and country columns.

    A "left" join matches on rows in the olympics_data DataFrame, as some teams will not be present in the countries_data DataFrame. We then filter for rows where the sport is "Judo".

    # Perform a left join between the two DataFrames using the country columns
    olympics_country_data = olympics_data.merge(
        countries_data, left_on="team", right_on="country", how="left", suffixes=["_games", "_country_data"]
    )
    
    # Filter for Judo
    judo_data = olympics_country_data[olympics_country_data["sport"] == "Judo"]
    
    # Preview the data
    judo_data

    🎨  Visualize the data

    An essential skill in exploratory analysis is data visualization. Let's visualize the top countries by medal count. To do so, we will drop rows where the medal is a null value and group the DataFrame by team, and count the number of medals.

    # Drop athletes who didn't win a medal
    judo_medals = judo_data.dropna(subset="medal")
    
    # Get the counts of medals by team
    country_counts = judo_medals.groupby("team", as_index=False)["medal"].count()
    
    # Select only the top 10 countries
    best_countries = country_counts.sort_values(by="medal", ascending=False).head(10)
    
    # Inspect the data
    best_countries

    As we might expect, Japan has the highest medal count for judo! But this would look even better as a plot. Fortunately, Workspace has a handy chart cell that allows you to quickly generate and customize different chart types.

    Let's use a bar chart showing each country's total number of medals won. Select the cell below and click "Refresh" to generate the chart!

    💪  Be sure to try out other data visualizations by adjusting the chart type, the x-axis, y-axis, and grouping options!

    Current Type: Bar
    Current X-axis: medal
    Current Y-axis: team
    Current Color: None

    Top judo countries by medal count

    🔬  Go forth and analyze!

    Well done! It's now up to you to further explore the data you imported, create new features, and estimate the probability that a judo athlete will earn a medal. Try to use the additional SQL data you queried!

    When you're finished, make sure to publish your work which can be shared with peers and featured on your DataCamp profile. After you have finished preparing your report, consider the following options:

    • Try out our ready-to-use datasets. These cover various topics and include flat files such as csvs and additional databases for you to test your SQL skills!
    • Kickstart your next project by using one of our templates. These provide the code and instructions on various data science topics, ranging from machine learning to visualization.
    • Want to go at it on your own? Open a blank workspace and get coding!