Workspace
Gwen Iroh/

How Likely is an Athlete to Win a Medal in Judo?

0
Beta
Spinner

How Likely is an Athlete to Win a Medal in Judo?

Welcome to your workspace! In this walkthrough, you will learn the basics of Workspace as you begin to analyze an Olympics dataset and prepare a shareable Workspace report!

Your goal will be to use the data available to you to help estimate how likely an athlete is to win a medal in Judo.

Keep an eye out for 💪  icons throughout the notebook. These will indicate opportunities for you to try out Workspace for yourself!

🏆  Load in the Olympics Data

If you click on the file browser icon, you can see that you have access to olympics_data.csv, a file with different information on Olympics events. The cell below uses pandas to import the data and preview it.

Go ahead and try to run the cell now to import and inspect the data!

To run a cell, click inside it and click "Run" or the ► icon. You can also use Shift-Enter to run a selected cell and automatically switch to the next cell.

# Import libraries
import pandas as pd
import numpy as np

# # Import the data
olympics_data = pd.read_csv("olympics_data.csv")

# Preview the DataFrame
olympics_data

💪  Browse through the interactive table to see if you can already learn anything from the data!

🌐  Bring in additional data

Depending on your analysis, you may find using additional world data helpful. In Workspace, you can add and run SQL cells to access relational databases. The query below retrieves data from a database containing information on nations of the world. It stores the query results as a pandas DataFrame with a name of your choosing.

You can click the "Browse tables" button to view the available tables and columns or continue with the existing query.

Unknown integration
DataFrameavailable as
countries_data
variable
-- Select country data based on most recent year
WITH world_countries AS (
    SELECT 
        countries.name AS country, 
        continents.name AS continent,  
		year,
    	population,
    	gdp,
		ROW_NUMBER() OVER(PARTITION BY country ORDER BY year DESC) AS year_index
	FROM countries
    INNER JOIN country_stats USING(country_id)
    INNER JOIN regions USING(region_id)
    INNER JOIN continents USING(continent_id)
    GROUP BY 1, 2, 3, 4, 5)

SELECT
	country,
    continent,
    year,
    population,
    gdp
FROM world_countries

WHERE year_index = 1
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

🔬  Zooming in on judo

We now have country data that we can combine with the Olympics data. We will use the .merge() method to combine the two DataFrames using the team and country columns.

A "left" join matches on rows in the olympics_data DataFrame, as some teams will not be present in the countries_data DataFrame. We then filter for rows where the sport is "Judo".

# Perform a left join between the two DataFrames using the country columns
olympics_country_data = olympics_data.merge(
    countries_data, left_on="team", right_on="country", how="left", suffixes=["_games", "_country_data"]
)

# Filter for Judo
judo_data = olympics_country_data[olympics_country_data["sport"] == "Judo"]

# Preview the data
judo_data

🎨  Visualize the data

An essential skill in exploratory analysis is data visualization. Let's visualize the top countries by medal count. To do so, we will drop rows where the medal is a null value and group the DataFrame by team, and count the number of medals.

# Drop athletes who didn't win a medal
judo_medals = judo_data.dropna(subset="medal")

# Get the counts of medals by team
country_counts = judo_medals.groupby("team", as_index=False)["medal"].count()

# Select only the top 10 countries
best_countries = country_counts.sort_values(by="medal", ascending=False).head(10)

# Inspect the data
best_countries

As we might expect, Japan has the highest medal count for judo! But this would look even better as a plot. Fortunately, Workspace has a handy chart cell that allows you to quickly generate and customize different chart types.

Let's use a bar chart showing each country's total number of medals won. Select the cell below and click "Refresh" to generate the chart!

💪  Be sure to try out other data visualizations by adjusting the chart type, the x-axis, y-axis, and grouping options!

Current Type: Bar
Current X-axis: medal
Current Y-axis: team
Current Color: None

Top judo countries by medal count

🔬  Go forth and analyze!

Well done! It's now up to you to further explore the data you imported, create new features, and estimate the probability that a judo athlete will earn a medal. Try to use the additional SQL data you queried!

When you're finished, make sure to publish your work which can be shared with peers and featured on your DataCamp profile. After you have finished preparing your report, consider the following options:

  • Try out our ready-to-use datasets. These cover various topics and include flat files such as csvs and additional databases for you to test your SQL skills!
  • Kickstart your next project by using one of our templates. These provide the code and instructions on various data science topics, ranging from machine learning to visualization.
  • Want to go at it on your own? Open a blank workspace and get coding!
  • AI Chat
  • Code