Workspace
Moaz Hamdy/

Competition - Everyone Can Learn Python Scholarship

Everyone Can Learn Python Scholarship

📖 Background

The first "Everyone Can Learn Python" Scholarship from DataCamp is now open for entries.

The challenges below test the Python and SQL skills you gained from Introduction to Python and Introduction to SQL and pair them with your existing problem-solving and creative-thinking skills.

The scholarship is open to people who have completed or are completing their secondary education and are preparing to pursue a degree in computer science or data science. Students preparing for graduate-level computer science or data science degrees are also welcome to apply.

💡 Learn more

The following DataCamp courses can help review the skills needed for this challenge:

  • Introduction to Python
  • Introduction to SQL

ℹ️ Introduction to Data Science Notebooks

You can skip this section if you are already familiar with data science notebooks.

Data science notebooks

A data science notebook is a document containing text cells (what you're reading now) and code cells. What makes a notebook unique is that it's interactive: you can change or add code cells, then run a cell by selecting it and clicking the Run button to the right (or Run All at the top), or by pressing Control + Enter.

The result will be displayed directly in the notebook.

Try running the Python cell below:

# Run this cell to see the result (click on Run on the right, or control+enter)
100 * 1.75 * 20

Modify any of the numbers and rerun the cell.

You can add a Markdown, Python, or SQL cell by clicking on the Add Markdown, Add Code, and Add SQL buttons that appear as you move the mouse pointer near the bottom of any cell.

Here at DataCamp, we call our interactive notebook Workspace. You can find out more about Workspace here.

1️⃣ Python 🐍 - CO2 Emissions

Now let's move on to the competition and challenge.

📖 Background

You volunteer for a public policy advocacy organization in Canada, and your colleague asked you to help her draft recommendations for guidelines on CO2 emissions rules.

After researching emissions data for a wide range of Canadian vehicles, she would like you to investigate which vehicles produce lower emissions.

💾 The data I

You have access to seven years of CO2 emissions data for Canadian vehicles (source):

  • "Make" - The company that manufactures the vehicle.
  • "Model" - The vehicle's model.
  • "Vehicle Class" - Vehicle class by utility, capacity, and weight.
  • "Engine Size(L)" - The engine's displacement in liters.
  • "Cylinders" - The number of cylinders.
  • "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
  • "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = Natural gas.
  • "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
  • "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.

The data comes from the Government of Canada's open data website.

# Import the pandas and numpy packages
import pandas as pd
import numpy as np

# Load the data
cars = pd.read_csv('data/co2_emissions_canada.csv')

# create numpy arrays
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()

# Preview the dataframe
cars

# Look at the first ten items in the CO2 emissions array
cars_co2_emissions[:10]

💪 Challenge I

Help your colleague gain insights on the type of vehicles that have lower CO2 emissions. Include:

  1. What is the median engine size in liters?
  2. What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
  3. What is the correlation between fuel consumption and CO2 emissions?
  4. Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
  5. What are the average CO2 emissions for all vehicles? For vehicles with an engine size of 2.0 liters or smaller?
  6. Any other insights you found during your analysis?
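
The questions above mostly reduce to three NumPy techniques: a summary statistic on one array, a mean restricted by a boolean mask built from another array, and a correlation coefficient. The sketch below illustrates them on small hypothetical arrays; the values are made up for illustration and are not taken from the dataset, and the variable names are placeholders for the `cars_*` arrays created earlier.

```python
import numpy as np

# Hypothetical toy arrays standing in for the cars_* arrays loaded above.
# The values are invented for illustration only.
fuel_types = np.array(['X', 'Z', 'X', 'D', 'Z', 'E'])
engine_sizes = np.array([2.0, 3.5, 1.6, 2.0, 5.0, 3.6])
fuel_consumption = np.array([8.0, 11.0, 7.0, 9.0, 14.0, 16.0])
co2_emissions = np.array([186, 255, 163, 243, 327, 262])

# Summary statistic on a single numeric array
median_engine_size = np.median(engine_sizes)

# Mean of one array, restricted by a boolean mask built from another array
mean_consumption_by_fuel = {
    fuel: fuel_consumption[fuel_types == fuel].mean()
    for fuel in ['X', 'Z', 'E', 'D']
}

# Pearson correlation: np.corrcoef returns a 2x2 matrix,
# and the off-diagonal entry is the coefficient between the two arrays
correlation = np.corrcoef(fuel_consumption, co2_emissions)[0, 1]

# The same masking idea works with a numeric threshold
small_engine_mean_co2 = co2_emissions[engine_sizes <= 2.0].mean()

print(median_engine_size)
print(mean_consumption_by_fuel)
print(round(correlation, 3))
print(round(small_engine_mean_co2, 2))
```

Comparing two vehicle classes follows the same pattern: build one mask per class and compare the two masked means.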

2️⃣ SQL - Understanding the bicycle market

📖 Background

You work for a chain of bicycle stores. Your new team leader comes from a different industry and wants your help learning about the bicycle market. Specifically, they need to better understand the brands and categories for sale at your stores.

💾 The data II

You have access to the following tables:

products
  • "product_id" - Product identifier.
  • "product_name" - The name of the bicycle.
  • "brand_id" - You can look up the brand's name in the "brands" table.
  • "category_id" - You can look up the category's name in the "categories" table.
  • "model_year" - The model year of the bicycle.
  • "list_price" - The price of the bicycle.
brands
  • "brand_id" - Matches the identifier in the "products" table.
  • "brand_name" - One of the nine brands the store sells.
categories
  • "category_id" - Matches the identifier in the "products" table.
  • "category_name" - One of the seven product categories in the store.

A note on SQL

You can click the "Browse tables" button in the upper right-hand corner of the SQL cell below to view the available tables. They will show on the left of the notebook.

It is also important to note that the database used in this challenge runs a different SQL dialect (SQL Server) from the one used in the Introduction to SQL course (PostgreSQL). In particular, SQL Server does not support the LIMIT keyword; use SELECT TOP (n) to restrict the number of rows returned.

SELECT *
FROM products;

SELECT *
FROM brands;

SELECT *
FROM categories;

💪 Challenge II

Help your team leader understand your company's products. Include:

  1. What is the most expensive item your company sells? The least expensive?
  2. How many different products of each category does your company sell?
  3. What are the top three brands with the highest average list price? The top three categories?
  4. Any other insights you found during your analysis?
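
The counting and averaging questions above come down to a JOIN followed by GROUP BY. The sketch below shows that pattern from Python, using the standard-library sqlite3 module as a stand-in; this is an assumption made for illustration, since the challenge database is SQL Server. The rows are invented placeholders, not the store's data. The queries use only JOIN, GROUP BY, COUNT, AVG, and ORDER BY, and the "top" row is taken in Python to sidestep the LIMIT versus TOP difference between dialects.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption: the real
# challenge database is SQL Server; all rows below are invented).
conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    brand_id INTEGER,
    category_id INTEGER,
    model_year INTEGER,
    list_price REAL
);
INSERT INTO categories VALUES (1, 'Category A'), (2, 'Category B');
INSERT INTO products VALUES
    (1, 'Bike 1', 1, 1, 2016, 300.0),
    (2, 'Bike 2', 1, 1, 2017, 500.0),
    (3, 'Bike 3', 2, 2, 2017, 1200.0);
""")

# Number of products per category: join to get the name, then group
counts = conn.execute("""
SELECT c.category_name, COUNT(*) AS product_count
FROM products AS p
JOIN categories AS c ON p.category_id = c.category_id
GROUP BY c.category_name
ORDER BY product_count DESC;
""").fetchall()

# Average list price per category, highest first
avg_prices = conn.execute("""
SELECT c.category_name, AVG(p.list_price) AS avg_price
FROM products AS p
JOIN categories AS c ON p.category_id = c.category_id
GROUP BY c.category_name
ORDER BY avg_price DESC;
""").fetchall()

# Take the first row in Python instead of using LIMIT or TOP
top_category = avg_prices[0]

print(counts)
print(top_category)
conn.close()
```

The same structure applies to brands by joining "products" to "brands" on "brand_id" instead.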

🧑‍⚖️ Judging criteria

 

Public Upvotes - The top 100 most upvoted entries will be judged according to the criteria below. Entries in position 101 or below will not proceed to the next stage. Only votes made by accounts registered before voting opens will count towards final decisions.

Category (weighting) and details:

Response quality (65%)
  • Accuracy (20%) - The response must be representative of the original data and free from errors.
  • Clarity (20%) - The response must be easy to understand and clearly expressed.
  • Completeness (15%) - The response must be a full report that responds to the question posed.
  • Insights (10%) - The response must contain some insights based on the data using your own judgment and interpretation.
Storytelling (20%)
  • How well the response is connected to the original data.
  • How well the narrative and the whole response connect together.
  • Whether the report contains sufficient depth but is also concise.
  • How the response flows from one point to the next.
Presentation (15%)
  • How legible/understandable the response is.
  • How well-formatted the response is.
  • Spelling and grammar.

In the event of a tie, user XP may be used as a tie-breaker.

📘 Rules

To apply for the scholarship, you must:

  • Submit your details via the scholarship application form.
  • Submit your response to this problem before the deadline.

All responses must be submitted in English.

We recommend that you complete the Introduction to Python and Introduction to SQL courses on our website, as many of the skills and requirements in this competition are covered within.

Entrants must be:

  • 18+ years old.
  • Enrolled in a secondary, tertiary, or graduate education program.
  • Allowed to take part in a skill-based competition from their country.

Entrants cannot:

  • Have earned or attained a post-secondary degree in computer science, data science, data analytics or a related field of study.
  • Be in a country currently sanctioned by the US government.

✅ Checklist before publishing and submitting to the competition

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the judging criteria, so the workbook is focused on your story.
  • Make sure the workspace reads well.
  • Pay attention to the judging criteria.
  • Check that all the cells run without error.

⌛️ Time is ticking. Good luck!