💪 Challenge
Create a report to answer the principal's questions. Include:
- What are the top 5 countries with the highest internet use (by population share)?
- How many people had internet access in those countries in 2019?
- What are the top 5 countries with the highest internet use for each of the following regions: 'Africa Eastern and Southern', 'Africa Western and Central', 'Latin America & Caribbean', 'East Asia & Pacific', 'South Asia', 'North America', 'European Union'?
- Create a visualization for those regions' internet usage over time.
- What are the 5 countries with the most internet users?
- What is the correlation between internet usage (population share) and broadband subscriptions for 2019?
- Summarize your findings.
🧑⚖️ Judging criteria
CATEGORY | WEIGHTING | DETAILS |
---|---|---|
Response quality | 85% | |
Presentation | 15% | |
In the event of a tie, earlier submission time will be used as a tie-breaker.
📘 Rules
To be eligible to win, you must:
- Submit your response to this problem before the deadline.
All responses must be submitted in English.
Entrants must be:
- 18+ years old.
- Allowed to take part in a skill-based competition from their country.
Entrants cannot:
- Be in a country currently sanctioned by the U.S. government.
XP will be awarded at the end of the competition. Therefore, competition XP will not count towards any daily prizes.
✅ Checklist before submitting your workspace
- Rename your workspace to make it descriptive of your work. N.B., you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
- Check that all the cells run without error.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import plotly.express as px
people = pd.read_csv('data/people.csv')
broadband = pd.read_csv('data/broadband.csv')
internet = pd.read_csv('data/internet.csv')
1. Top 5 countries with the highest internet use (by population share)
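# Keep only the 2019 observations
top_five = internet.query('Year == 2019')
# Assuming one row per entity and year, the sum simply returns that year's share
top_five = top_five.groupby('Entity')['Internet_Usage'].sum()
# Rank from highest to lowest share and keep the first five
top_five = top_five.sort_values(ascending=False).head(5)
top_five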
The top 5 countries with the highest internet use in 2019 (by population share) are Bahrain, Qatar, Kuwait, the United Arab Emirates, and Denmark.
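If each entity has exactly one row per year (an assumption about internet.csv, not something checked here), the same ranking can be obtained without the groupby. A minimal sketch on made-up numbers; the `toy` frame and its values are illustrative only and reuse the column names `Entity`, `Year`, and `Internet_Usage` from the cell above:

```python
import pandas as pd

# Illustrative data only; these values are made up, not taken from internet.csv
toy = pd.DataFrame({
    'Entity': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Year': [2018, 2018, 2018, 2019, 2019, 2019],
    'Internet_Usage': [50.0, 60.0, 70.0, 55.0, 90.0, 75.0],
})

# Filter to a single year, then take the n largest shares directly
top_two = (
    toy[toy['Year'] == 2019]
    .set_index('Entity')['Internet_Usage']
    .nlargest(2)
)
print(top_two)
```

`nlargest` sorts and truncates in one step, so it replaces the `sort_values(...).head(...)` pair.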
2. How many people had internet access in those countries in 2019?
# Restrict the user counts to the top-five countries, then to 2019
countries_users_2019 = people[people['Entity'].isin(top_five.index)]
countries_users_2019 = countries_users_2019[countries_users_2019['Year'] == 2019].sort_values(by='Users', ascending=False)
countries_users_2019 = countries_users_2019[['Year', 'Entity', 'Users']]
countries_users_2019
People with internet access in 2019:
- United Arab Emirates: 9,133,361
- Denmark: 5,682,653
- Kuwait: 4,420,795
- Qatar: 2,797,495
- Bahrain: 1,489,735
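To read the population share and the number of users side by side, the two tables can be joined on country and year. A minimal sketch with made-up frames; `share` and `users` are hypothetical stand-ins for the internet and people data, with the column names used in the cells above:

```python
import pandas as pd

# Illustrative frames only; the numbers are made up
share = pd.DataFrame({
    'Entity': ['A', 'B'],
    'Year': [2019, 2019],
    'Internet_Usage': [99.0, 95.0],
})
users = pd.DataFrame({
    'Entity': ['A', 'B', 'B'],
    'Year': [2019, 2018, 2019],
    'Users': [1_000_000, 4_500_000, 5_000_000],
})

# Join on both keys so each share is paired with the same year's user count
combined = share.merge(users, on=['Entity', 'Year'], how='left')
combined = combined.sort_values('Users', ascending=False).reset_index(drop=True)
print(combined)
```

Joining on both `Entity` and `Year` avoids pairing a 2019 share with another year's user count.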
3. Top 5 countries with the highest internet use for each of the regions
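# Drop rows without a country code, then drop the "World" row
new = internet.dropna(axis=0, subset=['Code'])
new = new.query('Entity != "World"')
# Look up each country's region in an external table of countries by continent
df1 = pd.read_html('https://statisticstimes.com/geography/countries-by-continents.php', thousands=None, decimal=',')
may = df1[2]
may = may.rename(columns={'ISO-alpha3 Code': 'Code'})
# Left join on the ISO code so every row of the usage data is kept
countries = new.merge(may, how='left', on='Code')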
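A left join keeps every usage row, but rows whose code is missing from the lookup table end up with an empty region and silently drop out of the regional rankings. A minimal sketch of how this could be checked with the `indicator` option of `merge`; the frames, entity names, and codes below are made up for illustration:

```python
import pandas as pd

# Illustrative frames only; entity names, codes, and values are hypothetical
usage = pd.DataFrame({
    'Entity': ['Country A', 'Country Z'],
    'Code': ['AAA', 'ZZZ'],
    'Internet_Usage': [20.0, 85.0],
})
regions = pd.DataFrame({
    'Code': ['AAA'],
    'Region 1': ['Eastern Africa'],
})

# indicator=True adds a '_merge' column that says where each row came from
checked = usage.merge(regions, on='Code', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Entity'].tolist()
print(unmatched)
```

Any entity listed in `unmatched` has no region assigned and would need a manual mapping before ranking.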
Africa Eastern and Southern
# Sub-regions that make up 'Africa Eastern and Southern'
aes = ['Eastern Africa', 'Southern Africa']
aes1 = countries[countries['Region 1'].isin(aes)]
aes1 = aes1.query('Year == 2017')
top_five_aes = aes1.groupby('Entity')['Internet_Usage'].sum().sort_values(ascending=False).head(5)
top_five_aes
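The same filter-and-rank steps repeat for every region, so they can be wrapped in a small helper. A minimal sketch, assuming one row per entity and year; `top_n_for_subregions` is a hypothetical name, and the toy frame below is made up and only mirrors the `Entity`, `Year`, `Region 1`, and `Internet_Usage` columns of `countries`:

```python
import pandas as pd

def top_n_for_subregions(df, subregions, year, n=5):
    """Return the n entities with the highest Internet_Usage
    among the given 'Region 1' values in the given year."""
    subset = df[df['Region 1'].isin(subregions) & (df['Year'] == year)]
    return subset.set_index('Entity')['Internet_Usage'].nlargest(n)

# Illustrative data only; entities and values are made up
toy = pd.DataFrame({
    'Entity': ['P', 'Q', 'R', 'S'],
    'Year': [2017, 2017, 2017, 2017],
    'Region 1': ['Eastern Africa', 'Southern Africa', 'Western Africa', 'Eastern Africa'],
    'Internet_Usage': [30.0, 55.0, 40.0, 10.0],
})

result = top_n_for_subregions(toy, ['Eastern Africa', 'Southern Africa'], year=2017, n=2)
print(result)
```

Each remaining region would then need only its own list of sub-regions and a year.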