How Much of the World Has Access to the Internet?
# Load the tidyverse package
suppressPackageStartupMessages(library(tidyverse))
# Read the data
broadband <- read_csv('data/broadband.csv', show_col_types = FALSE)
# Take a look at the first rows
broadband
Data analysis example:
Find the number of broadband subscriptions (per 100 people) for the European Union in 2018.
We can use filter() to keep the rows where Entity is equal to 'European Union' and Year is equal to 2018.
broadband %>% filter(Entity == 'European Union' & Year == 2018)
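If only the number itself is needed rather than the whole row, the filtered result can be reduced to a single value with pull(). This is a minimal sketch that assumes the column names described in this notebook; the toy_broadband table and its values are invented for illustration and are not the real data.

```r
# dplyr is part of the tidyverse, so library(tidyverse) works as well
suppressPackageStartupMessages(library(dplyr))

# Hypothetical stand-in for the broadband table (values are invented)
toy_broadband <- data.frame(
  Entity = c("European Union", "European Union", "World"),
  Code = c(NA_character_, NA_character_, NA_character_),
  Year = c(2017, 2018, 2018),
  Broadband_Subscriptions = c(33, 34, 14)
)

# Conditions separated by commas inside filter() are combined with AND,
# which is equivalent to joining them with &
eu_2018 <- toy_broadband %>%
  filter(Entity == "European Union", Year == 2018) %>%
  pull(Broadband_Subscriptions)

eu_2018
```

With the real data, replacing toy_broadband with broadband should return the 2018 figure for the European Union as a plain numeric value.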
Data science notebooks & visualizations
Visualizations are very helpful in summarizing data and gaining insights. A well-crafted chart often conveys information much better than a table.
It is straightforward to include plots in a data science notebook. For example, let's look at how broadband subscriptions have changed over time in Latin America and the Caribbean.
First, we filter our data for 'Latin America and Caribbean' and save the result to a new data frame called latam:
latam <- broadband %>% filter(Entity == 'Latin America and Caribbean')
latam
Workspace has built-in chart cells (create one by clicking on Add Chart). We use one to build the chart from the latam table we created in the cell above.
[Chart: Broadband subscriptions in Latin America]
You can also use other visualization libraries like ggplot2.
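As a rough sketch of the ggplot2 route, a line chart of subscriptions over time could look like the following. The toy_latam data frame and its values are invented so the example runs on its own; in the notebook, the latam data frame created earlier would take its place. The axis labels are also assumptions.

```r
# ggplot2 is part of the tidyverse, so library(tidyverse) works as well
suppressPackageStartupMessages(library(ggplot2))

# Hypothetical stand-in for the latam data frame (values are invented)
toy_latam <- data.frame(
  Entity = "Latin America and Caribbean",
  Year = 2015:2019,
  Broadband_Subscriptions = c(10, 11, 12, 13, 14)
)

# One line with a point per year
latam_plot <- ggplot(toy_latam, aes(x = Year, y = Broadband_Subscriptions)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Broadband subscriptions in Latin America",
    x = "Year",
    y = "Broadband subscriptions"
  )

latam_plot
```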
How Much of the World Has Access to the Internet?
Now let's move on to the competition and challenge.
📖 Background
You work for a policy consulting firm. One of the firm's principals is preparing to give a presentation on the state of internet access in the world. She needs your help answering some questions about internet accessibility across the world.
💾 The data
The research team compiled the following tables (source):
internet
- "Entity" - The name of the country, region, or group.
- "Code" - Unique id for the country (null for other entities).
- "Year" - Year from 1990 to 2019.
- "Internet_usage" - The share of the entity's population who have used the internet in the last three months.
people
- "Entity" - The name of the country, region, or group.
- "Code" - Unique id for the country (null for other entities).
- "Year" - Year from 1990 to 2020.
- "Users" - The number of people who have used the internet in the last three months for that country, region, or group.
broadband
- "Entity" - The name of the country, region, or group.
- "Code" - Unique id for the country (null for other entities).
- "Year" - Year from 1998 to 2020.
- "Broadband_Subscriptions" - The number of fixed subscriptions to high-speed internet at downstream speeds >= 256 kbit/s for that country, region, or group.
Acknowledgments: Max Roser, Hannah Ritchie, and Esteban Ortiz-Ospina (2015) - "Internet." OurWorldInData.org.
# Read the internet table
internet <- read_csv('data/internet.csv', show_col_types = FALSE)
# Take a look at the first rows
internet
# Read the people table
people <- read_csv('data/people.csv', show_col_types = FALSE)
people
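Each table mixes countries with regions and groups, so country-level questions need the aggregates filtered out first. According to the column descriptions above, Code is null for non-country entities, so one possible approach is to keep only rows with a non-missing Code. The sketch below uses an invented toy_internet table; it is worth checking the real data for aggregates that do carry a code.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical stand-in for the internet table (names and values are invented)
toy_internet <- data.frame(
  Entity = c("Country A", "Country B", "Some Region"),
  Code = c("AAA", "BBB", NA),
  Year = c(2019, 2019, 2019),
  Internet_usage = c(80, 60, 70)
)

# Rows with a Code are treated as countries (assumption based on the
# column description); rows without one are regions or groups
countries_only <- toy_internet %>% filter(!is.na(Code))
aggregates_only <- toy_internet %>% filter(is.na(Code))

countries_only
aggregates_only
```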
💪 Challenge
Create a report to answer the principal's questions. Include:
- What are the top 5 countries with the highest internet use (by population share)?
- How many people had internet access in those countries in 2019?
- What are the top 5 countries with the highest internet use for each of the following regions: 'Middle East & North Africa', 'Latin America & Caribbean', 'East Asia & Pacific', 'South Asia', 'North America', 'Europe & Central Asia'?
- Create a visualization for those regions' internet usage over time.
- What are the 5 countries with the most internet users?
- What is the correlation between internet usage (population share) and broadband subscriptions for 2019?
- Summarize your findings.
Note: This is how the World Bank defines the different regions.
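Two of the questions above, the top 5 ranking and the correlation, can be sketched with slice_max() and a join on the keys the tables share (Entity, Code, Year). This is only one possible approach, not a reference solution: the toy tables and their values are invented, and using 2019 as the year for the ranking is an assumption, since the question does not name one.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical stand-ins for the internet and broadband tables (invented values)
toy_internet <- data.frame(
  Entity = paste("Country", LETTERS[1:6]),
  Code = paste0("C0", 1:6),
  Year = 2019,
  Internet_usage = c(95, 90, 85, 80, 75, 70)
)
toy_broadband <- data.frame(
  Entity = paste("Country", LETTERS[1:6]),
  Code = paste0("C0", 1:6),
  Year = 2019,
  Broadband_Subscriptions = c(40, 38, 30, 33, 25, 20)
)

# Top 5 countries by population share using the internet in 2019
top5_usage <- toy_internet %>%
  filter(!is.na(Code), Year == 2019) %>%
  slice_max(Internet_usage, n = 5)

# Match usage and broadband rows for the same entity and year
usage_vs_broadband <- toy_internet %>%
  inner_join(toy_broadband, by = c("Entity", "Code", "Year")) %>%
  filter(Year == 2019)

# Pearson correlation, ignoring rows where either value is missing
correlation_2019 <- cor(
  usage_vs_broadband$Internet_usage,
  usage_vs_broadband$Broadband_Subscriptions,
  use = "complete.obs"
)

top5_usage
correlation_2019
```

An inner join keeps only entity-year pairs present in both tables, which suits a correlation because unmatched rows would contribute nothing to it.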
🧑‍⚖️ Judging criteria
CATEGORY | WEIGHTING | DETAILS |
---|---|---|
Response quality | 85% | |
Presentation | 15% | |
In the event of a tie, earlier submission time will be used as a tie-breaker.