Workspace
Sean Stayte/

Exploratory Data Analysis in SQL for Absolute Beginners

0
Beta
Spinner

Exploratory Data Analysis in SQL for Absolute Beginners

We'll be working with data from the Climate change adaptation innovation in the water sector in Africa paper which can be found here.

This study looked at the response of technology to water vulnerability created by climate change in Africa.

The data used for adaptation technology was water-related patent data. The water stress index accounts for things like projected change of annual runoff, projected change of annual groundwater recharge, fresh water withdrawal rate, water dependency ratio, dam capacity, and access to reliable drinking water. A higher index indicates higher vulnerability.

The other variables are used to define the country's size (GDP), institutional effectiveness, research and development activity, and knowledge base.

The fields included in this dataset are:

  • year (data has been pooled for the following years: 1990, 2000, 2005, and 2010 to 2016)
  • adaptation technologies
  • openness to trade (trade as percentage of gross domestic product)
  • time required to register property (calendar days)
  • gross domestic product per capita
  • employers (total)
  • gross enrolment ratio
  • water stress index

Note that we have shortened the field names in our dataset for easier coding!

To consult the solution, head over to the file browser and select notebook-solution.ipynb.

Query the table

  1. Query the full table
Unknown integration
DataFrameavailable as
df
variable
SELECT *
FROM climate
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
  1. Query the country and water_stress_index fields and order by descending order of the water_stress_index field
Unknown integration
DataFrameavailable as
df
variable
SELECT country, water_stress_index
FROM public.climate
ORDER BY public.climate.water_stress_index DESC;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
  1. Query the country, year, and gdp_per_capita field to get a list of the country names and their respective GDP; order by the GDP in ascending order but only view the top 10 values
Unknown integration
DataFrameavailable as
df
variable
SELECT country, year, gdp_per_capita
FROM public.climate
ORDER BY public.climate.gdp_per_capita
LIMIT 10;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Filter the data

  1. Filter the data to see the country and year where the water_stress_index was between 0.5 and 0.6
Unknown integration
DataFrameavailable as
df
variable
SELECT country, year, public.climate.water_stress_index
FROM public.climate
WHERE public.climate.water_stress_index BETWEEN 0.5 AND 0.6;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
  1. This time, filter the data to see the countries that start with the letter E or S and have a water_stress_index above 0.5
Unknown integration
DataFrameavailable as
df
variable
SELECT country, year, public.climate.water_stress_index
FROM public.climate
WHERE public.climate.water_stress_index > 0.5 AND (country LIKE 'E%' OR country LIKE 'S%');
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Aggregate, group, and sort the data




  • AI Chat
  • Code