Emanuel Raptis

Example of downloading data from Statistics Sweden

By: Emanuel Raptis

This workbook gives a simple example of how to download data using the SCB (Statistics Sweden) API. As an example, I will use the monthly population statistics for Stockholm County, Sweden.

Normally, I would use the "pyscbwrapper" module for this, but it is not available in the DataCamp workspace. Instead, I retrieve the API URL and the JSON query directly from the SCB database and send the request with the requests library.
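The query is just nested dictionaries and lists, so it can also be assembled from a compact mapping of variable codes to selections. Below is a minimal sketch; `build_query` is a hypothetical helper (not part of the SCB API or pyscbwrapper) that only reproduces the structure of the query used in this workbook.

```python
# Hypothetical helper: build_query is NOT part of the SCB API or pyscbwrapper.
def build_query(selections, response_format="json"):
    """Build a query body with the same structure as the workbook's query.

    selections maps a variable code to a (filter, values) pair,
    e.g. {"Region": ("vs:RegionLän07EjAggr", ["01"])}.
    """
    return {
        "query": [
            {"code": code, "selection": {"filter": flt, "values": list(values)}}
            for code, (flt, values) in selections.items()
        ],
        "response": {"format": response_format},
    }


# Rebuild the same query dict that is posted to the API in the code cell below
query = build_query({
    "Region": ("vs:RegionLän07EjAggr", ["01"]),
    "Forandringar": ("item", ["100"]),
    "Kon": ("item", ["1+2"]),
})
print(query["query"][0])
```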

Read about the SCB API here: https://scb.se/vara-tjanster/oppna-data/api-for-statistikdatabasen/
Read the pyscbwrapper documentation here: https://github.com/kirajcg/pyscbwrapper/blob/master/pyscbwrapper.ipynb

# Import modules
import requests
import json
import pandas as pd
# Download data from SCB API
session = requests.Session()

# The query can be copied from the table's page in the SCB Statistical Database
query = {
  "query": [
    {
      "code": "Region",
      "selection": {
        "filter": "vs:RegionLän07EjAggr",
        "values": [
          "01"
        ]
      }
    },
    {
      "code": "Forandringar",
      "selection": {
        "filter": "item",
        "values": [
          "100"
        ]
      }
    },
    {
      "code": "Kon",
      "selection": {
        "filter": "item",
        "values": [
          "1+2"
        ]
      }
    },
  ],
  "response": {
    "format": "json"
  }
}

url = "https://api.scb.se/OV0104/v1/doris/sv/ssd/START/BE/BE0101/BE0101G/ManadBefStatRegion"

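response = session.post(url, json=query)
response.raise_for_status()  # Stop early on HTTP errors, e.g. a malformed query
# Decode with 'utf-8-sig' so a leading byte-order mark, if present, is stripped before parsing
response_json = json.loads(response.content.decode('utf-8-sig'))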

# The records are stored under the 'data' key
response_data = response_json['data']

# Print only the first few records of the list of dictionaries instead of the entire dataset
num_rows = 3

# Iterate over the desired number of rows and print the dictionary data
for row in response_data[:num_rows]:
    print(row)
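The request above decodes the response with 'utf-8-sig' rather than plain 'utf-8'. A minimal offline sketch of why that matters, assuming the server prefixes the JSON with a UTF-8 byte-order mark (BOM). The payload below is hypothetical and its values are placeholders, not SCB figures.

```python
import json

# Hypothetical raw response body with a UTF-8 BOM (b"\xef\xbb\xbf") in front.
# The values "111" and "222" are placeholders, not real population figures.
raw = (
    b'\xef\xbb\xbf{"data": ['
    b'{"key": ["01", "100", "1+2", "2019M01"], "values": ["111"]}, '
    b'{"key": ["01", "100", "1+2", "2019M02"], "values": ["222"]}]}'
)

# Plain UTF-8 keeps the BOM as a '\ufeff' character, which json.loads rejects
try:
    json.loads(raw.decode("utf-8"))
except json.JSONDecodeError as exc:
    print("Plain UTF-8 decoding fails:", exc.msg)

# 'utf-8-sig' strips a leading BOM (and behaves like 'utf-8' when there is none)
payload = json.loads(raw.decode("utf-8-sig"))

for row in payload["data"][:1]:
    print(row)
```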

Convert to DataFrame

The data comes as a list of dictionaries. This needs to be converted into a DataFrame.
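As an alternative to building a tuple-keyed dictionary first, each record can be turned into a DataFrame row directly. A minimal sketch, assuming every `key` lists region, change type, sex and month in that order (the order the column names in this workbook imply); the records below use placeholder values.

```python
import pandas as pd

# Hypothetical response_data in the shape returned by the API; values are placeholders
response_data = [
    {"key": ["01", "100", "1+2", "2019M01"], "values": ["111"]},
    {"key": ["01", "100", "1+2", "2019M02"], "values": ["222"]},
]

# One row per record: the key parts followed by the first (and only) value
rows = [d["key"] + d["values"][:1] for d in response_data]
df = pd.DataFrame(rows, columns=["region", "förändringar", "kön", "månad", "befolkning"])
print(df)
```

This skips the intermediate index and the column split, at the cost of relying on the assumed key order.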

# Map each key (converted to a tuple so it can serve as a dict key) to its value
scb_dict = {tuple(d['key']): d['values'][0] for d in response_data}

# Create a DataFrame
df = pd.DataFrame.from_dict(scb_dict, orient='index', columns=['value'])

# Name index
df.index.set_names('keys', inplace=True)

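# Move the keys into a column, split each key tuple into one column per variable,
# then drop the tuple column and give the value column a descriptive name
df.reset_index(inplace=True)
df[['region', 'förändringar', 'kön', 'månad']] = pd.DataFrame(df['keys'].tolist(), index=df.index)
df.drop('keys', axis=1, inplace=True)
df.rename(columns={'value': 'befolkning'}, inplace=True)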

# View the DataFrame
print(df.head())

Transform and clean the DataFrame

The DataFrame now contains the total monthly population ("befolkning") of Stockholm County (region = 01). Since I did not specify a time period in the query, all months available in the database are downloaded.
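To download only a limited period, a selection for the time variable could be added to the query. This is a sketch under two assumptions worth verifying against the table's metadata before use: that the time variable is coded `Tid`, and that the API accepts a `top` filter returning the N most recent periods.

```python
# Assumptions (verify against the table's metadata before relying on them):
#   - the time variable in this table is coded "Tid"
#   - the "top" filter selects the N most recent periods
query = {
    "query": [
        {"code": "Region", "selection": {"filter": "vs:RegionLän07EjAggr", "values": ["01"]}},
        {"code": "Forandringar", "selection": {"filter": "item", "values": ["100"]}},
        {"code": "Kon", "selection": {"filter": "item", "values": ["1+2"]}},
        # Hypothetical addition: restrict the download to the 24 most recent months
        {"code": "Tid", "selection": {"filter": "top", "values": ["24"]}},
    ],
    "response": {"format": "json"},
}
print(query["query"][-1])
```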

Let's convert the population column to integers, transform the month ("månad") column to datetime, and drop the columns we no longer need.
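The month labels can also be parsed in one step by passing an explicit format to `pd.to_datetime`. A minimal sketch with placeholder labels in the API's "YYYYMmm" form, shown as an alternative to the replace-and-append steps used in the workbook, not as the author's method:

```python
import pandas as pd

# Placeholder month labels in the same "YYYYMmm" form as the 'månad' column
months = pd.Series(["2019M01", "2019M12", "2020M06"])

# In the format string, %Y and %m are directives; the "M" between them is
# matched literally. Each label becomes the first day of its month.
parsed = pd.to_datetime(months, format="%YM%m")
print(parsed)
```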

# Inspect datatypes
print(df.dtypes)

# Cast population to int, turn month labels such as "2019M01" into "2019-01-01" dates, and inspect again
df['befolkning'] = df['befolkning'].astype(int)
df['månad'] = df['månad'].str.replace("M", "-")
df['månad'] = df['månad']+'-01'
df['månad'] = pd.to_datetime(df['månad'])

print(df.dtypes)

# Subset columns
df = df[['region', 'månad', 'befolkning']]

print(df.head())

Visualize

Now it's time to use the newly acquired data and make a nice plot! Here I have filtered the data to plot only the population development from January 2019 to the most recent month.

You're now ready to keep track of the monthly population development just by running this script!
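The date filter can also be written with an explicit, inclusive start date, which states "from January 2019" directly. A minimal sketch on a small placeholder DataFrame with the same columns as the cleaned data:

```python
import pandas as pd

# Placeholder data (not SCB figures) with the same columns as the cleaned DataFrame
df = pd.DataFrame({
    "region": ["01"] * 4,
    "månad": pd.to_datetime(["2018-11-01", "2018-12-01", "2019-01-01", "2019-02-01"]),
    "befolkning": [100, 200, 300, 400],
})

# Inclusive start date: keep January 2019 and every later month
start = pd.Timestamp("2019-01-01")
recent = df.loc[df["månad"] >= start]

# The exclusive comparison selects the same rows, because the string '2018-12'
# is parsed as 2018-12-01 and every date in the column falls on the 1st
same_rows = df[df["månad"] > "2018-12"]

print(recent)
```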

# Import plotting modules and the helpers used to format the x- and y-axis
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import matplotlib.dates as mdates

# Keep months from January 2019 onward ('2018-12' is read as 2018-12-01, so '>' excludes December 2018)
df = df[df['månad'] > '2018-12']

# Format y-axis tick labels with a space as the thousands separator (FuncFormatter passes the tick value and position)
def y_format(x, pos):
    return '{:,.0f}'.format(x).replace(',', ' ')

# Visualize
sns.lineplot(x='månad', y='befolkning', data=df)
plt.xticks(rotation=45, ha='right')
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(y_format))
plt.title('Monthly population, Stockholm County')
plt.xlabel('')
plt.ylabel('Population')
plt.show()