Statistics Sweden (SCB) API example

    Example of downloading data from Statistics Sweden

    By: Emanuel Raptis

    This workbook gives a simple example of how to download data using the SCB (Statistics Sweden) API. For this example I will use the monthly population statistics for Stockholm County in Sweden.

    Normally, I would use the "pyscbwrapper" module for this, but it is not available from within the DataCamp workspace. In this example I will instead retrieve the API URL and the JSON query directly from the SCB database.

    Here you can read about the SCB API: https://scb.se/vara-tjanster/oppna-data/api-for-statistikdatabasen/
    Here you can read the pyscbwrapper documentation: https://github.com/kirajcg/pyscbwrapper/blob/master/pyscbwrapper.ipynb

    # Import modules
    import requests
    import json
    import pandas as pd
    # Download data from SCB API
    session = requests.Session()
    
    # The JSON query can be retrieved from the SCB database
    query = {
      "query": [
        {
          "code": "Region",
          "selection": {
            "filter": "vs:RegionLän07EjAggr",
            "values": [
              "01"
            ]
          }
        },
        {
          "code": "Forandringar",
          "selection": {
            "filter": "item",
            "values": [
              "100"
            ]
          }
        },
        {
          "code": "Kon",
          "selection": {
            "filter": "item",
            "values": [
              "1+2"
            ]
          }
        },
      ],
      "response": {
        "format": "json"
      }
    }
    
    url = "https://api.scb.se/OV0104/v1/doris/sv/ssd/START/BE/BE0101/BE0101G/ManadBefStatRegion"
    
    response = session.post(url, json=query)
    response_json = json.loads(response.content.decode('utf-8-sig'))
    
    # To get and view the data
    response_data = response_json['data']
    
    # Inspect the JSON list of dictionaries. Define the number of rows to print (instead of printing the entire dataset)
    num_rows = 3
    
    # Iterate over the desired number of rows and print the dictionary data
    for row in response_data[:num_rows]:
        print(row)
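
    The code above decodes the response with 'utf-8-sig' rather than plain 'utf-8'. A plausible reason (an assumption here, not something quoted from the SCB documentation) is that the response body can start with a byte order mark (BOM). The sketch below uses a made-up byte string, not a real API response, to show how the two codecs differ when the text is passed to json.loads.

```python
import json

# Hypothetical response body: a UTF-8 BOM followed by a tiny JSON document.
raw = b'\xef\xbb\xbf{"data": []}'

# 'utf-8-sig' removes the BOM while decoding, so json.loads accepts the text.
parsed = json.loads(raw.decode("utf-8-sig"))

# Plain 'utf-8' keeps the BOM as the first character, which json.loads rejects.
try:
    json.loads(raw.decode("utf-8"))
    bom_rejected = False
except json.JSONDecodeError:
    bom_rejected = True

print(parsed, bom_rejected)
```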

    Convert to DataFrame

    The data comes as a list of dictionaries. This needs to be converted into a DataFrame.

    # Build a dictionary that maps each key (converted to a tuple) to its first value
    scb_dict = {tuple(d['key']): d['values'][0] for d in response_data}
    
    # Create a DataFrame
    df = pd.DataFrame.from_dict(scb_dict, orient='index', columns=['value'])
    
    # Name index
    df.index.set_names('keys', inplace=True)
    
    # Reset index and rename columns
    df.reset_index(inplace=True)
    df[['region', 'förändringar', 'kön', 'månad']] = df['keys'].apply(pd.Series)
    df.drop('keys', axis=1, inplace=True)
    df.rename(columns={'value':'befolkning'}, inplace=True)
    
    # View the DataFrame
    print(df.head())
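
    As an alternative to the dictionary-and-index approach above, the rows can also be assembled directly as lists. This is only a sketch: sample_data is a hypothetical stand-in for response_data with the same shape, and its population numbers are invented for illustration.

```python
import pandas as pd

# Made-up rows in the same shape as response_json['data']; the numbers are not real.
sample_data = [
    {"key": ["01", "100", "1+2", "2019M01"], "values": ["1000"]},
    {"key": ["01", "100", "1+2", "2019M02"], "values": ["1010"]},
]

columns = ["region", "förändringar", "kön", "månad", "befolkning"]

# Each row becomes the key parts followed by the first value.
rows = [d["key"] + d["values"][:1] for d in sample_data]
sample_df = pd.DataFrame(rows, columns=columns)

print(sample_df)
```

    This skips the intermediate 'keys' column, at the cost of putting the value column last instead of first.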

    Transform and clean DataFrame

    The DataFrame now contains the total monthly population ("befolkning") of Stockholm County (region = 01). Since I did not specify a time period in the query, all available months in the database are downloaded.
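
    If only recent months are needed, the query can be narrowed before it is sent. The sketch below rests on two assumptions that should be checked against the table metadata and the SCB API documentation: that the time variable of this table has the code "Tid", and that the "top" filter selects the latest n periods. The helper with_latest_months is a hypothetical name, and base_query is a shortened stand-in for the full query.

```python
import copy

def with_latest_months(query, n_months):
    """Return a copy of `query` that asks only for the latest `n_months` periods."""
    restricted = copy.deepcopy(query)
    restricted["query"].append({
        "code": "Tid",  # assumed code of the time variable; verify in the table metadata
        "selection": {"filter": "top", "values": [str(n_months)]},
    })
    return restricted

# Shortened stand-in for the full query used earlier.
base_query = {
    "query": [
        {"code": "Region",
         "selection": {"filter": "vs:RegionLän07EjAggr", "values": ["01"]}},
    ],
    "response": {"format": "json"},
}

last_year = with_latest_months(base_query, 12)
print(last_year["query"][-1])
```

    The helper copies the query instead of modifying it, so the unrestricted version stays available for a full download.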

    Let's get rid of some variables and transform the month ("månad") column to datetime.

    # Inspect datatypes
    print(df.dtypes)
    
    # Transform variables and inspect again
    df['befolkning'] = df['befolkning'].astype(int)
    df['månad'] = df['månad'].str.replace("M", "-")
    df['månad'] = df['månad']+'-01'
    df['månad'] = pd.to_datetime(df['månad'])
    
    print(df.dtypes)
    
    # Subset columns
    df = df[['region', 'månad', 'befolkning']]
    
    print(df.head())
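
    The month strings (for example "2019M01") can also be parsed in a single step by passing an explicit format to pd.to_datetime, instead of rewriting the string first. A minimal sketch with a few hand-written sample values:

```python
import pandas as pd

# Sample month labels in the same "YYYYMmm" style as the 'månad' column.
months = pd.Series(["2019M01", "2019M02", "2020M12"])

# '%Y' reads the year, the literal 'M' is matched as-is, '%m' reads the month.
# The day is not part of the string, so it defaults to the first of the month.
parsed = pd.to_datetime(months, format="%YM%m")

print(parsed)
```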

    Visualize

    Now it's time to use the newly acquired data and make a nice plot! Here I have filtered the data to plot only the population development from January 2019 to the most recent month.

    Now you're ready to keep track of the monthly population development just by running this script!

    # Import modules to format x and y-axis
    import seaborn as sns
    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    import matplotlib.dates as mdates
    
    # Filter based on date: keep January 2019 onwards
    df = df[df['månad'] > '2018-12']
    
    # Write a function to format y-axis with thousand separator
    def y_format(x, pos):
        return '{:,.0f}'.format(x).replace(',', ' ')
    
    # Visualize
    sns.lineplot(x='månad', y='befolkning', data=df)
    plt.xticks(rotation=45, ha='right')
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
    plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(y_format))
    plt.title('Monthly population, Stockholm County')
    plt.xlabel('')
    plt.ylabel('Population')
    plt.show()
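
    To see what the y-axis formatter produces without drawing a plot, it can be called directly. The function below is a copy of y_format from above; the input numbers are arbitrary examples, not SCB figures.

```python
def y_format(x, pos):
    # Same formatter as above: comma thousand separators swapped for spaces.
    return '{:,.0f}'.format(x).replace(',', ' ')

# FuncFormatter passes the tick value and its position; this function ignores `pos`.
print(y_format(2345678, None))  # prints "2 345 678"
print(y_format(950.4, None))    # prints "950"
```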