Finalized_CODE_Project_Hamburg

    The plan of our project was to connect measured air pollution observations with local climate zone (LCZ) maps of the surrounding areas. The study area was Hamburg.

    The question to answer was whether air quality decreases with increasing temperature, or whether the land use type and surrounding area have a higher impact on air pollution than temperature does.

    The data we used were air pollution data from the European Environment Agency, weather data from daswetter.com, and an LCZ map from the standard files provided by WUDAPT.

    For the timeframe, we restricted our data to the last 20 years (2003 to 2023) and used the most current LCZ map of Hamburg.

    Our code is organized in three main phases. In the first phase we process the observation data and organize the station metadata. In the second phase, the code retrieves the LCZ class for every station point. In the third phase we merge the three dataframes to conduct our analyses.
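    To illustrate the second phase, here is a minimal sketch of how the LCZ class could be sampled at the station coordinates with rasterio. The raster file name 'hamburg_lcz.tif' is an assumption, and the stationsMetadata dataframe (with its Lon/Lat columns) is only built further below:

    ### Minimal sketch of phase two: sample the LCZ raster at every station point.
    ### 'hamburg_lcz.tif' is an assumed file name; stationsMetadata is created
    ### later in this notebook. The raster CRS must match the coordinate system
    ### of the station points (reproject one of them first if it does not).
    import rasterio

    with rasterio.open('hamburg_lcz.tif') as lczRaster:
        coords = list(zip(stationsMetadata['Lon'], stationsMetadata['Lat']))
        stationsMetadata['LCZ'] = [value[0] for value in lczRaster.sample(coords)]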

    We expected to find a correlation between the pollutants and temperature, which could hint at how air quality may develop under climate change. If no such correlation was found, we still expected to gain an idea of how the surrounding areas impact air quality.

    Our results showed a moderate to good correlation between O3 and temperature. However, O3 observations were only available at five stations. Still, we might expect an increase of O3 in future years due to climate change.

    NO2 did not show a correlation with temperature at most stations, but it was measured at almost all stations. Regarding the LCZ classification, NO2 was higher in areas with industry or trees and lower in open areas.
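    As a hedged illustration of these two analyses, assuming a merged dataframe named merged with columns code, o3, no2, Mean_Temp, and LCZ (which is only produced in the third phase):

    ### Illustrative sketch only: 'merged' and its column names are assumptions
    ### standing in for the dataframe produced by the merge in the third phase.
    o3CorrPerStation = (merged.dropna(subset=['o3', 'Mean_Temp'])
                              .groupby('code')
                              .apply(lambda df: df['o3'].corr(df['Mean_Temp'])))
    no2PerLcz = merged.groupby('LCZ')['no2'].mean()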

    People working on this script: Marcelo Soeira and Lea Fink

    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    import datetime as dt
    import seaborn as sns 
    import numpy as np
    import rasterio 
    import rasterio.plot
    from rasterio.mask import mask
    import matplotlib as mpl
    import geopandas as gpd
    from fiona.crs import from_epsg
    from shapely.geometry import box
    from geopy.distance import lonlat, distance, geodesic
    import random #module to generate random numbers for testing the program

    Read both necessary files into the workspace, specifying the separator (sep=',' or sep=';').

    ### Load and store the air temperature and air pollution data contained in CSV files as pandas dataframes
    ### stations -> air pollution observations
    ### weather  -> air temperature observations
    
    # na_values=0 treats zero entries as missing values in both files
    stations = pd.read_csv('hamburgstation.csv', sep=",", na_values=0)
    weather = pd.read_csv('weatherdata_hamburg.csv', sep=";", na_values=0)

    Check what information is contained in the CSV files and how the data can be used.

    stations.info()
    weather.info()

    Findings

    stations: a huge file with an enormous amount of data. To accommodate the timeframe of our project, we restrict the number of analyzed air pollutants as well as the time span of the observations (see the sketch after this list). In the examples that follow, we selected only O3 and NO2 for photooxidants and PM2.5 for particles. The time span considered is 2002 onwards, as this matches the available temperature data.
    weather: contains only the mean temperature (°C) measured on the respective day. Is it necessary to download max/min temperatures?
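    A minimal sketch of the time-span restriction described above; the exact date format in the CSV is an assumption, hence the defensive parse:

    ### Sketch of the time-span restriction. The format of the date column is an
    ### assumption; errors='coerce' turns unparseable entries into NaT instead of raising.
    stations['date'] = pd.to_datetime(stations['date'], errors='coerce')
    stations = stations[stations['date'] >= '2002-01-01']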

    For our analyses, it will be useful to have the metadata of the air pollution observation points in a separate, smaller dataframe.

    Therefore, we created a third dataframe named stationsMetadata to store this information, based on values from the stations dataframe.

    ### Brief preprocessing of station metadata
    ### Extract the desired fields from the stations dataframe and store them in stationsMetadata, eliminating duplicate rows according to the station code
    
    stationsMetadata=stations[['code', 'site', 'site_type', 'longitude', 'latitude']].drop_duplicates(subset=['code'])
    
    #Uncomment the following line to write the dataframe to csv and save it to disk
    #stationsMetadata.to_csv("stationsMetadata.csv", sep=';')
    ### Function created to process weather/air pollution observation station metadata into a standard dataframe useful for further processing
    
    def importStationData(sourceDf, agencyIdHeader, siteNameHeader, siteDescripHeader, lonHeader, latHeader):
        # Map the standard destination headers onto the source column names
        sourceListHeader=["Source_ID", agencyIdHeader, siteNameHeader, siteDescripHeader, lonHeader, latHeader]
        sourceIndexDf=pd.DataFrame(sourceDf.index, columns=["Source_ID"])
        sourceStationCount=len(sourceDf)
    
        destinyListHeader=["Source_ID","Agency_ID","Site_Name","Site_Description", "Lon", "Lat"]
        # Sentinel values (-99999) set both the dtype and an easy-to-spot default for each column
        destinyListTypes=[int(-99999),str(-99999),str(-99999),str(-99999),float(-99999),float(-99999)]
    
        # Pre-fill the destination dataframe with one sentinel-valued column per header
        dataMatrix={}
        for i in range(len(destinyListHeader)):
            dataMatrix[destinyListHeader[i]]=[destinyListTypes[i]]*sourceStationCount
        destinyDf=pd.DataFrame(data=dataMatrix)
    
        # Copy the values row by row: column 0 takes the original index (Source_ID),
        # the remaining columns take the corresponding source columns
        for i in range(len(destinyDf)):
            for g in range(len(destinyListHeader)):
                if g == 0:
                    destinyDf.at[i,destinyListHeader[g]]=sourceIndexDf.iloc[i][sourceListHeader[g]]
                else:
                    destinyDf.at[i,destinyListHeader[g]]=sourceDf.iloc[i][sourceListHeader[g]]
    
        return destinyDf
    ### Main processing of station metadata
    stationsMetadata=importStationData(stationsMetadata,"code", "site", "site_type", "longitude","latitude")
    #print(stationsMetadata)
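    For comparison, the same standardized metadata table could likely be built without the explicit loops, using reset_index and rename; this sketch is kept for reference only, and the loop version above remains the one in use:

    ### Vectorized sketch equivalent to importStationData, for comparison only.
    ### It starts again from the stations dataframe, so the Source_ID values match
    ### the original row indices just as in the loop version.
    altMetadata = (stations[['code', 'site', 'site_type', 'longitude', 'latitude']]
                   .drop_duplicates(subset=['code'])
                   .reset_index()
                   .rename(columns={'index': 'Source_ID', 'code': 'Agency_ID',
                                    'site': 'Site_Name', 'site_type': 'Site_Description',
                                    'longitude': 'Lon', 'latitude': 'Lat'}))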
    Now that the station metadata is ready, we can go back to preprocessing the observation data itself: reformatting, removing undesired data to reduce processing requirements, removing invalid data, etc.
    First, we removed unnecessary information from the stations dataframe and stored what was actually required in a new dataframe (stationsSubset).
    We also included a new column (Mean_Temp) to receive data from the weather dataframe.
    ### Here we extract the required information from the stations dataframe and store it in a new one called stationsSubset.
    ### We want a new column in this dataframe to store the temperature values, so we append a column filled with -99999.0. This value was chosen to easily identify wrong values later on.
    
    #Data extraction into the new dataframe
    stationsSubset = stations[['date','code', 'no2', 'o3', 'pm2.5']]
    #Create a list with matching length to stationsSubset and constant value of -99999.0
    meanTemp=[float(-99999.0)]*len(stationsSubset)
    #Add this list as column
    stationsSubset=stationsSubset.assign(Mean_Temp=meanTemp)
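    The Mean_Temp column will later be filled from the weather dataframe; a minimal sketch of how this could look when matching by date (the weather column names 'date' and 'temp_mean' are assumptions, not verified against the CSV):

    ### Sketch of filling Mean_Temp by date; 'date'/'temp_mean' in weather are
    ### assumed column names. A left merge keeps every pollution observation and
    ### attaches the daily mean temperature where a matching date exists.
    weather['date'] = pd.to_datetime(weather['date'], errors='coerce')
    stationsSubset['date'] = pd.to_datetime(stationsSubset['date'], errors='coerce')
    stationsSubset = (stationsSubset.drop(columns='Mean_Temp')
                      .merge(weather[['date', 'temp_mean']]
                             .rename(columns={'temp_mean': 'Mean_Temp'}),
                             on='date', how='left'))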