Simone Bugna

City of Rome Weather Analysis & Prediction


In this workbook I will analyze how Rome's climate has changed over the last four decades. In particular, I will focus on temperature-related aspects: the annual average temperature and the number of fog days. At the end, we will see how Rome's climate could change in the near future.

Unfortunately, the precipitation data are not reliable, so I will not use them in the analysis.

1. Configuration, Data Mining & Data Cleaning

In this section we will collect and clean the data.


Set variables for the city and for the start and end year/month.

city = 'Roma'

start_year = 1980
start_month = 1

end_year = 2020
end_month = 12

Config #2

Import libraries and data structures.

# Import libraries
import requests, csv
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

# Set seaborn parameters
sns.set(rc = {'figure.figsize': (8, 4)}, font = 'calibri')
sns.set_style('whitegrid', {'grid.linestyle': ':', 'axes.spines.right': False, 'axes.spines.top': False})

# Import data structures
month_list = ['Gennaio', 'Febbraio', 'Marzo', 'Aprile', 'Maggio', 'Giugno', 'Luglio', 'Agosto', 'Settembre', 'Ottobre', 'Novembre', 'Dicembre']

header = ['city', 'date', 't_avg_c', 't_min_c', 't_max_c', 'dew_point_c', 'humidity_%', 'visibility_km', 'wind_avg_kmh', 'wind_max_kmh', 'gust_kmh', 'air_pressure_asl_mb', 'air_pressure_avg_mb', 'rain_mm', 'phenomena']

convert_dict = {'t_avg_c': float, 't_min_c': float, 't_max_c': float, 'dew_point_c': float, 'humidity_%': int, 'visibility_km': int, 'wind_avg_kmh': int, 'wind_max_kmh': int, 'gust_kmh': int, 'air_pressure_asl_mb': int, 'air_pressure_avg_mb': int, 'rain_mm': float}

Data Scraping

Download the weather data and store them in the weather_list list (this takes a while).

weather_list = []

for year in range(start_year, end_year + 1):
    for month in range(start_month - 1, end_month):
        CSV_URL = '' + city + '/' + str(year) + '/' + month_list[month] + '?format=csv'

        with requests.Session() as s:
            download = s.get(CSV_URL)  # Set the connection

            decoded_content = download.content.decode('utf-8')  # Decode csv content

            records = csv.reader(decoded_content.splitlines(), delimiter = ';')  # Read csv content
            weather_list += list(records)[1:]  # Convert the "csv.reader" object to a list and append it to "weather_list", skipping the header row

Data Cleaning

Collect the weather data into the weather_df DataFrame and clean them.

weather_df = pd.DataFrame(weather_list, columns = header)

# Replace empty cells with NaN, NaN phenomena with zeros, commas with dots
weather_df = weather_df.replace(r'^\s*$', np.nan, regex = True)
weather_df['phenomena'] = weather_df['phenomena'].fillna('none')
weather_df = weather_df.replace(',', '.', regex = True)

# Drop NaN
weather_df.dropna(inplace = True)

# Convert columns to the correct data types
weather_df['date'] = pd.to_datetime(weather_df['date'], dayfirst = True)
weather_df = weather_df.astype(convert_dict)


Data Cleaning #2

Clean the 'phenomena' column, mapping the Italian labels to English categories.

# Map Italian phenomena labels to English categories
# ('temporale' = thunderstorm, 'grandine' = hail, 'pioggia' = rain, 'neve' = snow, 'nebbia' = fog)
weather_df.loc[weather_df['phenomena'].str.contains('temporale|grandine'), 'phenomena'] = 'storm'
weather_df.loc[weather_df['phenomena'].str.contains('pioggia'), 'phenomena'] = 'rain'
weather_df.loc[weather_df['phenomena'].str.contains('neve'), 'phenomena'] = 'snow'
weather_df.loc[weather_df['phenomena'].str.contains('nebbia'), 'phenomena'] = 'fog'


2. Exploratory Analysis

In this section we will analyze the correlation between the variables.
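A minimal sketch of a correlation check on the numeric columns, assuming the weather_df built in the previous section; the toy data below is illustrative only:

```python
import pandas as pd

# Toy frame standing in for weather_df's numeric columns
weather_df = pd.DataFrame({
    't_avg_c': [7.0, 8.0, 9.0, 10.0],
    'humidity_%': [80, 75, 70, 65],
})

# Pairwise Pearson correlation between the numeric variables
corr = weather_df.corr(numeric_only = True)
print(corr)
```

The resulting matrix can then be visualized with seaborn's heatmap for a quick overview of which variables move together.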


Group weather data by year.
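A minimal sketch of this grouping step, assuming weather_df has the 'date', 't_avg_c', and 'phenomena' columns created above; the yearly_df name and the tiny sample data are illustrative:

```python
import pandas as pd

# Toy frame standing in for weather_df (two years, a few days each)
weather_df = pd.DataFrame({
    'date': pd.to_datetime(['1980-01-01', '1980-01-02', '1980-01-03',
                            '1981-01-01', '1981-01-02']),
    't_avg_c': [7.0, 8.0, 9.0, 10.0, 12.0],
    'phenomena': ['fog', 'none', 'fog', 'rain', 'fog'],
})

# Group by calendar year: annual mean temperature and number of fog days
yearly_df = weather_df.groupby(weather_df['date'].dt.year).agg(
    t_avg_c = ('t_avg_c', 'mean'),
    fog_days = ('phenomena', lambda s: (s == 'fog').sum()),
)

print(yearly_df)
```

With the full 1980-2020 DataFrame, this yields one row per year, ready for the trend analysis that follows.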
