Project: Predicting Temperature in London

    As the climate changes, accurate weather prediction becomes increasingly important for businesses. Because the weather depends on many interacting factors, finding the best way to predict it takes many experiments. In this project, you will run experiments with different regression models that predict the mean temperature, using scikit-learn to build the models and MLflow to track the experiments.

    You will be working with data stored in london_weather.csv, which contains the following columns:

    • date - recorded date of measurement - (int)
    • cloud_cover - cloud cover measurement in oktas - (float)
    • sunshine - sunshine measurement in hours (hrs) - (float)
    • global_radiation - irradiance measurement in watts per square meter (W/m²) - (float)
    • max_temp - maximum temperature recorded in degrees Celsius (°C) - (float)
    • mean_temp - mean temperature in degrees Celsius (°C) - (float)
    • min_temp - minimum temperature recorded in degrees Celsius (°C) - (float)
    • precipitation - precipitation measurement in millimeters (mm) - (float)
    • pressure - pressure measurement in Pascals (Pa) - (float)
    • snow_depth - snow depth measurement in centimeters (cm) - (float)
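
The column list above can double as a quick sanity check on whatever frame gets loaded. The following is a minimal sketch, not part of the project itself: `EXPECTED_COLUMNS` and `check_columns` are hypothetical names introduced for illustration, and the stand-in frame holds made-up values so the example runs without the CSV.

```python
import pandas as pd

# Column names copied from the list above.
EXPECTED_COLUMNS = [
    "date", "cloud_cover", "sunshine", "global_radiation", "max_temp",
    "mean_temp", "min_temp", "precipitation", "pressure", "snow_depth",
]


def check_columns(frame):
    """Return the expected columns that are missing from `frame` (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in frame.columns]


# Stand-in frame with made-up values; one column is dropped on purpose.
sample = pd.DataFrame({col: [0.0] for col in EXPECTED_COLUMNS}).drop(columns=["snow_depth"])
missing = check_columns(sample)
print(missing)
```

In practice the same helper could be called on the frame returned by `pd.read_csv` before any modeling starts.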

    # Imports: data handling, experiment tracking, plotting, and modeling
    import pandas as pd
    import numpy as np
    import mlflow
    import mlflow.sklearn
    import seaborn as sns
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor
    
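    # Load the data, parsing the 'date' column into datetimes
    df = pd.read_csv('london_weather.csv', parse_dates=['date'])
    df.head()
    # Derive calendar features from the parsed date
    df['year'] = df['date'].dt.year
    df['month'] = df['date'].dt.month
    df.head()
    # Inspect the size, column types, and summary statistics
    df.shape
    df.dtypes
    df.info()
    df.describe()
    # Show the rows that contain at least one missing value
    df[df.isnull().any(axis=1)]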
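    # Yearly means of the daily maximum and minimum temperatures;
    # as_index=False keeps 'year' as a regular column for plotting
    df_year_temp_mean = df.groupby('year', as_index=False).agg({'max_temp':'mean', 'min_temp':'mean'})
    df_year_temp_mean
    # Draw both yearly series on the same axes, labeled so they can be told apart
    sns.lineplot(data=df_year_temp_mean, x='year', y='min_temp', label='min_temp')
    sns.lineplot(data=df_year_temp_mean, x='year', y='max_temp', label='max_temp')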
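
The column list describes `date` as an integer, so it can help to make the date parsing explicit rather than rely on inference. The sketch below assumes the integers encode dates as YYYYMMDD, which the source does not state. It uses a small made-up frame named `toy` (hypothetical, not the real data) to show the parsing, the yearly aggregation, and a per-column count of missing values.

```python
import pandas as pd

# Made-up rows; dates are ASSUMED to be integers in YYYYMMDD form.
toy = pd.DataFrame({
    "date": [19790101, 19790102, 19800101, 19800102],
    "max_temp": [2.0, 4.0, 6.0, 8.0],
    "min_temp": [-2.0, 0.0, 1.0, 3.0],
})

# Convert to string first so the format is matched against the digits.
toy["date"] = pd.to_datetime(toy["date"].astype(str), format="%Y%m%d")
toy["year"] = toy["date"].dt.year

# as_index=False keeps 'year' as a regular column rather than the index.
yearly = toy.groupby("year", as_index=False).agg({"max_temp": "mean", "min_temp": "mean"})

# Missing values per column, a compact complement to listing incomplete rows.
missing_per_column = toy.isnull().sum()

print(yearly)
print(missing_per_column)
```

If the real file stores dates in another form, the `format` string would need to change accordingly.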