Project: Predicting Temperature in London

    As the climate changes, predicting the weather becomes ever more important for businesses. Because the weather depends on many different factors, you will want to run many experiments to determine the best approach to predicting it. In this project, you will run experiments for different regression models that predict the mean temperature, using a combination of sklearn and MLflow.

    You will be working with data stored in london_weather.csv, which contains the following columns:

    • date - recorded date of measurement - (int)
    • cloud_cover - cloud cover measurement in oktas - (float)
    • sunshine - sunshine measurement in hours (hrs) - (float)
    • global_radiation - irradiance measurement in watts per square meter (W/m²) - (float)
    • max_temp - maximum temperature recorded in degrees Celsius (°C) - (float)
    • mean_temp - mean temperature in degrees Celsius (°C) - (float)
    • min_temp - minimum temperature recorded in degrees Celsius (°C) - (float)
    • precipitation - precipitation measurement in millimeters (mm) - (float)
    • pressure - pressure measurement in Pascals (Pa) - (float)
    • snow_depth - snow depth measurement in centimeters (cm) - (float)
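
As a rough sketch of a first look at data with these columns, the snippet below reads a two-row stand-in for london_weather.csv, parses the integer date, and counts missing values per column. The sample values are made up for illustration, and the YYYYMMDD date encoding is an assumption that should be checked against the real file.

```python
import io

import pandas as pd

# Two made-up rows standing in for london_weather.csv (illustrative values only).
sample_csv = io.StringIO(
    "date,cloud_cover,sunshine,global_radiation,max_temp,mean_temp,"
    "min_temp,precipitation,pressure,snow_depth\n"
    "19790101,2.0,7.0,52.0,2.3,-4.1,-7.5,0.4,101900.0,9.0\n"
    "19790102,6.0,1.7,27.0,1.6,,-7.5,0.0,102530.0,8.0\n"
)
df = pd.read_csv(sample_csv)

# Assumption: the integer date is encoded as YYYYMMDD.
df["date"] = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")

# A calendar feature such as the month may help capture seasonality.
df["month"] = df["date"].dt.month

# Count missing values per column to decide what needs imputing.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

With the real file, the same steps would start from pd.read_csv("london_weather.csv") instead of the in-memory sample.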
    import pandas as pd
    import numpy as np
    import mlflow
    import mlflow.sklearn
    import seaborn as sns
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor
    
    # Load data and perform exploratory analysis
    df = pd.read_csv("london_weather.csv")
    
    
    def preprocess_df(df, feature_selection, target_var):
        """
        Split the dataframe into X and y, then into train and test sets. Then impute and scale both the train and test features. Returns the train and test sets.
        """
        # Complete this function
        
        return X_train, X_test, y_train, y_test
    
    feature_selection = []
    target_var = ''
    
    X_train, X_test, y_train, y_test = preprocess_df(df, feature_selection, target_var)
    
    def predict_and_evaluate(model, x_test, y_test):
        """
        Predict values from test set, calculate and return the root mean squared error.
        """
        # Complete this function
        
        return rmse
    
    # Choose a non-empty name; mlflow.create_experiment raises an error if the name already exists
    EXPERIMENT_NAME = ""
    EXPERIMENT_ID = mlflow.create_experiment(EXPERIMENT_NAME)
    
    # Adjust the parameters
    max_depth_parameters = [1, 2, 5, 10, 20]
    
    for idx, depth in enumerate(max_depth_parameters):
        parameters = {
            'max_depth': depth,
            'random_state': 1
        }    
        RUN_NAME = f"run_{idx}"
        # Complete the experiment loop
    
    
    experiment_results = mlflow.search_runs(experiment_names=[EXPERIMENT_NAME])
    experiment_results
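
One possible way to fill in the two helper functions is sketched below. It is not the project's reference solution. The choices made here are assumptions: rows with a missing target are dropped, 25% of the rows are held out, missing features are imputed with the mean, and the imputer and scaler are fitted on the training split only so that no information from the test set leaks into preprocessing. The toy dataframe is synthetic and only borrows column names from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess_df(df, feature_selection, target_var):
    """Split into train/test sets, then impute and scale the features."""
    # Rows without a target value cannot be used for training or scoring.
    df = df.dropna(subset=[target_var])
    X = df[feature_selection]
    y = df[target_var]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=1
    )
    # Fit the transformers on the training split only, then apply to both.
    imputer = SimpleImputer(strategy="mean")
    X_train = imputer.fit_transform(X_train)
    X_test = imputer.transform(X_test)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test


def predict_and_evaluate(model, x_test, y_test):
    """Predict on the test set and return the root mean squared error."""
    y_pred = model.predict(x_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
    return rmse


# Synthetic data for illustration only; these are not real measurements.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame(
    {
        "sunshine": rng.uniform(0, 12, n),
        "cloud_cover": rng.uniform(0, 8, n),
    }
)
toy["mean_temp"] = (
    5 + 1.5 * toy["sunshine"] - 0.5 * toy["cloud_cover"] + rng.normal(0, 1, n)
)
toy.loc[::10, "sunshine"] = np.nan  # simulate missing feature values
toy.loc[5, "mean_temp"] = np.nan    # simulate a missing target value

X_train, X_test, y_train, y_test = preprocess_df(
    toy, ["sunshine", "cloud_cover"], "mean_temp"
)
model = LinearRegression().fit(X_train, y_train)
rmse = predict_and_evaluate(model, X_test, y_test)
print(f"RMSE on the synthetic test split: {rmse:.3f}")
```

Inside the experiment loop, the same predict_and_evaluate call could be made once per model, with the returned value logged as a metric for that run.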