Machine Learning - E-activity

    In this e-tivity, you are asked to follow the ML process to:

    • preprocess the provided dataset so that it is suitable for ML operations;
    • select an ML model appropriate to the specifics of the data and the ML task;
    • train the selected ML model on the preprocessed dataset;
    • test the selected ML model on the preprocessed dataset.

    Please consult the lectures of Weeks #1 and #2 for the specifics of the different ML models, the data preprocessing steps, and model training and testing.
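
    The four steps above can be sketched end to end as follows. This is a minimal illustration on synthetic data, not the e-tivity solution: the columns `coupon`, `maturity_years` and `price`, and the formula that generates them, are invented for the example and are not taken from the provided dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical columns, NOT the real Trade.csv)
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "coupon": rng.uniform(1.0, 8.0, n),
    "maturity_years": rng.uniform(1.0, 30.0, n),
})
toy["price"] = (
    90.0 + 2.5 * toy["coupon"] - 0.3 * toy["maturity_years"]
    + rng.normal(0.0, 0.5, n)
)
toy.loc[::25, "coupon"] = np.nan  # inject a few missing values

# 1. Preprocess: drop incomplete rows, separate features from target
clean = toy.dropna()
X = clean[["coupon", "maturity_years"]]
y = clean["price"]

# 2. Select: the target is continuous, so use a regression model
model = LinearRegression()

# 3. Train on one part of the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model.fit(X_train, y_train)

# 4. Test on the held-out part
test_r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on held-out rows: {test_r2:.3f}")
```

    Holding out a test split is one common choice here; it keeps the test score from being measured on rows the model has already seen during training.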

    Target: In this task you are asked to predict the Bond Price.

    # Load packages
    import numpy as np 
    import pandas as pd 
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    import matplotlib.pyplot as plt
    import seaborn as sns
    # Determine the ML model type: supervised classification, supervised regression, or unsupervised
    df = pd.read_csv("Trade.csv")
    df.head(10)

    This is a supervised regression problem. The dataset contains the known target value for each row, so the task is supervised, and the target, the bond price, is a continuous numerical value rather than a class label, so it is regression rather than classification. A supervised regression model is therefore the appropriate choice for predicting bond prices.
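
    This reasoning can be sanity-checked in code by inspecting the target column. The helper below is only a rough heuristic, not a rule from the lectures: `guess_task_type` and its `max_classes` threshold are hypothetical names introduced for illustration.

```python
from typing import Optional

import pandas as pd


def guess_task_type(target: Optional[pd.Series], max_classes: int = 20) -> str:
    """Rough heuristic for the ML task type implied by a target column."""
    if target is None:
        return "unsupervised"            # no labels available
    if pd.api.types.is_numeric_dtype(target) and target.nunique() > max_classes:
        return "supervised-regression"   # many distinct numeric values
    return "supervised-classification"   # few distinct values or non-numeric labels


# Toy targets (made-up values, for illustration only)
prices = pd.Series([99.5 + 0.01 * i for i in range(100)])
ratings = pd.Series(["AAA", "BB", "AAA", "C"])

print(guess_task_type(prices))   # supervised-regression
print(guess_task_type(ratings))  # supervised-classification
print(guess_task_type(None))     # unsupervised
```

    On the loaded data this would be called as `guess_task_type(df['Price'])`. A numeric column with only a handful of distinct values (for example, an encoded category) would still need a human judgment call.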

    # Step 2.1: Specifics of the dataset
    df.shape
    df.columns
    # Count the number of columns in df
    column_count = len(df.columns)
    column_count
    df.dtypes
    # Calculate statistics for the 'Price' feature
    df['Price'].describe()
    df.count()
    # Plot the 'Price' feature
    plt.plot(df['Price'])
    plt.xlabel('Index')
    plt.ylabel('Price')
    plt.title('Price Feature')
    plt.show()
    # Count the distinct values per feature; a count equal to the number of rows
    # means that the feature has a different value in every row
    unique_features = df.nunique()
    unique_features
    # Drop rows with NaN values
    df.dropna(inplace=True)
    
    # Remove features with unique values for each row
    unique_features = df.drop_duplicates()
    unique_features
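
    Note that `drop_duplicates()` removes repeated rows and leaves the columns untouched, whereas removing a feature that is unique in every row means dropping a column. The difference can be seen on a small toy frame; the columns `trade_id`, `rating` and `price` below are made up for the illustration and are not the columns of the provided dataset.

```python
import pandas as pd

# Toy data (hypothetical columns); the last row repeats the third one
toy = pd.DataFrame({
    "trade_id": [101, 102, 103, 103],
    "rating": ["AAA", "BB", "AAA", "AAA"],
    "price": [99.1, 97.4, 99.1, 99.1],
})

# drop_duplicates() removes the repeated ROW; all three columns remain
deduped = toy.drop_duplicates()

# A column is identifier-like if it has a distinct value in every row;
# the target column is excluded from the check so it is never dropped
unique_counts = deduped.nunique()
id_like = [
    col for col in deduped.columns
    if col != "price" and unique_counts[col] == len(deduped)
]
reduced = deduped.drop(columns=id_like)

print(deduped.shape)          # (3, 3)
print(id_like)                # ['trade_id']
print(list(reduced.columns))  # ['rating', 'price']
```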