Machine Learning - E-tivity


In this e-tivity, you are asked to follow the ML process to:

preprocess a provided dataset, so it will become suitable for ML operations;
select an appropriate ML model, so it will cope with the specifics of data and ML tasks;
train the selected ML model on the preprocessed dataset;
test the selected ML model on the preprocessed dataset.
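The four steps above can be sketched end to end on a small synthetic dataset. This is a minimal illustration under stated assumptions, not the solution to the e-tivity. The columns `feature_a` and `feature_b` and the generated data are hypothetical, and `LinearRegression` appears only because it is the regression model this notebook imports.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: Price depends linearly on two features plus noise
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
})
demo["Price"] = 100 + 3 * demo["feature_a"] - 2 * demo["feature_b"] + rng.normal(scale=0.5, size=200)
demo.loc[5, "feature_a"] = np.nan  # one missing value to illustrate preprocessing

# 1. Preprocess: drop rows with missing values
demo = demo.dropna()

# 2. Select a model: the target is continuous, so use a regression model
model = LinearRegression()

# 3. Train on 80% of the rows
X = demo[["feature_a", "feature_b"]]
y = demo["Price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)

# 4. Test on the held-out 20% using the coefficient of determination
test_r2 = r2_score(y_test, model.predict(X_test))
print(f"Test R^2: {test_r2:.3f}")
```

Fixing `random_state` makes the split reproducible, so repeated runs report the same test score.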
Please consult the Week #1 and Week #2 lectures for the specifics of the different ML models, the data preprocessing steps, and model training and testing.

Target: In this task you are asked to predict the Bond Price.

# Load packages
import numpy as np 
import pandas as pd 
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt
import seaborn as sns

# Determine the type of ML task: supervised classification, supervised regression, or unsupervised learning
df = pd.read_csv("Trade.csv")
df.head(10)

This is a supervised regression problem. The bond price is a continuous numerical value, and the dataset contains observed prices that can serve as labels for training. We therefore use a supervised regression model to predict bond prices from the other features.
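The same reasoning can be checked programmatically with the sketch below. It applies a rough heuristic rather than a formal test: a numeric target with many distinct values suggests regression, while few distinct values or a non-numeric target suggests classification. The function name `suggest_task_type` and the 5% threshold are hypothetical choices.

```python
import pandas as pd

def suggest_task_type(target: pd.Series, max_class_ratio: float = 0.05) -> str:
    """Hypothetical heuristic: guess whether a target calls for regression or classification."""
    if not pd.api.types.is_numeric_dtype(target):
        return "classification"  # text or categorical labels
    # Few distinct numeric values relative to the number of rows look like class labels
    distinct_ratio = target.nunique() / len(target)
    return "regression" if distinct_ratio > max_class_ratio else "classification"

# Usage on toy targets (in the notebook this would be suggest_task_type(df['Price']))
prices = pd.Series([101.2, 99.8, 100.5, 102.1, 98.7, 100.0])
ratings = pd.Series(["AAA", "AA", "AAA", "A", "AA", "AAA"])
print(suggest_task_type(prices))   # regression
print(suggest_task_type(ratings))  # classification
```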

# Step 2.1: Specifics of the dataset
df.shape

df.columns

# Count the number of columns in df
column_count = len(df.columns)
column_count

df.dtypes
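The dtypes matter because `LinearRegression` accepts only numeric inputs, so any text columns must be encoded before training. The sketch below splits columns by type with `DataFrame.select_dtypes`. The real columns of `Trade.csv` are not listed here, so it runs on a hypothetical toy frame; `Coupon` and `Rating` are assumed column names.

```python
import pandas as pd

# Hypothetical toy frame standing in for df
toy = pd.DataFrame({
    "Price": [101.2, 99.8, 100.5],
    "Coupon": [2.5, 3.0, 2.5],
    "Rating": ["AAA", "AA", "AAA"],
})

# Numeric columns can be fed to a regression model directly
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
# Every non-numeric column (e.g. strings) must be encoded first
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)      # ['Price', 'Coupon']
print(categorical_cols)  # ['Rating']
```

Using `exclude="number"` instead of `include="object"` also catches string columns stored with pandas' dedicated string dtype.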

# Calculate statistics for the 'Price' feature
df['Price'].describe()

df.count()
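`df.count()` reports the number of non-missing entries per column. Subtracting it from the row count shows where values are missing, which helps judge how much data `dropna()` will discard. The sketch below runs on a hypothetical toy frame; `df.isna().sum()` gives the same per-column counts directly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps, standing in for df
toy = pd.DataFrame({
    "Price": [101.2, np.nan, 100.5, 99.9],
    "Coupon": [2.5, 3.0, np.nan, np.nan],
})

# Missing values per column = number of rows minus the non-missing count
missing = len(toy) - toy.count()          # Price: 1, Coupon: 2
# Same result computed directly from the missing-value mask
assert (missing == toy.isna().sum()).all()

# Share of missing values per column
missing_share = missing / len(toy)        # Price: 0.25, Coupon: 0.5

# dropna() keeps only rows with no missing value in any column
rows_kept = len(toy.dropna())             # 1 of 4 rows
print(missing, missing_share, rows_kept, sep="\n")
```

Here a column missing half its values causes `dropna()` to discard three quarters of the rows. That is why it is worth checking these counts before dropping.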

# Plot the 'Price' feature
plt.plot(df['Price'])
plt.xlabel('Index')
plt.ylabel('Price')
plt.title('Price Feature')
plt.show()

# Count the distinct values of each feature; a count equal to the number of rows marks a feature that is unique in every row (e.g. an ID)
unique_features = df.nunique()
unique_features

# Drop rows with NaN values
df.dropna(inplace=True)

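# Remove ID-like features that take a different value in every row (drop_duplicates() would remove duplicate rows instead).
# The target 'Price' is excluded; review the list, since continuous features can also be unique per row.
id_like_columns = [col for col in df.columns if col != 'Price' and df[col].nunique() == len(df)]
df = df.drop(columns=id_like_columns)
id_like_columns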
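After the ID-like columns are removed, two preparation steps remain before `LinearRegression` can be fitted. The target must be separated from the features, and any remaining text columns must be turned into numbers. The sketch below uses `pd.get_dummies` on a hypothetical toy frame. The column names other than `Price` are assumptions, and one-hot encoding is only one of several possible encodings.

```python
import pandas as pd

# Hypothetical toy frame standing in for the cleaned df
toy = pd.DataFrame({
    "Price": [101.2, 99.8, 100.5, 98.9],
    "Coupon": [2.5, 3.0, 2.5, 3.5],
    "Rating": ["AAA", "AA", "AAA", "A"],
})

# Separate the target from the features
y = toy["Price"]
X = toy.drop(columns=["Price"])

# One-hot encode non-numeric features; drop_first removes one redundant dummy per column
X = pd.get_dummies(X, drop_first=True, dtype=float)
print(X.columns.tolist())  # ['Coupon', 'Rating_AA', 'Rating_AAA']
```

`drop_first=True` avoids perfectly collinear dummy columns. Without it, each row's dummies for a column would always sum to 1.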
