
In this e-tivity, you are asked to follow the ML process to:

  • preprocess a provided dataset, so it will become suitable for ML operations;
  • select an appropriate ML model, so it will cope with the specifics of data and ML tasks;
  • train the selected ML model on the preprocessed dataset;
  • test the selected ML model on the preprocessed dataset.

Please consult the lectures of Weeks #1 and #2 for the specifics of the different ML models, the Data Preprocessing steps, and the specifics of Model Training and Testing.

Target: In this task you are asked to predict the Bond Price.
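
The four steps listed above can be sketched end to end on synthetic data. This is only an illustration under stated assumptions: the column names `coupon` and `years_to_maturity`, the generated prices, and the 80/20 hold-out split are all hypothetical and are not taken from the provided dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: the real Trade.csv columns are not assumed here.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "coupon": rng.uniform(1.0, 8.0, 200),
    "years_to_maturity": rng.uniform(1.0, 30.0, 200),
})
demo["price"] = (
    90
    + 2.0 * demo["coupon"]
    - 0.1 * demo["years_to_maturity"]
    + rng.normal(0, 0.5, 200)
)

# 1. Preprocess: drop incomplete rows, then separate features from the target.
demo = demo.dropna()
X = demo[["coupon", "years_to_maturity"]]
y = demo["price"]

# 2. Select a model and 3. train it on the training portion only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# 4. Test on rows the model has not seen during training.
test_r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on held-out rows: {test_r2:.3f}")
```

Scoring on held-out rows rather than on the training rows gives a more honest estimate of how the model would behave on new trades.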

# Load packages
import numpy as np 
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Determine the ML Model: Supervised-Classification, Supervised-Regression, or Unsupervised ML Model
df = pd.read_csv("Trade.csv")
df.head(10)

This is a supervised-regression problem. The "Bond Price" is a continuous numerical value that we want to predict based on the available data. Therefore, we would use a supervised regression machine learning model to make predictions for the bond prices.
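
One rough way to support this choice is to check that the target is numeric and takes many distinct values. The helper `looks_continuous` and its `min_unique_ratio` threshold below are hypothetical, a heuristic sketch rather than a standard test, and the two small series are made-up examples.

```python
import pandas as pd


def looks_continuous(target: pd.Series, min_unique_ratio: float = 0.05) -> bool:
    """Heuristic: numeric dtype and many distinct values relative to the row count."""
    if len(target) == 0 or not pd.api.types.is_numeric_dtype(target):
        return False
    return target.nunique() / len(target) >= min_unique_ratio


# Made-up examples: prices vary continuously, labels fall into a few classes.
prices = pd.Series([99.5, 101.25, 100.1, 98.75, 102.0])
labels = pd.Series(["buy", "sell", "buy", "sell", "buy"])

print(looks_continuous(prices))  # numeric with all-distinct values
print(looks_continuous(labels))  # non-numeric, so treated as categorical
```

On the real data the same check could be applied to `df['Price']`; a numeric target with only a handful of distinct values would instead suggest a classification task.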


#Step 2.1: Specifics of the dataset
df.shape
df.columns
# Count the number of columns in df
column_count = len(df.columns)
column_count
df.dtypes
# Calculate statistics for the 'Price' feature
df['Price'].describe()
df.count()
# Plot the 'Price' feature
plt.plot(df['Price'])
plt.xlabel('Index')
plt.ylabel('Price')
plt.title('Price Feature')
plt.show()
# Count the distinct values per feature; a count equal to the number of rows marks a feature that is unique in every row
unique_features = df.nunique()
unique_features
# Drop rows with NaN values
df.dropna(inplace=True)
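# Remove features with unique values for each row (identifier-like columns), keeping the target.
# Note: drop_duplicates() removes duplicate rows, not columns, so it does not do this job.
id_like_columns = [col for col in df.columns if col != 'Price' and df[col].nunique() == len(df)]
unique_features = df.drop(columns=id_like_columns)
unique_features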
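
Row-level and column-level clean-up are easy to confuse, so here is a minimal sketch on a toy table. The column names `trade_id` and `rating` and all values are hypothetical; only `Price` echoes a name used in this notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "trade_id": [1, 2, 3, 4],            # hypothetical identifier: unique in every row
    "rating": ["A", "A", "B", "B"],
    "Price": [100.0, 100.0, 98.5, 98.5],
})

# Columns whose number of distinct values equals the row count behave like IDs.
id_like = [col for col in toy.columns if toy[col].nunique() == len(toy)]
features = toy.drop(columns=id_like)

# drop_duplicates works on rows: it removes repeated rows, not columns.
deduplicated = features.drop_duplicates()

print(id_like)
print(deduplicated)
```

A continuous target can itself be unique in every row, so on real data it is safer to exclude the target column from the identifier check before dropping anything.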