Machine Learning - E-activity
In this e-tivity, you are asked to follow the ML process to:
- preprocess a provided dataset, so it will become suitable for ML operations;
- select an appropriate ML model, so it will cope with the specifics of data and ML tasks;
- train the selected ML model on the preprocessed dataset;
- test the selected ML model on the preprocessed dataset.
Please consult the lectures of Weeks #1 and #2 for the specifics of the different ML models, the Data Preprocessing steps, and Model Training and Testing.
Target: In this task you are asked to predict the Bond Price.
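The four steps listed above can be sketched end to end as follows. This is a minimal illustration on synthetic data, not the solution for Trade.csv: the feature names `feature_a` and `feature_b`, the coefficients, and the 80/20 split are assumptions made for the example; only the `Price` target name is taken from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (hypothetical features, made-up coefficients)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
})
noise = rng.normal(scale=0.1, size=200)
demo["Price"] = 100 + 3 * demo["feature_a"] - 2 * demo["feature_b"] + noise

# 1. Preprocess: drop incomplete rows, separate the features from the target
demo = demo.dropna()
X = demo.drop(columns=["Price"])
y = demo["Price"]

# 2. Select a model: linear regression for a continuous target
model = LinearRegression()

# 3. Train on the training split (the 80/20 split is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model.fit(X_train, y_train)

# 4. Test on the held-out split
r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on the test split: {r2:.3f}")
```

Evaluating on a held-out split rather than on the training rows gives a fairer estimate of how the model behaves on unseen data.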
# Load packages
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Determine the ML Model: Supervised-Classification, Supervised-Regression, or Unsupervised ML Model
df = pd.read_csv("Trade.csv")
df.head(10)

This is a supervised-regression problem. The "Bond Price" is a continuous numerical value that we want to predict based on the available data. Therefore, we would use a supervised regression machine learning model to make predictions for the bond prices.
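One quick way to sanity-check that reasoning is to confirm that the target column is numeric and takes many distinct values. The sketch below runs on a toy frame; the helper name `looks_like_regression_target`, the threshold of 10 distinct values, the `Rating` column, and all values are illustrative assumptions, not rules from the course.

```python
import pandas as pd


def looks_like_regression_target(series: pd.Series, min_distinct: int = 10) -> bool:
    """Heuristic (an assumption): a numeric column with many distinct values."""
    return bool(
        pd.api.types.is_numeric_dtype(series) and series.nunique() >= min_distinct
    )


toy = pd.DataFrame({
    "Price": [99.5 + 0.25 * i for i in range(20)],  # continuous-looking target
    "Rating": ["AAA", "BBB"] * 10,                  # hypothetical categorical column
})

print(looks_like_regression_target(toy["Price"]))   # numeric, 20 distinct values
print(looks_like_regression_target(toy["Rating"]))  # non-numeric column
```

A heuristic like this only supports the decision; the nature of the target (a price) is what makes the task a regression.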
# Step 2.1: Specifics of the dataset
df.shape

df.columns

# Count the number of columns in df
column_count = len(df.columns)
column_count

df.dtypes

# Calculate statistics for the 'Price' feature
df['Price'].describe()

df.count()

# Plot the 'Price' feature
plt.plot(df['Price'])
plt.xlabel('Index')
plt.ylabel('Price')
plt.title('Price Feature')
plt.show()

# Show the features with unique values for each row in the dataset
unique_features = df.nunique()
unique_features

# Drop rows with NaN values
df.dropna(inplace=True)
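# Remove features with unique values for each row (identifier-like columns).
# Note: df.drop_duplicates() removes duplicate rows, not columns, so it is not used here.
# The target 'Price' is kept even if all of its values are distinct.
id_like_cols = [col for col in df.columns
                if col != 'Price' and df[col].nunique() == len(df)]
df = df.drop(columns=id_like_cols)
df.head()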
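Removing features that are unique for every row is a column operation, whereas `drop_duplicates()` removes repeated rows. A minimal sketch on a toy frame shows the difference; the column names `TradeId` and `Coupon` and all values are made up for the example, and only `Price` is taken from the dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "TradeId": [1, 2, 3, 4],             # hypothetical identifier: unique for every row
    "Coupon": [2.5, 2.5, 3.0, 3.0],      # hypothetical feature with repeated values
    "Price": [99.1, 99.1, 101.4, 102.0],
})

# drop_duplicates() works on rows; no full row repeats here, so nothing is removed
rows_after_dedup = len(toy.drop_duplicates())

# Columns whose value differs on every row usually act as identifiers,
# so they are dropped; the target column is excluded from the check
id_like_cols = [
    col for col in toy.columns
    if col != "Price" and toy[col].nunique() == len(toy)
]
cleaned = toy.drop(columns=id_like_cols)

print(rows_after_dedup)
print(id_like_cols)
print(list(cleaned.columns))
```

Excluding the target from the check matters for a continuous target, where every value may legitimately be distinct.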