In this e-tivity, you are asked to follow the ML process to:
- preprocess a provided dataset, so it will become suitable for ML operations;
- select an appropriate ML model, so it will cope with the specifics of data and ML tasks;
- train the selected ML model on the preprocessed dataset;
- test the selected ML model on the preprocessed dataset.
Please consult the lectures of Weeks #1 and #2 for the specifics of the different ML models, the Data Preprocessing steps, and Model Training and Testing.
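The four steps listed above can be sketched end to end as follows. This is a minimal illustration on synthetic data, not the e-tivity solution: the column names `x1`, `x2`, and `y`, the noise level, and the 80/20 split are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): y depends linearly on x1 and x2.
rng = np.random.default_rng(0)
data = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
data["y"] = 3.0 * data["x1"] - 2.0 * data["x2"] + rng.normal(scale=0.1, size=200)
data.loc[0, "x1"] = np.nan  # inject one missing value so there is something to preprocess

# 1. Preprocess: drop rows with missing values.
clean = data.dropna()

# 2. Select a model suited to a continuous target.
model = LinearRegression()

# 3. Train on one part of the data ...
X_train, X_test, y_train, y_test = train_test_split(
    clean[["x1", "x2"]], clean["y"], test_size=0.2, random_state=0
)
model.fit(X_train, y_train)

# 4. ... and test on the held-out part.
r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on held-out data: {r2:.3f}")
```

Holding out a test split is one common choice; it keeps the score from being computed on rows the model has already seen.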
Target: In this task you are asked to predict the Bond Price.
# Load packages
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt
import seaborn as sns
# Determine the ML Model: Supervised-Classification, Supervised-Regression, or Unsupervised ML Model
df = pd.read_csv("Trade.csv")
df.head(10)
This is a supervised regression problem. The Bond Price is a continuous numerical value, and the dataset contains observed prices that serve as labels, so we use a supervised regression model to predict the price from the other available features.
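One rough way to check this choice is to inspect the target column: a numeric column with many distinct values suggests regression, while a column with a handful of repeated labels suggests classification. The helper below is a hypothetical heuristic; its name and the threshold of 20 distinct values are assumptions, and the toy series stand in for real columns.

```python
import pandas as pd


def looks_continuous(target: pd.Series, min_unique: int = 20) -> bool:
    """Heuristic: numeric dtype with at least `min_unique` distinct values."""
    return bool(
        pd.api.types.is_numeric_dtype(target) and target.nunique() >= min_unique
    )


prices = pd.Series([100.0 + 0.25 * i for i in range(50)])  # toy continuous target
ratings = pd.Series(["AAA", "BB", "AAA", "C"] * 10)        # toy categorical target

print(looks_continuous(prices))
print(looks_continuous(ratings))
```

A check like this is only a first hint; an integer-coded class label with many levels would also pass it, so the column's meaning still has to be considered.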
#Step 2.1: Specifics of the dataset
df.shape
df.columns
# Count the number of columns in df
column_count = len(df.columns)
column_count
df.dtypes
# Calculate statistics for the 'Price' feature
df['Price'].describe()
df.count()
# Plot the 'Price' feature
plt.plot(df['Price'])
plt.xlabel('Index')
plt.ylabel('Price')
plt.title('Price Feature')
plt.show()
# Count the distinct values of each feature; a count equal to the number of rows means the feature is unique in every row
unique_features = df.nunique()
unique_features
# Drop rows with NaN values
df.dropna(inplace=True)
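# Remove features whose value is unique in every row (e.g. identifiers), keeping the 'Price' target
unique_cols = [col for col in df.columns if col != 'Price' and df[col].nunique() == len(df)]
df = df.drop(columns=unique_cols)
df.head()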
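Two operations are easy to confuse at this step: removing duplicate rows and removing columns whose value is distinct in every row. A minimal sketch of the difference on a toy frame (the column names `TradeId`, `Rating`, and `Price` and their values are made up for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "TradeId": [1, 2, 3, 4],               # distinct in every row
    "Rating": ["A", "B", "A", "B"],
    "Price": [99.5, 101.0, 99.5, 102.3],
})

# drop_duplicates() compares whole rows; no two rows match here, so nothing is removed.
no_dupes = toy.drop_duplicates()

# A column is unique per row when its distinct-value count equals the row count.
id_like = [col for col in toy.columns if toy[col].nunique() == len(toy)]
reduced = toy.drop(columns=id_like)

print(no_dupes.shape)
print(id_like)
print(list(reduced.columns))
```

Note that a continuous column can also turn out to be unique in every row, so it is worth reviewing the flagged columns before dropping them.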