
Sleep Health and Lifestyle

This synthetic dataset contains sleep and cardiovascular metrics, as well as lifestyle factors, for close to 400 fictitious people.

The workspace is set up with one CSV file, data.csv, with the following columns:

  • Person ID
  • Gender
  • Age
  • Occupation
  • Sleep Duration: Average number of hours of sleep per day
  • Quality of Sleep: A subjective rating on a 1-10 scale
  • Physical Activity Level: Average number of minutes the person engages in physical activity daily
  • Stress Level: A subjective rating on a 1-10 scale
  • BMI Category
  • Blood Pressure: Indicated as systolic pressure over diastolic pressure
  • Heart Rate: In beats per minute
  • Daily Steps
  • Sleep Disorder: One of None, Insomnia or Sleep Apnea
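
Because Blood Pressure is stored as text in the form systolic over diastolic, it usually has to be split into two numeric columns before modelling. The snippet below is a minimal sketch on a hypothetical two-row sample; the "/" separator and the new column names Systolic and Diastolic are assumptions, not something the workspace defines.

```python
import pandas as pd

# Hypothetical sample; only the "Blood Pressure" column name comes from the dataset description
sample = pd.DataFrame({"Blood Pressure": ["126/83", "140/90"]})

# Split "systolic/diastolic" strings into two integer columns
bp = sample["Blood Pressure"].str.split("/", expand=True).astype(int)
sample["Systolic"] = bp[0]    # assumed column name
sample["Diastolic"] = bp[1]   # assumed column name

print(sample)
```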

Source: Kaggle


In general, the data shows few outliers across the different variables, with the exception of Heart Rate, which has some. Even so, it does not seem necessary to drop these values, considering the limited amount of data and the fact that these values are probably authentic.
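
One common way to flag outliers like these is the 1.5 × IQR rule that box plots use. Whether the hidden cells applied this rule is an assumption; the sketch below only illustrates the idea on hypothetical heart-rate values.

```python
import pandas as pd

# Hypothetical heart-rate values in beats per minute (not taken from data.csv)
heart_rate = pd.Series([65, 68, 70, 70, 72, 72, 74, 75, 86], name="Heart Rate")

# Interquartile range and the usual 1.5 * IQR fences
q1 = heart_rate.quantile(0.25)
q3 = heart_rate.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are flagged, not dropped
outliers = heart_rate[(heart_rate < lower) | (heart_rate > upper)]
print(outliers.tolist())
```

Flagging rather than dropping keeps the decision explicit, which fits a small dataset where every row matters.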


It can be observed that there are no null values and that each column's data type matches its contents, so no adjustment is needed.
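
A check of this kind can be sketched with isna() and dtypes. The frame below is a hypothetical three-row stand-in that reuses a few column names from the dataset; it is not the workspace's actual code.

```python
import pandas as pd

# Hypothetical stand-in for a few columns of sleep_data
sample = pd.DataFrame({
    "Age": [27, 28, 45],
    "Sleep Duration": [6.1, 6.2, 7.5],
    "Sleep Disorder": ["None", "Sleep Apnea", "Insomnia"],
})

# Count missing values per column and inspect the inferred types
null_counts = sample.isna().sum()
print(null_counts)
print(sample.dtypes)
```

Depending on the pandas version, read_csv may parse the literal text None in Sleep Disorder as a missing value, so it is worth running this check again after any upgrade.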


First, we need to split the data into the target variable and the feature variables. We also need to split the data into training and testing sets.

from sklearn.model_selection import train_test_split

# Split the data into features and target variable
X = sleep_data.drop('Sleep Disorder', axis=1)
y = sleep_data['Sleep Disorder']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
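
The scikit-learn models used in this notebook expect numeric features, so any text columns (for example Gender or Occupation) must be encoded before fitting. The hidden cells may already do this; assuming they do not, the sketch below one-hot encodes a hypothetical toy frame and also passes stratify so that each class keeps its share in both sets. The names toy, X_toy and y_toy are illustrative only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical miniature stand-in for sleep_data
toy = pd.DataFrame({
    "Gender": ["Male", "Female"] * 5,
    "Age": [27, 28, 30, 31, 35, 36, 40, 42, 45, 50],
    "Sleep Disorder": ["None", "Insomnia"] * 5,
})

# One-hot encode the text feature; the target stays as labels
X_toy = pd.get_dummies(toy.drop("Sleep Disorder", axis=1), columns=["Gender"])
y_toy = toy["Sleep Disorder"]

# stratify=y_toy keeps class proportions equal in the train and test sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)
print(X_tr.shape, X_te.shape)
```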

Logistic Regression

To perform logistic regression on the sleep_data dataset, we can use the LogisticRegression class from the sklearn.linear_model module.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Create an instance of the LogisticRegression model
model = LogisticRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Predict the target variable for the test data
y_pred_log = model.predict(X_test)
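
To judge predictions like these, they can be compared with the held-out labels. The following is a minimal sketch on synthetic data from make_classification, not the notebook's results. The scaling step and max_iter=1000 are assumptions added to help the solver converge.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for the sleep features
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=42
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Scale the features, then fit a logistic regression classifier
log_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
log_model.fit(Xd_train, yd_train)

# Share of test rows whose predicted class matches the true class
demo_pred = log_model.predict(Xd_test)
demo_accuracy = accuracy_score(yd_test, demo_pred)
print(f"Accuracy: {demo_accuracy:.2f}")
```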

Decision Tree

To build a decision tree model for the sleep_data dataset, we can use the DecisionTreeClassifier class from the sklearn.tree module.

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Create an instance of the DecisionTreeClassifier model
model = DecisionTreeClassifier()

# Fit the model to the training data
model.fit(X_train, y_train)

# Predict the target variable for the test data
y_pred_tree = model.predict(X_test)
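
A decision tree fitted without random_state can differ slightly between runs, and its feature_importances_ attribute gives a quick view of which features drive the splits. The sketch below shows both on synthetic data; the names tree_model, X_syn and y_syn are illustrative and not part of the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the sleep features
X_syn, y_syn = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=42
)

# Fixing random_state makes the fitted tree reproducible
tree_model = DecisionTreeClassifier(random_state=42)
tree_model.fit(X_syn, y_syn)

# One importance value per feature; the values are normalised to sum to 1
importances = tree_model.feature_importances_
for index, value in enumerate(importances):
    print(f"feature {index}: {value:.3f}")
```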

Neural Network

To build a neural network model for the sleep_data dataset, we can use the MLPClassifier class from the sklearn.neural_network module.