Examining Factors Responsible for Heart Attacks

Objective:


Cardiovascular diseases are the leading cause of death globally. This analysis aims to identify the leading factors associated with cardiovascular disease, using a logistic regression model to predict the outcome on the test data. Finally, we will build a confusion matrix and report model performance in terms of recall, precision, and accuracy.
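
As a preview, the sketch below shows one possible way to carry out this modeling and evaluation step with scikit-learn, assuming the DataFrame df loaded in Section 1; the train/test split ratio, random seed, and max_iter setting are illustrative assumptions rather than the final choices used later in the notebook.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score, precision_score, accuracy_score

# split features and target (column names follow the variable descriptions below)
X = df.drop(columns=['target'])
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# fit a logistic regression and predict on the held-out test set
model = LogisticRegression(max_iter=1000)  # max_iter raised to help convergence
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# confusion matrix plus the three headline metrics
print(confusion_matrix(y_test, y_pred))
print('Recall   :', recall_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Accuracy :', accuracy_score(y_test, y_pred))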

Variable Descriptions:

age: age in years
sex: (1 = male; 0 = female)
cp: chest pain type

  • Value 0: typical angina
  • Value 1: atypical angina
  • Value 2: non-anginal pain
  • Value 3: asymptomatic

trestbps: resting blood pressure (in mm Hg)
chol: serum cholesterol in mg/dl
fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
restecg: resting electrocardiographic results

  • Value 0: normal
  • Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
  • Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria

thalach: maximum heart rate achieved
exang: exercise induced angina (1 = yes; 0 = no)
oldpeak: ST depression induced by exercise relative to rest
slope: the slope of the peak exercise ST segment
ca: number of major vessels (0-3) colored by fluoroscopy
thal: thalassemia types:

  • Value 0: silent carrier
  • Value 1: mild carrier
  • Value 2: reversible carrier
  • Value 3: fixed defect carrier

target: 0 = less chance of heart attack; 1 = more chance of heart attack

1. Import Modules & Data

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

df = pd.read_excel('heart data.xlsx')
df.info()
  • The dataset has 303 rows and 14 columns
  • There appear to be no missing values
  • All columns are of int64 or float64 dtype
df.shape
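
To double-check that the coded columns match the codebook in the variable descriptions above, their value counts can be tabulated. This is a minimal sketch using the column names listed in the codebook.

# tabulate the coded categorical columns and compare against the variable descriptions
for col in ['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'ca', 'thal', 'target']:
    print(col, df[col].value_counts().sort_index().to_dict())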

2. Data Wrangling

# check missing values
df.isnull().sum()
# check for duplicates
df.duplicated().any()
# drop duplicates and keep first occurrence
df.drop_duplicates(keep='first', inplace=True)
df.reset_index(drop=True, inplace=True)
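
A quick way to confirm the cleanup is to re-run the duplicate check and inspect the remaining shape; the exact row count depends on how many duplicates were found.

# verify that no duplicates remain and check the remaining number of rows
print(df.duplicated().any())  # expected: False
print(df.shape)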


