Workspace
Daryl Anthony Butron Cuayla/

# Logistic Regression Binary Classification



Logistic regression is a fundamental machine learning method that originated in statistics. It's a great choice for establishing a baseline on any binary classification problem (that is, a problem with only two possible outcomes). This template trains and evaluates a logistic regression model for such a problem. To learn more about logistic regression, take a look at DataCamp's Linear Classifiers in Python course.
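Under the hood, logistic regression computes a linear score z = w·x + b from the features and passes it through the sigmoid function, p = 1 / (1 + e^(-z)), to get the probability of the positive class; it predicts class `1` when that probability exceeds 0.5. The sketch below checks this on synthetic data from `make_classification`, used here as a stand-in for real data. The `X_demo`, `y_demo` and `demo_clf` names are illustrative and chosen so they don't overwrite the template's variables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustrative stand-in for the churn data)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=0, random_state=0
)
demo_clf = LogisticRegression().fit(X_demo, y_demo)

# Linear score z = X w + b, then the sigmoid maps it to P(y = 1 | x)
z = X_demo @ demo_clf.coef_.ravel() + demo_clf.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

# Matches scikit-learn's own probability for the positive class
print(np.allclose(p_manual, demo_clf.predict_proba(X_demo)[:, 1]))  # True

# The predicted label is 1 exactly when the probability exceeds 0.5 (i.e., z > 0)
print(np.array_equal(demo_clf.predict(X_demo), (p_manual > 0.5).astype(int)))  # True
```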

To use your own dataset with this template, make sure that:

• It has at least one feature column and one binary categorical target column that you want to predict.
• The features have been cleaned and preprocessed, including encoding of categorical variables.
• There are no NaN/NA values. If needed, you can use a separate template to impute missing values.
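The requirements above can be checked programmatically before running the rest of the template. This is a minimal sketch: `check_template_requirements` is a hypothetical helper (not part of pandas or scikit-learn), and the `Age` and `Plan` columns are made up for illustration. It approximates "preprocessed, including categorical encoding" as "every feature column has a numeric dtype", which is a simplification.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_template_requirements(data, target):
    """Hypothetical helper: return a list of problems (empty if none were found)."""
    if target not in data.columns:
        return [f"target column {target!r} not found"]

    problems = []
    features = data.columns.drop(target)
    if len(features) == 0:
        problems.append("no feature columns")
    # A binary target has exactly two distinct (non-missing) values
    if data[target].nunique(dropna=True) != 2:
        problems.append(f"target {target!r} does not have exactly two classes")
    # Unencoded categorical features usually show up as non-numeric dtypes
    non_numeric = [col for col in features if not is_numeric_dtype(data[col])]
    if non_numeric:
        problems.append(f"non-numeric features (encode them first): {non_numeric}")
    if data.isna().any().any():
        problems.append("data contains NaN/NA values")
    return problems


# Made-up example frames
ready = pd.DataFrame({"Age": [25, 40, 33], "Plan": [0, 1, 1], "Churn": [0, 1, 0]})
not_ready = pd.DataFrame({"Plan": ["basic", "pro", None], "Churn": [0, 1, 1]})

print(check_template_requirements(ready, "Churn"))      # []
print(check_template_requirements(not_ready, "Churn"))  # non-numeric feature and NaN problems
```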

The placeholder dataset in this template consists of churn data from a telecom company. Each row represents a customer over a year and whether the customer churned (the target variable; `1` = yes, `0` = no). You can find more information on this dataset's source and dictionary here.

#### 1. Loading and inspecting the data

```python
# Load packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
    RocCurveDisplay,
)
from sklearn.model_selection import RandomizedSearchCV
from sklearn.preprocessing import StandardScaler

# Load your dataset into a DataFrame named df (e.g., with pd.read_csv),
# then display it
df
```

```python
# Check if there are any null values
print(df.isnull().sum())
```

```python
# Check columns to make sure you have feature(s) and a target variable
df.info()
```
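It also helps to check how balanced the target classes are before choosing an evaluation metric: if most customers did not churn, a model that always predicts `0` already achieves a high accuracy. A minimal sketch on a made-up toy Series; with the template's data you would call `value_counts()` on `df["Churn"]` instead.

```python
import pandas as pd

# Toy target column (made up); in the template this would be df["Churn"]
churn_toy = pd.Series([0, 0, 0, 1, 0, 0, 1, 0, 0, 0], name="Churn")

counts = churn_toy.value_counts()                # number of rows per class
shares = churn_toy.value_counts(normalize=True)  # fraction of rows per class
print(counts)
print(shares)

# Accuracy of a trivial baseline that always predicts the majority class
majority_baseline_accuracy = shares.max()
print(f"Majority-class baseline accuracy: {majority_baseline_accuracy:.2f}")  # 0.80
```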

#### 2. Splitting and standardizing the data

To split the data, we'll use the `train_test_split()` function. Then we'll standardize the features with `StandardScaler()`. The scaler must be fit on the training set only, after splitting, so that no information from the test set leaks into preprocessing (data leakage). To learn more about standardization and other preprocessing techniques, see DataCamp's Preprocessing for Machine Learning in Python course.

```python
# Split the data into features X (a DataFrame) and the target variable y (a Series)
X = df.iloc[:, 0:8]  # Specify at least one column as a feature
y = df["Churn"]  # Specify one column as the target variable

# Split the data into train and test subsets
# You can adjust the test size and random state
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=123
)

# Standardize the features: fit the scaler on X_train only,
# then apply the same transformation to both subsets
sc = StandardScaler().fit(X_train)
X_train_scaled = sc.transform(X_train)
X_test_scaled = sc.transform(X_test)
```
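`StandardScaler` rescales each column as z = (x - mean) / std, where the mean and the (population) standard deviation are learned from the training set only and then reused unchanged for the test set. The sketch below verifies this on random toy arrays; the `X_toy_*` and `toy_scaler` names are illustrative and avoid overwriting the template's variables.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Random toy data standing in for the train/test feature matrices
rng = np.random.default_rng(0)
X_toy_train = rng.normal(loc=50, scale=10, size=(100, 3))
X_toy_test = rng.normal(loc=50, scale=10, size=(20, 3))

toy_scaler = StandardScaler().fit(X_toy_train)

# Training-set statistics; StandardScaler uses the population std (ddof=0)
mean_train = X_toy_train.mean(axis=0)
std_train = X_toy_train.std(axis=0)

# Test data is scaled with the *training* mean and std, not its own
X_toy_test_manual = (X_toy_test - mean_train) / std_train
print(np.allclose(toy_scaler.transform(X_toy_test), X_toy_test_manual))  # True
```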

#### 3. Building a logistic regression classifier

The following code builds a scikit-learn logistic regression classifier (`linear_model.LogisticRegression`), setting only its most fundamental parameters. You can learn more about these parameters in DataCamp's Linear Classifiers in Python course and in scikit-learn's documentation.

```python
# Define parameters: these will need to be tuned to prevent overfitting and underfitting
params = {
    # Norm of the penalty: 'l1', 'l2', 'elasticnet', or None
    # (the default 'lbfgs' solver supports only 'l2' and None)
    "penalty": "l2",
    "C": 1,  # Inverse of regularization strength, a positive float
    "random_state": 123,
}

# Create a logistic regression classifier object with the parameters above
clf = LogisticRegression(**params)

# Train the classifier on the train set
clf.fit(X_train_scaled, y_train)

# Predict the outcomes on the test set
y_pred = clf.predict(X_test_scaled)
```
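Because `C` is the *inverse* of the regularization strength, smaller values shrink the coefficients more (stronger regularization, more risk of underfitting) and larger values let them grow (weaker regularization, more risk of overfitting). The sketch below illustrates this on synthetic, standardized data from `make_classification`; the data and the grid of `C` values are arbitrary choices, and it relies on the default L2 penalty.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic, standardized data (illustrative stand-in for X_train_scaled / y_train)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0, random_state=0
)
X_demo = StandardScaler().fit_transform(X_demo)

coef_norms = {}
for C in [0.01, 0.1, 1, 10, 100]:
    # Default penalty is L2; only C changes between fits
    demo_clf = LogisticRegression(C=C, random_state=123).fit(X_demo, y_demo)
    coef_norms[C] = np.linalg.norm(demo_clf.coef_)
    print(f"C = {C:>6}: ||coef|| = {coef_norms[C]:.3f}")
```

Plotting a validation score against the same grid of `C` values is a simple way to see where the model moves from underfitting to overfitting.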

To evaluate this classifier, we can calculate its accuracy, precision, and recall on the test set. You'll have to decide which performance metric best suits your problem and goal.

```python
# Calculate the accuracy, precision, and recall scores
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
```
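Each of these scores is a ratio of confusion-matrix counts, with churn (`1`) as the positive class: accuracy = (TP + TN) / total, precision = TP / (TP + FP), and recall = TP / (TP + FN). A small worked example with made-up labels, checked against scikit-learn's functions:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Made-up true and predicted labels (1 = churned, 0 = stayed)
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 1, 1, 1, 0]

# For labels [0, 1], ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()
print(tn, fp, fn, tp)  # 2 2 1 3

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 5 / 8 = 0.625
precision = tp / (tp + fp)                  # 3 / 5 = 0.6
recall = tp / (tp + fn)                     # 3 / 4 = 0.75

# Manual values next to scikit-learn's; each pair agrees
print(accuracy, accuracy_score(y_true_demo, y_pred_demo))
print(precision, precision_score(y_true_demo, y_pred_demo))
print(recall, recall_score(y_true_demo, y_pred_demo))
```

Here precision (0.6) is lower than recall (0.75) because the model raised two false alarms but missed only one churner.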

#### 4. Other evaluation methods: confusion matrix and ROC curve

We can use a confusion matrix and a receiver operating characteristic (ROC) curve to get a fuller picture of the model's performance. Both are available in scikit-learn's `metrics` module.

```python
# Calculate confusion matrix
cnf_matrix = confusion_matrix(y_test, y_pred)

# Plot a labeled confusion matrix with Seaborn
sns.heatmap(cnf_matrix, annot=True, fmt="g")
plt.title("Confusion matrix")
plt.ylabel("Actual label")
plt.xlabel("Predicted label")
plt.show()
```
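Raw counts can be hard to compare when one class is much larger than the other. `confusion_matrix` also accepts a `normalize` argument (scikit-learn 0.22 and later); with `normalize="true"`, each row is divided by the number of actual samples in that class, so the diagonal holds the per-class recall. A sketch with made-up labels; with the template's variables you would pass `y_test` and `y_pred` instead.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Made-up labels (1 = churned, 0 = stayed)
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 1, 1, 1, 0]

# Each row sums to 1: row 0 = actual "stayed", row 1 = actual "churned"
cnf_matrix_norm = confusion_matrix(y_true_demo, y_pred_demo, normalize="true")
print(cnf_matrix_norm)

sns.heatmap(cnf_matrix_norm, annot=True, fmt=".2f", vmin=0, vmax=1)
plt.title("Normalized confusion matrix")
plt.ylabel("Actual label")
plt.xlabel("Predicted label")
plt.show()
```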