# Concrete Strength Predictor

### What is Concrete?

Concrete is the most widely used building material in the world. It is a mix of cement and water with gravel and sand, and it can also include other materials such as fly ash, blast furnace slag, and additives. Concrete has been used since the time of the ancient Romans and has gone through several modifications over the centuries. Such modifications come from statistical analysis of the mix ratio and the resulting concrete strength. In this notebook, we are going to analyze the experimental results of about a thousand concrete samples, with the aim of developing a model that can predict the strength of concrete from the fitted coefficients.

Let's start by taking a look at our dataframe.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# `conc` is the concrete dataset as a pandas DataFrame; the loading step is not shown in this notebook.
display(conc)
```

The dataframe shows mixtures of cement, slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate in different proportions, tested at different ages (in days, recorded in the age column) to obtain the strength recorded in the strength column.

### Checking for Data Frame Issues

Let us check for data frame issues such as missing or duplicated values, and clean the data if necessary.

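```python
# Column types and non-null counts
conc.info()
# Summary statistics for every column
display(conc.describe())
# Rows that exactly repeat an earlier row
display(conc[conc.duplicated()])
# Drop the repeats in place, keeping the first occurrence
conc.drop_duplicates(inplace=True)
conc.shape
```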

The dataframe had some duplicated rows, which have been removed. This reduced the number of rows from 1030 to 1005 unique rows.
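
As a small illustration of what `duplicated()` and `drop_duplicates()` do, here is a sketch on a made-up three-row frame. The values are invented for the example and are not taken from the concrete dataset.

```python
import pandas as pd

# Toy frame with invented values; the third row repeats the first one exactly.
toy = pd.DataFrame({
    "cement": [540.0, 332.5, 540.0],
    "water": [162.0, 228.0, 162.0],
    "strength": [79.99, 40.27, 79.99],
})

# duplicated() flags each repeat of an earlier row; the first occurrence stays False.
dup_mask = toy.duplicated()
print(dup_mask.tolist())

# drop_duplicates() keeps the first occurrence of each row and returns a new frame.
toy_unique = toy.drop_duplicates()
print(toy.shape, "->", toy_unique.shape)
```

Comparing the shape before and after, as done above with `conc.shape`, is a quick way to confirm how many rows were dropped.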

### What gives Concrete its strength?

A number of different variables combine to give concrete its strength. There is an old saying, "age like fine wine", which means the older you get, the better you become. Let's visualize concrete strength against age to put the theory to the test.

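```python
# Mean strength for each test age
conc_age_group=conc.groupby('age')['strength'].mean().round(2).reset_index()
display(conc_age_group)
figure, ax= plt.subplots()
# Second-order polynomial trend; ci=None turns off the confidence band
sns.regplot(data=conc_age_group,x='age',y='strength',order=2,ci=None,ax=ax)
plt.show()
```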

On average, the strength of the concrete increased from day 1 to day 365, but the strength gain flattened from day 56 and in some samples decreased. The reduction in strength might be due to other factors, which we will look into later.
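
The curve drawn by `sns.regplot(..., order=2)` is a second-degree polynomial fit. A minimal sketch of the same idea with `np.polyfit` follows. The strength values are synthetic, generated from a quadratic chosen for this example, so they are an assumption and not the dataset's measurements.

```python
import numpy as np

# Test ages in days (illustrative values).
age = np.array([1, 3, 7, 14, 28, 56, 90, 180, 365], dtype=float)

# Synthetic strengths from a known quadratic (assumed coefficients, not fitted to real data).
true_a, true_b, true_c = -0.0005, 0.25, 10.0
strength = true_a * age**2 + true_b * age + true_c

# Least-squares fit of strength = a*age^2 + b*age + c.
a, b, c = np.polyfit(age, strength, deg=2)
fitted = np.polyval([a, b, c], age)

# A negative leading coefficient means the curve bends downward, i.e. the gain flattens.
print(f"a={a:.5f}, b={b:.3f}, c={c:.2f}")
```

With noise-free synthetic data the fit recovers the generating coefficients. On real measurements the fitted curve only summarizes the overall trend.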

### Age Distribution of the Concrete Samples

The concrete samples were tested at different ages ranging from 1 to 365 days. Let's visualize the age distribution of the concrete samples.

```python
# Mean strength and number of samples for each test age
count=(conc.groupby('age')['strength']
       .agg(['mean','count']).round(2).reset_index()
       .rename(columns={'mean':'Average Strength'}))
display(count.sort_values(by='count',ascending=False))
figure, ax = plt.subplots()
sns.countplot(x='age',data=conc,ax=ax)

plt.title(label="Distribution of concrete age groups")
plt.xlabel("Age (Days)")
plt.ylabel("Count of samples")
plt.show()
```

From the table and graph above, we can see that most of the samples fall in the 1-100 day age range, with the strength increasing as the concrete gets older.

### Base Regression Model

Let us define our target (y) and features (X) and train our first regression model.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Features: every column except the last; target: the last column (strength)
X=conc.iloc[:,0:-1]
y=pd.DataFrame(conc.iloc[:,-1])
# Hold out 20% for testing, keeping the age mix similar in both splits
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42,stratify=X.age)
reg=LinearRegression()
reg.fit(X_train,y_train)
y_pred=reg.predict(X_test)
# Built-in round() works whether score() returns a Python float or a NumPy float
print("The R^2 of the model on the test set is : ",round(reg.score(X_test,y_test),2))
coef=pd.DataFrame({'Materials':list(X.columns),'Coef':reg.coef_.flatten()})
fig,ax=plt.subplots()
# plt.scatter(y_test,y_pred,alpha=0.7, edgecolors="k")
sns.barplot(data=coef,x='Materials',y='Coef')
plt.xticks(rotation=45)
plt.show()
```

The model has an R-squared value of 0.62, meaning our model can only explain about 62% of the variability in the test data. Superplasticizer has the highest weight of the features and water the lowest. Note that these coefficients are expressed per unit of each raw feature, so their sizes are not directly comparable across features.
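
For reference, the value returned by `reg.score` for a regressor is the coefficient of determination, `R^2 = 1 - SS_res / SS_tot`. A minimal sketch that computes it by hand, using invented numbers rather than the notebook's predictions:

```python
import numpy as np

# Invented true and predicted strengths, for illustration only.
y_true = np.array([30.0, 45.0, 25.0, 60.0, 40.0])
y_hat = np.array([32.0, 41.0, 29.0, 55.0, 43.0])

# Residual sum of squares: the variation the model fails to explain.
ss_res = np.sum((y_true - y_hat) ** 2)
# Total sum of squares: the variation around the mean of the true values.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2 = 1.0 - ss_res / ss_tot
print(round(r2, 2))  # 1 - 70/750, about 0.91
```

An R-squared of 1 would mean the predictions match exactly, while 0 means the model does no better than always predicting the mean.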

```python
# Import the necessary modules
from sklearn.model_selection import cross_val_score, KFold

# Create a KFold object
kf = KFold(n_splits=6, shuffle=True, random_state=5)

reg = LinearRegression()

# Compute 6-fold cross-validation scores
cv_scores = cross_val_score(reg, X, y, cv=kf)

# Print the mean R^2 across the folds
print(cv_scores.mean().round(2))
```
```python
# Import Ridge
from sklearn.linear_model import Ridge
alphas = [0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0]
ridge_scores = []
for alpha in alphas:

    # Create a Ridge regression model
    ridge = Ridge(alpha=alpha)

    # Fit the data
    ridge.fit(X_train, y_train)

    # Obtain R-squared
    score = ridge.score(X_test, y_test)
    ridge_scores.append(score)
print(ridge_scores)
```

### Regression Model with scaled features

Let us scale our features to even the playing field for all of them.

```python
from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
# Fit the scaler on the training data only, then apply it to both splits
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
reg=LinearRegression()
reg.fit(X_train_scaled,y_train)
y_pred=reg.predict(X_test_scaled)
print("The R^2 of the model on the test set is : ",round(reg.score(X_test_scaled,y_test),2))
coef=pd.DataFrame({'Materials':list(X.columns),'Coef':reg.coef_.flatten()})
fig,ax=plt.subplots()
# plt.scatter(y_test,y_pred,alpha=0.7, edgecolors="k")
sns.barplot(data=coef,x='Materials',y='Coef')
plt.xticks(rotation=45)
plt.show()
```

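There is no difference between the R-squared of the model with scaled features and that of the model with unscaled features. This is expected: rescaling the features changes the coefficients of an ordinary least squares model but not its predictions.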
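
The following sketch checks this on synthetic data. The feature ranges, coefficients, and noise level are made up for the example, and the names `X_demo` and `y_demo` are hypothetical, so this is not the concrete dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic features on very different scales (hundreds vs. tens).
X_demo = np.column_stack([
    rng.uniform(100, 500, size=200),
    rng.uniform(0, 30, size=200),
])
# Synthetic target: a linear combination of the features plus noise.
y_demo = 0.1 * X_demo[:, 0] + 1.5 * X_demo[:, 1] + rng.normal(0, 5, size=200)

# Fit on the raw features.
raw_model = LinearRegression().fit(X_demo, y_demo)
r2_raw = raw_model.score(X_demo, y_demo)

# Fit on standardized features (zero mean, unit variance).
X_scaled = StandardScaler().fit_transform(X_demo)
scaled_model = LinearRegression().fit(X_scaled, y_demo)
r2_scaled = scaled_model.score(X_scaled, y_demo)

print(np.isclose(r2_raw, r2_scaled))        # same fit quality
print(raw_model.coef_, scaled_model.coef_)  # different coefficients
```

Scaling still matters for the coefficient plot, since standardized coefficients can be compared with each other, and for regularized models such as Ridge, whose penalty depends on the size of the coefficients.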

### Regression Model with Encoded Age Features

The age feature can be treated as a categorical feature using one-hot encoding, which represents each age value with ones and zeros.
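
As a small illustration of what one-hot encoding produces, here is a sketch on a made-up age column. The values are invented for the example, and the `prefix` and `dtype` arguments are choices made here for readability.

```python
import pandas as pd

# Invented test ages in days.
ages = pd.Series([3, 7, 28, 7, 3], name="age")

# One column per distinct age; drop_first=True drops the first category (age 3),
# so a row of all zeros stands for that baseline category.
encoded = pd.get_dummies(ages, prefix="age", drop_first=True, dtype=int)
print(encoded)
```

Each remaining column is 1 only for the rows tested at that age, so the regression can learn a separate offset for every age instead of a single slope.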

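``````# One-hot encode age; the prefix gives the new columns string names (age_<days>),
# which avoids mixing string and integer column names when fitting scikit-learn models
x_dummies=pd.get_dummies(X['age'],prefix='age',drop_first=True)
X_dummies=X.drop(columns='age')
X_dummies=pd.concat([X_dummies,x_dummies],axis=1)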
X_train,X_test,y_train,y_test=train_test_split(X_dummies,y,test_size=0.2,random_state=42)
reg.fit(X_train,y_train)
y_pred=reg.predict(X_test)
print("The accuracy of the model(r^2) is : ",reg.score(X_test,y_test).round(2))
coef=pd.DataFrame({'Materials':list(X_train.columns),'Coef':reg.coef_.flatten()})