generalising concrete composition and strength

    Can you predict the strength of concrete?

    📖 Background

    You work in the civil engineering department of a major university. You are part of a project testing the strength of concrete samples.

    Concrete is the most widely used building material in the world. It is a mix of cement and water with gravel and sand. It can also include other materials like fly ash, blast furnace slag, and additives.

    The compressive strength of concrete is a function of components and age, so your team is testing different combinations of ingredients at different time intervals.

    The project leader asked you to find a simple way to estimate strength so that students can predict how a particular sample is expected to perform.

    💾 The data

    The team has already tested more than a thousand samples (source):

    Compressive strength data:
    • "cement" - Portland cement in kg/m3
    • "slag" - Blast furnace slag in kg/m3
    • "fly_ash" - Fly ash in kg/m3
    • "water" - Water in liters/m3
    • "superplasticizer" - Superplasticizer additive in kg/m3
    • "coarse_aggregate" - Coarse aggregate (gravel) in kg/m3
    • "fine_aggregate" - Fine aggregate (sand) in kg/m3
    • "age" - Age of the sample in days
    • "strength" - Concrete compressive strength in megapascals (MPa)

    Acknowledgments: I-Cheng Yeh, "Modeling of strength of high-performance concrete using artificial neural networks," Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998).

    Executive summary

    We were asked to analyze the samples tested at four specific ages (1, 7, 14 and 28 days) and to find a simple way to estimate strength from the features, i.e. the mix components.

    # libraries (startup messages suppressed)
    suppressPackageStartupMessages({
        library(tidyverse)
        library(broom)
    })

    Load data

    # load data 
    df <- readr::read_csv('data/concrete_data.csv', show_col_types = FALSE)
    head(df,3)
    tail(df,3)

    1D EDA

    Dimensions

    # dimensions of df and overall proportion of missing values
    list(
        dim = dim(df),
        prop_na = mean(is.na(df))
    )

    The data contain no missing values.
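The overall proportion does not show which column is affected when values are missing. A minimal sketch of a per-column count in base R follows. The data frame `toy_na` is invented for illustration and is not the project data.

```r
# toy data, invented for illustration only (not the project data)
toy_na <- data.frame(
  cement   = c(300, NA, 250),
  water    = c(180, 190, 175),
  strength = c(35.2, 28.1, NA)
)

# number of missing values in each column
na_per_col <- colSums(is.na(toy_na))
na_per_col
```

Applied to `df`, the same call would be expected to return zero for every column, given the result above.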

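    # keep only samples tested at 1, 7, 14 or 28 days and group by age;
    # df stays grouped, so the summaries below are computed per age
    df <- df %>%
        filter(age %in% c(1, 7, 14, 28)) %>%
        group_by(age)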

    How many samples were recorded at each age?

    # count recorded samples on different selected days
    df %>% 
        count(age)

    The number of samples differs considerably across the four ages:

    • 2 samples on day 1
    • 126 samples on day 7
    • 62 samples on day 14
    • 425 samples on day 28
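To make the imbalance easier to read, the counts can be expressed as shares of the retained samples. The sketch below re-enters the four counts listed above by hand. The names `counts` and `counts_share` are illustrative and not part of the original analysis.

```r
library(dplyr)

# sample counts per age, copied from the list above
counts <- data.frame(
  age = c(1, 7, 14, 28),
  n   = c(2, 126, 62, 425)
)

# share of each age among all retained samples
counts_share <- counts %>%
  mutate(share = round(n / sum(n), 3))
counts_share
```

Day 28 accounts for roughly 69% of the 615 retained samples, while day 1 contributes well under 1%, so any estimate for day 1 rests on very little data.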

    What is the mean strength at 1, 7, 14 and 28 days?

    # mean strength on given days 
    mean_strength_df <- df %>%
        summarize(mean_strength = mean(strength, na.rm = TRUE))
    mean_strength_df
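Because the groups differ so much in size, it may help to report the sample count and spread next to each mean. The following is a sketch on invented numbers, not the project measurements. `toy_strength` and `toy_summary` are hypothetical names.

```r
library(dplyr)

# toy data, invented for illustration only (not the project measurements)
toy_strength <- data.frame(
  age      = c(1, 1, 7, 7, 7),
  strength = c(8, 10, 20, 25, 30)
)

# per-age sample count, mean and standard deviation of strength
toy_summary <- toy_strength %>%
  group_by(age) %>%
  summarize(
    n             = n(),
    mean_strength = mean(strength, na.rm = TRUE),
    sd_strength   = sd(strength, na.rm = TRUE)
  )
toy_summary
```

The same `summarize()` call could be applied to the grouped `df`. A mean based on only two samples, as on day 1, should be read with caution.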