Beyond Saint Petersburg beer profile
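    # Packages: tidyverse for data wrangling and plots, skimr for a quick
    # data overview, cluster for clustering methods
    library(tidyverse)
    library(skimr)
    library(cluster)
    # Per-capita alcohol sales by Russian region and year
    data <- readr::read_csv('./data/russian_alcohol_consumption.csv')
    #skim(data)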

    1. Executive summary

    Our latest promotion in Saint Petersburg produced good results. Running the same type of promotion in regions with a similar sales history and similar characteristics could greatly benefit the company.

    Building on the results of the previous sales promotion in Saint Petersburg, this application investigates which regions could host additional promotions with similar or better outcomes.

    # Impute missing sales in the five drink columns (3:7) with each
    # column's median over all regions and years
    data <- data %>%
      mutate(across(3:7, ~ replace_na(.x, median(.x, na.rm = TRUE))))
    #mean(is.na(data))
    data <- as_tibble(data)
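
    The imputation step can also be sketched in base R. This is a minimal illustration on a made-up data frame (`toy`, with invented values and a hypothetical helper `impute_median`), not the company data: each NA in a drink column is replaced by that column's median, computed over all rows, as in the code above.

    ```r
    # Hypothetical toy data: values are invented for illustration only
    toy <- data.frame(
      year   = c(1998, 1999, 2000, 2001),
      region = c("A", "A", "B", "B"),
      wine   = c(2.0, NA, 4.0, 6.0),
      beer   = c(10.0, 20.0, NA, 40.0)
    )

    # Replace NAs in a numeric vector with the median of its non-missing values
    impute_median <- function(x) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
      x
    }

    # Apply column-wise to the drink columns only (columns 3:4 in this toy table)
    drink_cols <- 3:4
    toy[drink_cols] <- lapply(toy[drink_cols], impute_median)
    print(toy)
    ```

    A single median per column ignores regional differences; imputing within each region would be a possible alternative.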

    2. Saint Petersburg profile

    # Keep Saint Petersburg rows and only the five drink columns
    st_pet <- data %>% 
      filter(str_detect(region, "[Pp]etersburg")) %>% 
      select(-c(year, region)) 
    
    cat("Average annual sales (litres per capita) in Saint Petersburg\n")
    round(colMeans(st_pet), 2)
    
    barplot(colMeans(st_pet), main = "Average annual sales (litres per capita) in Saint Petersburg")

    On average, beer has the highest sales of any alcoholic drink in Saint Petersburg, at `r round(mean(st_pet$beer),2)` litres per capita, followed by vodka (`r round(mean(st_pet$vodka),2)`) and wine (`r round(mean(st_pet$wine),2)`).
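
    The ranking in the sentence above comes from sorting the per-drink means. A minimal base R sketch on invented numbers (the `region_sales` data frame is hypothetical) shows the idea:

    ```r
    # Hypothetical per-capita sales for one region over three years (invented values)
    region_sales <- data.frame(
      wine      = c(6, 7, 8),
      beer      = c(60, 70, 80),
      vodka     = c(10, 11, 12),
      champagne = c(1, 1.5, 2),
      brandy    = c(0.5, 0.6, 0.7)
    )

    # Mean across years for each drink, sorted from highest to lowest
    drink_means <- sort(colMeans(region_sales), decreasing = TRUE)
    print(round(drink_means, 2))

    # Names of the three best-selling drinks
    names(drink_means)[1:3]
    ```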

    3. General profile

    barplot(colMeans(data[,3:7]),
            main = "Overall average annual sales (litres per capita), all regions")

    The overall mean profile shows the same pattern. Is there any association between beer and vodka?

    # Beer vs vodka for every region-year, with a least-squares line
    data %>% 
      ggplot(aes(vodka, beer)) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + 
      #scale_y_log10() +
      labs(title = "Beer vs vodka sales (litres per capita), all regions")
    

    There is a very weak linear association between beer and vodka, with a correlation coefficient r = `r round(cor(data$beer,data$vodka),2)`. Because we lack individual product prices to weight each product's contribution to company income, we treat beer as the leading product to guide the investigation. Which regions might be good candidates for a promotion?
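
    The coefficient reported by `cor()` is Pearson's r by default: the covariance of the two variables divided by the product of their standard deviations, so values near 0 indicate a weak linear relationship. A minimal sketch on invented numbers (the vectors and the helper `pearson_r` are hypothetical) computes it by hand and checks it against `cor()`:

    ```r
    # Hypothetical per-capita sales for five region-years (invented values)
    vodka <- c(10, 12, 15, 18, 20)
    beer  <- c(50, 48, 55, 52, 60)

    # Pearson r: sum of centred cross-products over the root of the
    # product of the centred sums of squares
    pearson_r <- function(x, y) {
      dx <- x - mean(x)
      dy <- y - mean(y)
      sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
    }

    r_manual <- pearson_r(vodka, beer)
    r_base   <- cor(vodka, beer)  # base R, Pearson by default
    print(round(c(manual = r_manual, cor = r_base), 3))
    ```

    Pearson's r only captures linear association; `cor(x, y, method = "spearman")` gives a rank-based alternative if the relationship is monotonic but not linear.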

    # Median sales per region across all years
    data_sum <- data %>% 
      #filter(!str_detect(region, "^[Ss]aint")) %>% 
      group_by(region) %>% 
      summarize(wine = median(wine), 
                beer = median(beer),
                champagne = median(champagne),
                vodka = median(vodka),
                brandy = median(brandy)
                )
    #dim(data_sum)
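
    The per-region summary reduces each region's yearly series to one median per drink. A minimal base R sketch of the same reduction, using `aggregate()` on an invented long-format table (`sales` and its values are hypothetical):

    ```r
    # Hypothetical long-format sales: two regions, two years each (invented values)
    sales <- data.frame(
      region = c("A", "A", "B", "B"),
      year   = c(2000, 2001, 2000, 2001),
      beer   = c(40, 60, 20, 30),
      vodka  = c(10, 14, 8, 12)
    )

    # One row per region, holding the median of each drink across years
    sales_sum <- aggregate(cbind(beer, vodka) ~ region, data = sales, FUN = median)
    print(sales_sum)
    ```

    The median is used rather than the mean so that a single unusual year has less influence on a region's profile.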

    4. Intuitive region selection by beer profile

    Roughly speaking, and considering only beer sales, the following 11 regions, those with the highest median beer sales per capita, might have a beer profile similar to Saint Petersburg's.

    # Top 11 regions by median beer sales (regions are already unique after summarize)
    data_sum %>% 
      arrange(desc(beer)) %>% 
      select(region) %>% 
      head(11) 
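
    Sorting by descending beer sales returns the highest-selling regions, which may include Saint Petersburg itself and are not necessarily the ones closest to it. A minimal sketch on invented numbers, with hypothetical region names, contrasts that top-k rule with an alternative assumed here for illustration only: picking the regions whose median beer sales are nearest to Saint Petersburg's.

    ```r
    # Hypothetical per-region median beer sales (invented values and names)
    beer_med <- c(
      "Saint Petersburg" = 80,
      "Region A" = 95,
      "Region B" = 78,
      "Region C" = 40,
      "Region D" = 85
    )

    # (1) Top-k by descending beer sales, as in the dplyr code above (here k = 3)
    top_k <- head(names(sort(beer_med, decreasing = TRUE)), 3)

    # (2) Assumed alternative: the k regions closest to Saint Petersburg's value
    target    <- beer_med[["Saint Petersburg"]]
    others    <- beer_med[names(beer_med) != "Saint Petersburg"]
    closest_k <- head(names(sort(abs(others - target))), 3)

    print(top_k)
    print(closest_k)
    ```

    The two rules can return different sets, so which one matches "similar beer profile" is a modelling choice.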

    Sorting by descending vodka sales yields a different list of regions.