Data-Driven Merchandise Sales Strategy: Exploring Pokemon Choices (R)

    Scenario: Business Task and Objectives

    I am a junior data analyst working for a business intelligence consultancy. I have been assigned a project for a new client, a retail store specializing in anime and game merchandise. My task is to conduct a data-driven analysis that provides recommendations for curating a compelling lineup of Pokemon merchandise to kickstart the store's offerings. The objective is to select Pokemon plushies that resonate with customers, align with their preferences, and maximize sales potential.

    To achieve this, my analysis will consider various factors, including:

    1. Analyzing Ash's Journey and Pokemon Win Rates: By examining Ash's journey in the Pokemon animated series and games, I will identify the Pokemon that have been consistently used by Ash and have achieved high win rates in battles. These Pokemon hold significant popularity and can serve as strong candidates for inclusion in the merchandise lineup.
    2. Investigating Reddit Survey Data: I will explore data from a Reddit survey that captures the favorite Pokemon of Reddit users. This survey provides valuable insights into the preferences and beloved Pokemon among online communities. Analyzing the survey results will help identify Pokemon that garner significant attention and are highly regarded by fans.
    3. Examining Japan Survey Data for Pokemon Scarlet/Violet: Additionally, I will investigate data from a Japan survey specifically focused on the favorite Pokemon of players in the Pokemon Scarlet/Violet games. This survey provides insights into the preferences and favorite Pokemon of the Japanese market, which is a significant audience for Pokemon merchandise. Incorporating these findings will help ensure that the pokemon lineup appeals to the Japanese customer base.
    4. Creating a Ranking of Popularity: A key aspect of this project is to create a comprehensive ranking of Pokemon popularity, derived from the analysis of the datasets described above. This fourth objective ensures a holistic approach, capturing the evolving preferences and popularity of Pokemon across different contexts.

    Data Sources

    The analysis will involve the following datasets:

    • Ash's Pokemon Win Percentages: Kanto-Alola (2020)

      • This dataset records pokemon captured and used by Ash Ketchum in battles, including both official and unofficial battles, up to Generation 7.
      • Source: Ash Pokemon rank by win percentage

    • Complete Pokemon dataset (2022)

      • This comprehensive dataset contains information on pokemon up to generation 9 and was compiled from pokemondb.net.
      • Source: Kaggle dataset

    • Reddit's Favourite Pokemon Survey (2019)

      • This dataset contains the results of a Reddit survey capturing users' favorite pokemon and the number of votes each received.

    • Japan's Favorite Pokemon Survey (2022)

      • This dataset comprises the results of a survey conducted in Japan, highlighting the favorite pokemon among 177,829 respondents.
      • Source: Favorite Pokémon in 2022

    In the 'ash_pokemon_kanto-alola.csv' file, the following information is available:

    • name: The name of each pokemon.
    • battles: The total number of battles for each pokemon.
    • wins: The number of battles won by each pokemon.
    • losses: The number of battles lost by each pokemon.

    From the 'Pokedex_Cleaned.csv' file, I will import the following information:

    • type: The pokemon type for each entry.

    From the 'reddit_survey_data.csv' file:

    • Pokemon: The name of each pokemon.
    • Number of votes: The number of votes received by each pokemon in the survey.

    Lastly, from the 'japan_survey_data.csv' file:

    • name: The name of each pokemon.
    • votes: The number of votes received by each pokemon in the survey.

    Data Preparation and Transformation

    In this section, I performed several actions to prepare and transform the data for further analysis and recommendation generation. These actions included:

    1. Cleaning and Standardization: I focused on ensuring data integrity and accuracy by checking for missing values, handling duplicate entries, and resolving inconsistencies within the datasets.
    2. Feature Engineering: I derived new features from the Ash Pokemon dataset, including win percentage and several ranks based on the different datasets.
    3. Data Integration: I incorporated the pokemon type data from the pokedex dataset into each respective dataset to enhance the visualization capabilities.
    4. Color Palette Setup: I established a color palette based on Pokemon types. This palette will be utilized in the visualization created in a later section, enhancing the visual representation of Pokemon attributes.
    5. Rank-Based Dataset Creation and Recommendation List: I created a comprehensive dataset that incorporates rankings based on the number of battles, win percentage, Reddit survey votes, and Japan survey votes for each Pokemon. Using this rank-based dataset, I generated a recommendation list that highlights the top-performing Pokemon across various categories. This approach ensures that our recommendations are based on objective criteria and provide valuable insights.
    # Importing libraries
    library(tidyverse)
    
    # Loading Ash pokemon data
    data_raw <- read.csv("data/ash_pokemon_kanto-alola.csv")
    head(data_raw)
    
    # Loading Pokedex data
    pokedex_data <- read.csv("data/Pokedex_Cleaned.csv", skip = 1, header = FALSE, stringsAsFactors = FALSE)
    
    # Exclude the first column
    pokedex_data <- pokedex_data[, -1]
    # Remove the additional columns
    pokedex_data <- pokedex_data[, 1:2]
    
    # Rename columns
    colnames(pokedex_data) <- c("name", "type_1")
    
    # Print the modified dataset
    head(pokedex_data)
    
    # Loading Reddit Survey data
    reddit_survey_data <- read.csv("data/reddit_survey_data.csv", skip = 1, header = TRUE)
    
    # Keep only the columns I need and rename them
    reddit_survey_data <- reddit_survey_data[, 1:2]
    colnames(reddit_survey_data) <- c("name", "votes")
    head(reddit_survey_data)
    
    # Loading Japan Survey data
    japan_survey_data <- read.csv("data/japan_survey_data.csv")
    head(japan_survey_data)
    # Removing duplicate data
    pokedex_data <- pokedex_data[!duplicated(pokedex_data$name), ]
    
    # Merging data
    merged_pokemon <- left_join(data_raw, pokedex_data, by = "name") %>%
      select(name, battles, wins, losses, type_1)
    
    # Data preview
    head(merged_pokemon)
    # Adding another column
    ash_pokemon <- merged_pokemon %>% mutate(win_pct = wins / battles)
    
    # Renaming column type_1 to type
    ash_pokemon <- ash_pokemon %>% rename(type = type_1)
    head(ash_pokemon)
    # Check for missing values in each column for all Pokemon
    pokemon_with_missing <- !complete.cases(ash_pokemon)
    
    # Subset the dataset to show Pokemon with missing values
    pokemon_missing_values <- ash_pokemon[pokemon_with_missing, ]
    print(pokemon_missing_values)
    # Setting up color palette based on pokemon types
    type_colors <- c("Bug" = "#A8B820", "Dark" = "#705848", "Dragon" = "#7038F8", 
                     "Electric" = "#F8D030", "Fairy" = "#EE99AC", "Fighting" = "#C03028", 
                     "Fire" = "#F08030", "Flying" = "#A890F0", "Ghost" = "#705898", 
                     "Grass" = "#78C850", "Ground" = "#E0C068", "Ice" = "#98D8D8", 
                     "Normal" = "#A8A878", "Poison" = "#A040A0", "Psychic" = "#F85888", 
                     "Rock" = "#B8A038", "Steel" = "#B8B8D0", "Water" = "#6890F0")
    # Create a new dataset with ranks for battles and win percentage
    new_dataset <- data.frame(
      name = ash_pokemon$name,
      battle_rank = rank(-ash_pokemon$battles, na.last = "keep"),
      win_pct_rank = rank(-ash_pokemon$win_pct, na.last = "keep")
    )
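    # A quick illustration (hypothetical example values, not from the dataset) of
    # the ranking convention used above: negating the vector makes rank 1
    # correspond to the largest value, and na.last = "keep" leaves NAs unranked.
    example_battles <- c(30, NA, 45, 10)
    rank(-example_battles, na.last = "keep")
    # 45 -> rank 1, 30 -> rank 2, 10 -> rank 3, the NA entry stays NA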
    # Creating rank for the reddit survey data 
    reddit_survey_data$votes_rank <- rank(-reddit_survey_data$votes, na.last = "keep")
    
    # Merging the new dataset with the reddit survey results
    new_dataset <- merge(new_dataset, reddit_survey_data[, c("name", "votes_rank")], by = "name", all = TRUE)
    
    # Creating rank for the japan survey data
    japan_survey_data$japan_votes_rank <- rank(-japan_survey_data$votes, na.last = "keep")
    
    # Merging the new dataset with the japan survey results
    new_dataset <- merge(new_dataset, japan_survey_data[, c("name", "japan_votes_rank")], by = "name", all = TRUE)
    
    # Print the new dataset with rank
    head(new_dataset)
    # Calculate the ranking scores for battles
    battle_score <- ifelse(is.na(new_dataset$battle_rank), 0,
                           ifelse(new_dataset$battle_rank <= 5, 50,
                                  ifelse(new_dataset$battle_rank <= 10, 40,
                                         ifelse(new_dataset$battle_rank <= 20, 30,
                                                ifelse(new_dataset$battle_rank <= 50, 20,
                                                       ifelse(new_dataset$battle_rank <= 100, 10, 5))))))
    
    # Calculate the ranking scores for win percentage
    win_pct_score <- ifelse(is.na(new_dataset$win_pct_rank), 0,
                             ifelse(new_dataset$win_pct_rank <= 5, 50,
                                    ifelse(new_dataset$win_pct_rank <= 10, 40,
                                           ifelse(new_dataset$win_pct_rank <= 20, 30,
                                                  ifelse(new_dataset$win_pct_rank <= 30, 20,
                                                         ifelse(new_dataset$win_pct_rank <= 100, 10, 5))))))
    
    # Calculate the ranking scores for Reddit survey
    reddit_score <- ifelse(is.na(new_dataset$votes_rank), 0,
                            ifelse(new_dataset$votes_rank <= 5, 50,
                                   ifelse(new_dataset$votes_rank <= 10, 40,
                                          ifelse(new_dataset$votes_rank <= 20, 30,
                                                 ifelse(new_dataset$votes_rank <= 50, 20,
                                                        ifelse(new_dataset$votes_rank <= 100, 10, 5))))))
    
    # Calculate the ranking scores for Japan survey (scores decrease as rank worsens)
    japan_score <- ifelse(is.na(new_dataset$japan_votes_rank), 0,
                           ifelse(new_dataset$japan_votes_rank == 1, 70,
                                  ifelse(new_dataset$japan_votes_rank <= 3, 65,
                                         ifelse(new_dataset$japan_votes_rank <= 5, 60,
                                                ifelse(new_dataset$japan_votes_rank <= 10, 55,
                                                       ifelse(new_dataset$japan_votes_rank <= 20, 52,
                                                              ifelse(new_dataset$japan_votes_rank <= 30, 51,
                                                                     ifelse(new_dataset$japan_votes_rank <= 50, 50,
                                                                            ifelse(new_dataset$japan_votes_rank <= 100, 1, 0)))))))))
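    # Side note, as a sketch only (not part of the original pipeline): the nested
    # ifelse() ladders above can be expressed more compactly with cut().
    # tier_score() is a hypothetical helper; the breaks/scores mirror the battle tiers.
    tier_score <- function(r, breaks, scores, na_score = 0) {
      s <- scores[cut(r, breaks = breaks, labels = FALSE)]
      ifelse(is.na(r), na_score, s)
    }
    tier_score(c(1, 7, 15, 120, NA),
               breaks = c(0, 5, 10, 20, 50, 100, Inf),
               scores = c(50, 40, 30, 20, 10, 5))
    # Same tiers as battle_score: 50 40 30 5 0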
    
    # Calculate the aggregated score
    aggregated_score <- battle_score + win_pct_score + reddit_score + japan_score
    
    # Combine Pokémon data with aggregated score
    pokemon_scores <- data.frame(name = new_dataset$name, score = aggregated_score)
    
    # Sort the Pokémon based on the aggregated score
    sorted_pokemon <- pokemon_scores[order(pokemon_scores$score, decreasing = TRUE), ]
    
    # Select the top-ranked Pokémon as recommendations
    top_recommendations <- head(sorted_pokemon, 25) # <-- Modify this number to modify the recommendations table
    # Initialize an empty data frame to store the details
    pokemon_details <- data.frame()
    
    # Iterate over each Pokemon in the top_recommendations list
    for (pokemon in top_recommendations$name) {
      # Get the details for the current Pokemon
      pokemon_details <- rbind(pokemon_details, new_dataset[new_dataset$name == pokemon, ])
    }
    
    # Create a new data frame with the desired columns
    pokemon_table <- data.frame(
      Rank = 1:nrow(pokemon_details),
      pokemon_name = pokemon_details$name,
      battle_rank = ifelse(!is.na(pokemon_details$battle_rank), paste0("#", pokemon_details$battle_rank), "-"),
      win_pct_rank = ifelse(!is.na(pokemon_details$win_pct_rank), paste0("#", pokemon_details$win_pct_rank), "-"),
      reddit_survey = ifelse(!is.na(pokemon_details$votes_rank), paste0("#", pokemon_details$votes_rank), "-"),
      japan_survey = ifelse(!is.na(pokemon_details$japan_votes_rank), paste0("#", pokemon_details$japan_votes_rank), "-"),
      stringsAsFactors = FALSE)
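    # Side note with toy data (not the real dataset): growing a data frame with
    # rbind() inside a loop is quadratic; when names are unique, a vectorized
    # match() lookup returns the same rows in the same order.
    toy_dataset <- data.frame(name = c("Pikachu", "Greninja", "Charizard"),
                              battle_rank = c(1, 3, 2))
    toy_top <- c("Charizard", "Pikachu")
    toy_details <- toy_dataset[match(toy_top, toy_dataset$name), ]
    toy_details$name
    # "Charizard" "Pikachu"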

    Supporting Visualizations and Key Findings

    In this section, the objective is to delve into the data and gain insights into the top-performing Pokemon across different categories and datasets.

    # Plotting Top 15 Ash pokemon by win percentage
    
    # Preparing the data
    top_win_pct <- ash_pokemon %>%
      select(name, battles, wins, win_pct, type) %>%
      arrange(desc(win_pct)) %>%
      top_n(15, win_pct)
    
    # Plotting the data
    ggplot(top_win_pct, aes(x = win_pct, y = reorder(name, win_pct), fill = type)) +
      geom_col() +
      geom_text(aes(label = paste0(round(win_pct * 100, 1), "%")), size = 3, position = position_stack(vjust = 0.5)) +
      scale_x_continuous(limits = c(0, 1), expand = c(0, 0), labels = scales::percent_format(accuracy = 1)) +
      xlab('Win percentage') +
      ylab('Pokemon') +
      ggtitle('Top 15 pokemon by win percentage') +
      scale_fill_manual(values = type_colors) +
      theme(axis.text.x = element_text(size = 10), axis.text.y = element_text(size = 8), plot.title = element_text(hjust = 0.5))