The Android App Market on Google Play

    1. Introduction

    Mobile apps are everywhere. They are easy to create and can be very lucrative from the business standpoint. Specifically, Android is expanding as an operating system and has captured more than 74% of the total market[1].

    The Google Play Store apps data has enormous potential to facilitate data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.

    The dataset you will use here was scraped from Google Play Store in September 2018 and was published on Kaggle. Here are the details:

    datasets/apps.csv
    This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.
    • App: Name of the app
    • Category: Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.
    • Rating: The current average rating (out of 5) of the app on Google Play
    • Reviews: Number of user reviews given on the app
    • Size: Size of the app in MB (megabytes)
    • Installs: Number of times the app was downloaded from Google Play
    • Type: Whether the app is paid or free
    • Price: Price of the app in US$
    • Last Updated: Date on which the app was last updated on Google Play
    datasets/user_reviews.csv
    This file contains a random sample of 100 [most helpful first](https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/) user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.
    • App: Name of the app on which the user review was provided. Matches the `App` column of the `apps.csv` file
    • Review: The pre-processed user review text
    • Sentiment Category: Sentiment category of the user review - Positive, Negative or Neutral
    • Sentiment Score: Sentiment score of the user review. It lies between [-1,1]. A higher score denotes a more positive sentiment.

    From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.

    The three questions are:

    1. Read the apps.csv file and clean the Installs column to convert it into integer data type. Save your answer as a DataFrame apps.

    2. Find the number of apps in each category, the average price, and the average rating. Save your answer as a DataFrame app_category_info. You should rename the four columns as: Category, Number of apps, Average price, Average rating.

    3. Find the top 10 free FINANCE apps having the highest average sentiment score. Save your answer as a DataFrame top_10_user_feedback. Your answer should have exactly 10 rows and two columns named: App and Sentiment Score, where the average Sentiment Score is sorted from highest to lowest.

    # Import pandas, then read and explore datasets/apps.csv
    import pandas as pd
    
    apps = pd.read_csv('datasets/apps.csv')
    apps.head()
    
    # Read and explore datasets/user_reviews.csv
    user_reviews = pd.read_csv('datasets/user_reviews.csv')
    
    user_reviews.head()
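    
    # Remove the non-numeric characters (',' and '+') from the 'Installs' column and convert it to integer data type.
    # regex=False treats '+' as a literal character rather than a regular-expression quantifier.
    apps['Installs'] = (apps['Installs']
                        .str.replace(',', '', regex=False)
                        .str.replace('+', '', regex=False))
    apps['Installs'] = apps['Installs'].astype(int)
    apps.head()
    
    # Confirm that the 'Installs' column is now an integer data type
    apps['Installs'].dtype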
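
The cleaning step above can be checked on a small hand-made sample before running it on the full file. This is only a sketch and not part of the original notebook: the `sample` frame and its values are made up for illustration, and it assumes the raw `Installs` strings contain only digits, commas, and a trailing plus sign.

```python
import pandas as pd

# Hypothetical sample mimicking raw 'Installs' strings (values are made up)
sample = pd.DataFrame({'Installs': ['10,000+', '500+', '1,000,000+', '0']})

# Strip thousands separators and the plus sign as literal characters
cleaned = (sample['Installs']
           .str.replace(',', '', regex=False)
           .str.replace('+', '', regex=False))

# Convert to integers; astype(int) raises if a non-numeric string remains
sample['Installs'] = cleaned.astype(int)

print(sample['Installs'].tolist())  # [10000, 500, 1000000, 0]
```

Because `astype(int)` fails loudly on leftover text, an unexpected value in the real column would surface as an error rather than as a silently wrong number.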
    
    # Create a DataFrame with the number of apps, the average price, and the average rating per category.
    
    app_category_info = apps.groupby('Category').agg(
            {'Category' : 'count',
              'Price' : 'mean',
              'Rating': 'mean'})
    
    # Rename the columns and turn the 'Category' index back into a column
    app_category_info = app_category_info.rename(columns={
        'Category': 'Number of apps',
        'Price': 'Average price',
        'Rating': 'Average rating'
    }).reset_index()
    
    # Explore the first few rows of the new DataFrame
    app_category_info.head()
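
An alternative way to write the same summary is pandas named aggregation, which names the output columns in one step and avoids aggregating the grouping column itself. The sketch below runs on a made-up `sample` frame, not on the real data. It counts non-null `App` names per category, which matches a row count only under the assumption that no app name is missing.

```python
import pandas as pd

# Hypothetical sample with the columns used in the aggregation (values are made up)
sample = pd.DataFrame({
    'App': ['a', 'b', 'c', 'd'],
    'Category': ['FINANCE', 'FINANCE', 'COMICS', 'COMICS'],
    'Price': [0.0, 2.0, 0.0, 0.0],
    'Rating': [4.0, 5.0, 3.0, None],
})

# Named aggregation: each key is an output column, each value is
# (input column, aggregation function). The dict is unpacked because
# the output names contain spaces and cannot be written as keywords.
summary = (sample.groupby('Category')
           .agg(**{
               'Number of apps': ('App', 'count'),
               'Average price': ('Price', 'mean'),
               'Average rating': ('Rating', 'mean'),
           })
           .reset_index())

print(summary)
```

Note that `mean` skips missing ratings, so a category's average rating is computed only over the apps that have one.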
    
    # Create a DataFrame of the free FINANCE apps and explore it
    
    free_finance_apps = apps.query('Category =="FINANCE" and Type=="Free"')
    free_finance_apps.head()
    
    # Merge free_finance_apps with user_reviews on the 'App' column
    
    free_finance_app_w_reviews = free_finance_apps.merge(user_reviews, on='App', how='left')
    free_finance_app_w_reviews.head()
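
A left merge keeps every free finance app, including those with no reviews at all, and those rows carry a missing `Sentiment Score`. The sketch below illustrates this on made-up data (the app names and scores are hypothetical). It uses the `indicator=True` option of `merge` to list the apps that found no match.

```python
import pandas as pd

# Hypothetical samples (names and scores are made up)
apps_sample = pd.DataFrame({'App': ['Bank A', 'Wallet B']})
reviews_sample = pd.DataFrame({
    'App': ['Bank A', 'Bank A'],
    'Sentiment Score': [0.5, 0.1],
})

# indicator=True adds a '_merge' column recording where each row came from
merged = apps_sample.merge(reviews_sample, on='App', how='left', indicator=True)

# Apps without any matching review are kept, with a missing Sentiment Score
no_reviews = merged.loc[merged['_merge'] == 'left_only', 'App'].tolist()

print(no_reviews)  # ['Wallet B']
```

Such apps end up with a missing mean score, so they cannot outrank apps that do have reviews. Using `how='inner'` would drop them at the merge instead.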
    
    # Find the top 10 free FINANCE apps with the highest average sentiment score
    
    top_10_user_feedback = (free_finance_app_w_reviews
                            .groupby('App')['Sentiment Score'].mean()
                            .sort_values(ascending=False)
                            .head(10)
                            .reset_index())
    
    
    # I grouped free_finance_app_w_reviews by 'App' to obtain the mean 'Sentiment Score', sorted the means in descending order,
    # extracted the first 10 rows with head(), and reset the index so that the result has the two required columns, App and Sentiment Score.
    
    top_10_user_feedback

    The top 10 free FINANCE apps by average sentiment score are the apps listed above.
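
The ranking step can be wrapped in a small helper and checked on toy data. This is a sketch under stated assumptions: `top_n_by_sentiment` is a hypothetical name, the `reviews` frame is made up, and apps whose reviews carry no score are dropped explicitly rather than left to sort last.

```python
import pandas as pd

# Hypothetical merged sample (apps and scores are made up)
reviews = pd.DataFrame({
    'App': ['A', 'A', 'B', 'B', 'C', 'D'],
    'Sentiment Score': [0.5, 0.25, 1.0, 0.5, None, -0.5],
})

def top_n_by_sentiment(df, n):
    """Return the n apps with the highest mean Sentiment Score (hypothetical helper)."""
    return (df.groupby('App')['Sentiment Score'].mean()
              .dropna()                       # drop apps with no scored review
              .sort_values(ascending=False)   # highest mean first
              .head(n)
              .reset_index())                 # make 'App' a regular column

top_2 = top_n_by_sentiment(reviews, 2)
print(top_2)
```

On this sample the result has exactly two columns, `App` and `Sentiment Score`, with app B (mean 0.75) ranked ahead of app A (mean 0.375).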