The Android App Market on Google Play

1. Introduction


Mobile apps are everywhere. They are easy to create and can be very lucrative from a business standpoint. Android in particular keeps expanding as an operating system and has captured more than 74% of the total market^[1].

Google Play Store app data has enormous potential to support data-driven decisions and insights for businesses. In this notebook, we will analyze the Android app market by comparing ~10k apps in Google Play across different categories. We will also use the user reviews to draw a qualitative comparison between the apps.

The dataset you will use here was scraped from the Google Play Store in September 2018 and was published on Kaggle. Here are the details:

datasets/apps.csv

This file contains all the details of the apps on Google Play. There are 9 features that describe a given app.

App: Name of the app
Category: Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.
Rating: The current average rating (out of 5) of the app on Google Play
Reviews: Number of user reviews given on the app
Size: Size of the app in MB (megabytes)
Installs: Number of times the app was downloaded from Google Play
Type: Whether the app is paid or free
Price: Price of the app in US$
Last Updated: Date on which the app was last updated on Google Play

datasets/user_reviews.csv

This file contains a random sample of 100 [most helpful first](https://www.androidpolice.com/2019/01/21/google-play-stores-redesigned-ratings-and-reviews-section-lets-you-easily-filter-by-star-rating/) user reviews for each app. The text in each review has been pre-processed and passed through a sentiment analyzer.

App: Name of the app on which the user review was provided. Matches the `App` column of the `apps.csv` file
Review: The pre-processed user review text
Sentiment Category: Sentiment category of the user review - Positive, Negative or Neutral
Sentiment Score: Sentiment score of the user review. It lies in the range [-1, 1]; a higher score denotes a more positive sentiment.

From here on, it will be your task to explore and manipulate the data until you are able to answer the three questions described in the instructions panel.

The three questions are:

Read the apps.csv file and clean the Installs column to convert it into integer data type. Save your answer as a DataFrame apps.
Find the number of apps in each category, the average price, and the average rating. Save your answer as a DataFrame app_category_info. You should rename the four columns as: Category, Number of apps, Average price, Average rating.
Find the top 10 free FINANCE apps having the highest average sentiment score. Save your answer as a DataFrame top_10_user_feedback. Your answer should have exactly 10 rows and two columns named: App and Sentiment Score, where the average Sentiment Score is sorted from highest to lowest.

# Import pandas, then read and explore datasets/apps.csv
import pandas as pd

apps = pd.read_csv('datasets/apps.csv')
apps.head()

# Read and explore datasets/user_reviews.csv
user_reviews = pd.read_csv('datasets/user_reviews.csv')

user_reviews.head()

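# Remove the non-numeric characters (',' and '+') from the 'Installs' column and convert it to an integer data type.
# regex=False makes pandas treat '+' as a literal character rather than a regular-expression quantifier.
apps['Installs'] = apps['Installs'].str.replace(',', '', regex=False).str.replace('+', '', regex=False)
apps['Installs'] = apps['Installs'].astype(int)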
apps.head()

# Confirm that the 'Installs' column now has an integer data type
apps['Installs'].dtype
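The cleaning step can be tried in isolation on a few made-up values. This is a minimal sketch: the `sample` frame and its strings are hypothetical and only assume the same `1,000+` style of formatting as the `Installs` column.

```python
import pandas as pd

# Hypothetical values formatted like the 'Installs' column
sample = pd.DataFrame({"Installs": ["10,000+", "500,000+", "5,000,000+", "0"]})

# regex=False treats ',' and '+' as literal characters, not regex patterns
sample["Installs"] = (
    sample["Installs"]
    .str.replace(",", "", regex=False)
    .str.replace("+", "", regex=False)
    .astype(int)
)

print(sample["Installs"].tolist())  # [10000, 500000, 5000000, 0]
```

If the real column contained a value that is not a number after stripping, `astype(int)` would raise a `ValueError`, which is a useful signal that further cleaning is needed.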

# Create a DataFrame with the number of apps, the average price, and the average rating per category.

app_category_info = apps.groupby('Category').agg(
        {'Category' : 'count',
          'Price' : 'mean',
          'Rating': 'mean'})

# Rename the aggregated columns, then turn the 'Category' index back into a regular column
app_category_info = app_category_info.rename(columns={
    'Category': 'Number of apps',
    'Price': 'Average price',
    'Rating': 'Average rating'
}).reset_index()

# Look at the first few rows of the new DataFrame
app_category_info.head()
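An alternative way to get the same four columns in one step is pandas named aggregation, which sets the output column names directly. The sketch below runs on a small made-up frame (`toy_apps` and its values are hypothetical) and counts the `App` column, which assumes every row has an app name.

```python
import pandas as pd

# Hypothetical miniature version of the apps data
toy_apps = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Category": ["FINANCE", "FINANCE", "COMICS", "COMICS"],
    "Price": [0.0, 2.0, 0.0, 1.0],
    "Rating": [4.0, 5.0, 3.0, None],
})

# Named aggregation: output name -> (input column, aggregation function).
# The dict is unpacked with ** because the output names contain spaces.
toy_info = (
    toy_apps.groupby("Category")
    .agg(**{
        "Number of apps": ("App", "count"),
        "Average price": ("Price", "mean"),
        "Average rating": ("Rating", "mean"),
    })
    .reset_index()
)

print(toy_info)
```

Note that `mean` skips missing ratings, so the COMICS average rating here is based on a single app.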

# Create a new DataFrame containing only the free FINANCE apps and explore it

free_finance_apps = apps.query('Category =="FINANCE" and Type=="Free"')
free_finance_apps.head()

# Merge free_finance_apps with user_reviews on the 'App' column

free_finance_app_w_reviews = free_finance_apps.merge(user_reviews, on='App', how='left')
free_finance_app_w_reviews.head()
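The choice of `how='left'` matters for apps that have no reviews. The following sketch, on made-up data (`toy_apps` and `toy_reviews` are hypothetical), compares a left merge with an inner merge: the left merge keeps the unreviewed app with a missing sentiment score, while the inner merge drops it.

```python
import pandas as pd

# Hypothetical data: 'Wallet B' has no reviews
toy_apps = pd.DataFrame({"App": ["Bank A", "Wallet B"]})
toy_reviews = pd.DataFrame({
    "App": ["Bank A", "Bank A"],
    "Sentiment Score": [0.5, 0.1],
})

left = toy_apps.merge(toy_reviews, on="App", how="left")
inner = toy_apps.merge(toy_reviews, on="App", how="inner")

print(len(left))   # 3: two reviews for 'Bank A' plus one NaN row for 'Wallet B'
print(len(inner))  # 2: only the rows with a matching review
```

With a left merge, apps without reviews end up with a `NaN` average, and `sort_values` places `NaN` last by default, so they do not displace reviewed apps at the top of a descending sort.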

# Find the top 10 free FINANCE apps with the highest average sentiment score.
# Group the merged reviews by 'App', average 'Sentiment Score', sort in descending order,
# and keep the first 10 rows with head(). as_index=False keeps 'App' as a regular column,
# so the result has the two columns the task asks for: App and Sentiment Score.

top_10_user_feedback = (
    free_finance_app_w_reviews
    .groupby('App', as_index=False)['Sentiment Score']
    .mean()
    .sort_values('Sentiment Score', ascending=False)
    .head(10)
)

top_10_user_feedback

So the top 10 free FINANCE apps by average sentiment score are the ones listed above.
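The task statement fixes the shape of the result: exactly 10 rows, two columns named App and Sentiment Score, sorted from highest to lowest. One way to check those requirements is with a few assertions. The sketch below builds a made-up frame (`toy` and `toy_top_10` are hypothetical names and values) and applies the checks to it; the same assertions could be pointed at `top_10_user_feedback`.

```python
import pandas as pd

# Hypothetical merged data: 12 apps with two reviews each
toy = pd.DataFrame({
    "App": [f"app_{i}" for i in range(12) for _ in range(2)],
    "Sentiment Score": [i / 20 for i in range(12) for _ in range(2)],
})

# Average per app, sort from highest to lowest, keep the first 10
toy_top_10 = (
    toy.groupby("App", as_index=False)["Sentiment Score"]
    .mean()
    .sort_values("Sentiment Score", ascending=False)
    .head(10)
)

# Checks derived from the task statement
assert toy_top_10.shape == (10, 2)
assert list(toy_top_10.columns) == ["App", "Sentiment Score"]
assert toy_top_10["Sentiment Score"].is_monotonic_decreasing
```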
