Predicting Google Apps Sentiment Score with Naïve Bayes

Predicting Google Apps Sentiment with Naïve Bayes

This dataset contains 60,000 web-scraped app reviews, each with the review text and sentiment scores. We will try to predict sentiment from the review text with a Naïve Bayes model.
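To make the idea concrete before touching the real data, here is a minimal sketch of a bag-of-words Naïve Bayes classifier. The six toy sentences, their labels, and the names `toy_reviews`, `toy_labels`, `toy_pred`, and `toy_proba` are illustrative assumptions, not part of the dataset, and the choice of `MultinomialNB` with default smoothing is only one reasonable option among the classifiers imported below.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy reviews (1 = positive, 0 = not positive)
toy_reviews = [
    "great app love it",
    "love the new design",
    "works great very useful",
    "terrible app keeps crashing",
    "crashing all the time, hate it",
    "useless and slow",
]
toy_labels = [1, 1, 1, 0, 0, 0]

# Turn each review into a vector of word counts (bag of words)
vectorizer = CountVectorizer()
X_toy = vectorizer.fit_transform(toy_reviews)

# Fit a multinomial Naive Bayes model on the word counts
model = MultinomialNB()
model.fit(X_toy, toy_labels)

# Score unseen text; words missing from the vocabulary are ignored
new_counts = vectorizer.transform(["love this great app", "slow and crashing"])
toy_pred = model.predict(new_counts)
toy_proba = model.predict_proba(new_counts)

print(toy_pred)
print(toy_proba.round(3))
```

The model multiplies per-word likelihoods learned for each class with the class prior, so a review is pushed toward the class whose training reviews used its words more often.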

Data Dictionary

variable	class	description
App	character	The application name
Translated_Review	character	User review (translated to English)
Sentiment	character	The sentiment of the user - Positive/Negative/Neutral
Sentiment_Polarity	character	The sentiment polarity score
Sentiment_Subjectivity	character	The sentiment subjectivity score

Source of dataset.

# Modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, roc_auc_score

# Seaborn parameters for data visualization
sns.set(rc={"figure.figsize":(15, 7)})
sns.set_context("notebook")
sns.set_style("white")

reviews = pd.read_csv('review_data.csv', usecols = ["App", "Translated_Review", "Sentiment"])

display(reviews.head())
display(reviews.shape)

Data Validation

# Check for missing values
display(reviews.isnull().sum())

# Percentage of null cells in the entire DataFrame
percentage_null = round(100 * reviews.isnull().sum().sum() / reviews.size, 2)

print("\nNull percentage in the entire dataframe: ", percentage_null)

# Remove null values
reviews.dropna(inplace=True)

Exploratory Data Analysis

# Count values in Sentiment column
display(reviews.Sentiment.value_counts())

# Plot
ax = reviews.Sentiment.value_counts().plot(kind="bar", color="cadetblue")
ax.set_xlabel("Sentiment")
ax.set_xticklabels(reviews.Sentiment.value_counts().index, rotation=0)
ax.set_ylabel("Count")
ax.set_title("Sentiment Frequency");

Since neutral sentiment is neither positive nor negative, we can merge the negative and neutral reviews into a single "not positive" class and work with a binary target variable (1 = positive, 0 = not positive).

# Encode Sentiment as a binary indicator (1 = Positive, 0 = Negative or Neutral)
reviews["Sentiment"] = (reviews["Sentiment"] == "Positive").astype(int)

# Count values in adjusted Sentiment column
display(reviews.Sentiment.value_counts())

# Plot
reviews.Sentiment.value_counts().plot(kind="bar", color="cadetblue")
plt.xlabel("Sentiment")
plt.ylabel("Count")
plt.title("Sentiment Frequency Adjusted");

Because the Sentiment variable is now binary, the positive review rate for each app equals the mean of Sentiment for that app. Let's see which 10 apps have the highest positive rate, with ties broken by the number of reviews.

top_rated_app = reviews.groupby("App").Sentiment.agg(["count", "mean"]) \
                        .sort_values(["mean", "count"], ascending=False).reset_index().head(10)

fig, ax = plt.subplots()
sns.barplot(x="count", y="App", data=top_rated_app, color="cadetblue", ax=ax)
ax2 = ax.twiny()
ax2.plot(top_rated_app["mean"], top_rated_app.App, color="red", linestyle="--")
plt.xlim([0,1.05])
ax.set_xlabel("Reviews", color="cadetblue")
ax2.set_xlabel("Positive sentiment rate", color="red")
ax.tick_params("x", colors="cadetblue")
ax2.tick_params("x", colors="red")
fig.suptitle("Top 10 rated apps", y= 1);
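The claim that the mean of a 0/1 column is the positive rate can be checked on a tiny made-up example. The frame `toy` and its values are hypothetical and only serve to illustrate the arithmetic.

```python
import pandas as pd

# Hypothetical reviews: app A has 3 positives out of 4, app B has 1 out of 2
toy = pd.DataFrame({
    "App": ["A", "A", "A", "A", "B", "B"],
    "Sentiment": [1, 1, 1, 0, 1, 0],
})

# Per-app review count, number of positives, and mean of the binary column
per_app = toy.groupby("App")["Sentiment"].agg(["count", "sum", "mean"])

# Positive rate computed directly as positives / total reviews
per_app["positive_rate"] = per_app["sum"] / per_app["count"]

print(per_app)
```

The `mean` and `positive_rate` columns match for every app. Note that an app with very few reviews can reach a rate of 1.0 easily, which is why the review count is worth reading alongside the rate in the chart above.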

