# Live Training: Green Businesses (Solutions) (copy)

## 2023-09-08 | Data Storytelling for Absolute Beginners: A Case Study with Green Businesses | Camilo Martinez

### Let's look into DataCamp's Workspaces.

• They are like Jupyter notebooks... but better.
• You can load datasets and interact with them directly with Python code or by using AI.

Let's look into a simple Exploratory Data Analysis exercise.

### Exploratory Data Analysis

Exploratory Data Analysis (EDA) is an important step in any data analysis project. It helps us understand the data, identify patterns, and uncover insights. In this exercise, we will perform EDA on a dataset of green businesses.

#### Dataset

The dataset contains information about various green businesses, including their location, type of business, and environmental impact. We will explore the dataset to gain insights into the distribution of businesses, the most common types of businesses, and the relationship between business type and environmental impact.

Let's get started!

### Exercises

Let's load a dataset with the help of AI:

```python
# Read the green-businesses dataset using Python
import pandas as pd

# Assuming the dataset is in a CSV file named 'green-businesses.csv'
df = pd.read_csv('green-businesses.csv')
```

Now let's count the number of green businesses. If each row describes one business, the count is simply the number of rows, `len(df)`.
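As a minimal sketch of the counting step, assuming each row of the dataset describes one business: the small DataFrame below is a hypothetical stand-in for the real file, and its column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the green-businesses data (values are made up)
sample = pd.DataFrame({
    "business_name": ["A", "B", "C", "D"],
    "region": ["North", "South", "North", "East"],
})

# Number of rows, assuming one row per business
n_businesses = len(sample)
print(n_businesses)

# shape is (rows, columns), so its first element is the same count
print(sample.shape[0])
```

If the real dataset can list the same business more than once, counting unique values of an identifying column with `nunique()` would be the safer choice.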

Now we can try something more complex. It helps to know some of the concepts we reviewed in the presentation.

"Please create a line plot that shows the evolution of the number of entries grouped by year"

```python
# Import the necessary libraries
import matplotlib.pyplot as plt

# Group the data by year and count the number of entries
df_grouped = df.groupby('year').size()

# Create a line plot
plt.plot(df_grouped.index, df_grouped.values)
plt.xlabel('Year')
plt.ylabel('Number of Entries')
plt.title('Evolution of Number of Entries by Year')
plt.show()
```
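To see what the grouping step produces before it is plotted, here is a small sketch on a hypothetical DataFrame (the rows are invented; only the `year` column name comes from the code above). `groupby('year').size()` returns a Series indexed by year whose values are row counts, and those are the x and y values the line plot draws.

```python
import pandas as pd

# Hypothetical data: one row per entry, with the year it was recorded
toy = pd.DataFrame({"year": [2019, 2020, 2020, 2021, 2021, 2021]})

# Count rows per year; groupby sorts the group keys by default
entries_per_year = toy.groupby("year").size()
print(entries_per_year)
```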

Here is another example. It is very useful to think about which meaningful groups we can create from the existing variables; in this case, we group by the `region` variable.

"Please create a horizontal bar plot of the number of entries in the dataset grouped by the region variable"

To create a horizontal bar plot of the number of entries in the dataset grouped by the region variable, you can use the following code:

```python
# Group the data by region and count the number of entries
df_grouped = df.groupby('region').size()

# Create a horizontal bar plot
plt.barh(df_grouped.index, df_grouped.values)
plt.xlabel('Number of Entries')
plt.ylabel('Region')
plt.title('Number of Entries by Region')
plt.show()
```

Make sure to run the previous cells to import the necessary libraries and read the dataset.
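Bars are often easier to compare when they are sorted by size. As a sketch, assuming `region` is a categorical column (the data below is hypothetical), `value_counts()` gives the same counts as `groupby('region').size()` but ordered from most to least frequent. Reversing that order puts the largest bar at the top of a horizontal bar plot.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data standing in for the real dataset
toy = pd.DataFrame(
    {"region": ["North", "South", "North", "East", "North", "South"]}
)

# Counts per region, most frequent first
counts = toy["region"].value_counts()

# barh draws the first item at the bottom, so reverse to put the largest on top
ordered = counts.iloc[::-1]
plt.barh(ordered.index, ordered.values)
plt.xlabel("Number of Entries")
plt.ylabel("Region")
plt.title("Number of Entries by Region (sorted)")
plt.show()
```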

### Exploratory Data Analysis

Exploratory Data Analysis (EDA) is an important step in any data analysis project. It helps us understand the data, identify patterns, and uncover insights. In this exercise, we will perform some basic EDA on a dataset.

#### Step 1: Load the Data

First, let's load the dataset into a pandas DataFrame. We can use the `read_csv()` function from the `pandas` library to read a CSV file.

```python
import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('data.csv')
```

Replace `'data.csv'` with the path to your dataset.

#### Step 2: Explore the Data

Once we have loaded the data, we can start exploring it. Here are some common tasks:

• View the first few rows of the dataset using the `head()` method.
• Check the dimensions of the dataset using the `shape` attribute.
• Check the data types of the columns using the `dtypes` attribute.
• Check for missing values using the `isnull()` method.
• Calculate summary statistics using the `describe()` method.

```python
# View the first few rows
print(df.head())

# Check the dimensions
print(df.shape)

# Check the data types
print(df.dtypes)

# Check for missing values
print(df.isnull().sum())

# Calculate summary statistics
print(df.describe())
```

Feel free to modify the code to suit your dataset.
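One thing worth knowing when adapting this: on a DataFrame that has numerical columns, `describe()` reports only those by default. A dataset of businesses is likely to contain text columns as well, so the sketch below (a hypothetical DataFrame with invented column names) summarizes a single text column and computes the share of missing values per column.

```python
import pandas as pd

# Hypothetical data with one text column and one numerical column
toy = pd.DataFrame({
    "region": ["North", "South", "North", None],
    "employees": [12, 5, 30, 8],
})

# For a text column, describe() gives count, unique, top and freq
region_summary = toy["region"].describe()
print(region_summary)

# Share of missing values per column, as a fraction between 0 and 1
missing_share = toy.isnull().mean()
print(missing_share)
```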

#### Step 3: Visualize the Data

Visualization is a powerful tool for understanding data. Here are some common plots:

• Histograms: to visualize the distribution of a numerical variable.
• Bar plots: to compare the values of a categorical variable.
• Scatter plots: to explore the relationship between two numerical variables.

You can use libraries like `matplotlib` or `seaborn` to create these plots. Here's an example:

```python
import matplotlib.pyplot as plt

# Create a histogram of a numerical variable
# ('age' is a placeholder; use a numerical column from your dataset)
plt.hist(df['age'])
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Distribution of Age')
plt.show()
```

Feel free to explore different types of plots and customize them as needed.
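The histogram above covers the first plot type in the list. As a rough sketch of the other two, using a hypothetical DataFrame with invented columns (`region`, `employees`, `revenue`):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data; column names and values are made up for illustration
toy = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "North"],
    "employees": [12, 5, 30, 8, 15, 22],
    "revenue": [120, 40, 310, 90, 150, 200],
})

fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Bar plot: number of rows in each category
counts = toy["region"].value_counts()
ax_bar.bar(counts.index, counts.values)
ax_bar.set_xlabel("Region")
ax_bar.set_ylabel("Number of Entries")
ax_bar.set_title("Entries by Region")

# Scatter plot: relationship between two numerical variables
ax_scatter.scatter(toy["employees"], toy["revenue"])
ax_scatter.set_xlabel("Employees")
ax_scatter.set_ylabel("Revenue")
ax_scatter.set_title("Revenue vs. Employees")

plt.tight_layout()
plt.show()
```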

Finally, don't hesitate to try more natural-sounding questions. This is where the power of language models shines:

"Please tell me the most common type of green business according to the result variable."

``````
import pandas as pd