Which plants are better for bees: native or non-native?
📖 Background
You work for the local government environment agency and have taken on a project to create bee-friendly spaces for pollinators. You can use both native and non-native plants in these spaces, so you need to choose the plants that best support these bees.
The team has collected data on native and non-native plants and their effects on pollinator bees. Your task will be to analyze this data and provide recommendations on which plants create an optimized environment for pollinator bees.
💾 The Data
You have assembled information on the plants and bees research in a file called `plants_and_bees.csv`. Each row represents a sample that was taken from a patch of land where the plant species were being studied.
Column | Description |
---|---|
sample_id | The ID number of the sample taken. |
bees_num | The total number of bee individuals in the sample. |
date | Date the sample was taken. |
season | Season during sample collection ("early.season" or "late.season"). |
site | Name of collection site. |
native_or_non | Whether the sample was from a native or non-native plot. |
sampling | The sampling method. |
plant_species | The name of the plant species the sample was taken from. None indicates the sample was taken from the air. |
time | The time the sample was taken. |
bee_species | The bee species in the sample. |
sex | The sex of the bee (f: female, m: male). |
specialized_on | The plant genus the bee species preferred. |
parasitic | Whether or not the bee is parasitic (0:no, 1:yes). |
nesting | The bee species' nesting method. |
status | The status of the bee species. |
nonnative_bee | Whether the bee species is non-native (0: no, 1: yes). |
Source (data has been modified)
💪 Challenge
Provide your agency with a report that covers the following:
- Which plants are preferred by native vs non-native bee species?
- A visualization of the distribution of bee and plant species across one of the samples.
- Select the top three plant species you would recommend to the agency to support native bees.
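As a starting point for the third task, here is a minimal sketch of one way to rank plants for native bees. It assumes that `bees_num` can be summed per record as a count of individuals and that `nonnative_bee == 0` marks a native bee; the toy records and the helper name `top_plants_for_native_bees` are hypothetical, not part of the dataset or the final analysis.

```python
import pandas as pd

# Hypothetical toy records standing in for plants_and_bees.csv (values are made up)
records = pd.DataFrame({
    'plant_species': ['Rudbeckia hirta', 'Rudbeckia hirta', 'Asclepias tuberosa',
                      'Daucus carota', 'Daucus carota', 'None', 'Monarda punctata'],
    'nonnative_bee': [0, 0, 0, 1, 0, 0, 0],
    'bees_num': [3, 2, 4, 5, 1, 6, 2],
})

def top_plants_for_native_bees(df, n=3):
    """Hypothetical helper: rank plant species by native-bee individuals recorded on them."""
    # Keep native bees (nonnative_bee == 0) on actual plants; 'None' marks air samples
    native_on_plants = df[(df['nonnative_bee'] == 0) & (df['plant_species'] != 'None')]
    # Sum individuals per plant species and keep the n largest totals
    return native_on_plants.groupby('plant_species')['bees_num'].sum().nlargest(n)

print(top_plants_for_native_bees(records))
```

Counting records with `.size()` instead of summing `bees_num` is an equally reasonable choice if `bees_num` turns out to be a per-sample total rather than a per-record count.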
# Importing libraries
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.style as style
import seaborn as sns
import numpy as np
from datetime import datetime
data = pd.read_csv("data/plants_and_bees.csv")
df_bees_clean = data.copy()
df_bees_clean
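The data dictionary uses the literal string `None` in `plant_species` for air samples. Recent pandas releases (2.0 onward, to our knowledge) include `'None'` among the default missing-value markers in `read_csv`, so those rows may load as `NaN` and then fail to match the `'None'` category used later. A minimal sketch on a made-up in-memory CSV shows how to keep the label:

```python
import io

import pandas as pd

# Made-up CSV: one air sample ('None'), one plant, one genuinely empty field
csv_text = "sample_id,plant_species\n1,None\n2,Daucus carota\n3,\n"

# Default parsing: depending on the pandas version, 'None' may become NaN
default_df = pd.read_csv(io.StringIO(csv_text))

# Treat only empty fields as missing, so 'None' stays a string
kept_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[''])

print(default_df['plant_species'].tolist())
print(kept_df['plant_species'].tolist())
```

Because `keep_default_na=False` applies to every column, an alternative is to read with the defaults and restore the label in one column with `df_bees_clean['plant_species'].fillna('None')`, assuming no plant name is genuinely missing.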
# a. Check whether the values match the descriptions in the table above.
# b. Count the missing values in each column.
# Check the data types of the columns
df_bees_clean.dtypes
# Count the missing values in each column
df_bees_clean.isna().sum()
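Steps (a) and (b) can be automated with a small helper that compares each column against the codes listed in the table. This is a sketch: `check_columns` is a hypothetical helper, and `expected_values` covers only three columns as an example.

```python
import pandas as pd

# Hypothetical subset of the expected codes from the column table (extend as needed)
expected_values = {
    'season': {'early.season', 'late.season'},
    'parasitic': {0, 1},
    'nonnative_bee': {0, 1},
}

def check_columns(df, expected):
    """Report missing values and unexpected codes for each column listed in `expected`."""
    rows = []
    for col, allowed in expected.items():
        values = df[col].dropna()
        # Codes present in the data but absent from the allowed set
        unexpected = sorted(set(values) - allowed, key=str)
        rows.append({'column': col,
                     'missing': int(df[col].isna().sum()),
                     'unexpected': unexpected})
    return pd.DataFrame(rows).set_index('column')

# Made-up data: one missing flag and one unexpected season label
toy = pd.DataFrame({
    'season': ['early.season', 'late.season', 'mid.season'],
    'parasitic': [0.0, None, 1.0],
    'nonnative_bee': [0, 0, 1],
})
print(check_columns(toy, expected_values))
```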
# Sort the samples by the number of bees
df_bees_clean.sort_values(by = 'bees_num', inplace=True)
# 'date' and 'time' hold when the sample was taken; combine them into one 'datetime' column
# Zero-pad 'time' to four digits (e.g. 930 -> '0930') so it reads as HHMM
df_bees_clean['time'] = df_bees_clean['time'].astype(str).str.zfill(4)
# Parse 'date' (stored as month/day/year) into datetime
df_bees_clean['date'] = pd.to_datetime(df_bees_clean['date'], format='%m/%d/%Y')
# Concatenate date and time, then parse with an explicit format instead of relying on inference
df_bees_clean['datetime'] = pd.to_datetime(
    df_bees_clean['date'].dt.strftime('%Y-%m-%d') + ' ' + df_bees_clean['time'],
    format='%Y-%m-%d %H%M')
# Drop the original columns now that 'datetime' holds both
df_bees_clean.drop(columns=['time', 'date'], inplace=True)
# Truncate to whole minutes (a safeguard; HHMM values carry no seconds)
df_bees_clean['datetime'] = df_bees_clean['datetime'].dt.floor('min')
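The date/time cleaning can also be done in a single parse. This sketch assumes, as the cell above does, that `date` is stored as month/day/year and `time` as an integer such as `930` meaning 09:30; the two rows are made up.

```python
import pandas as pd

# Made-up raw values in the assumed formats
raw = pd.DataFrame({'date': ['04/18/2017', '08/02/2017'], 'time': [930, 1415]})

# Zero-pad the time to HHMM, then parse date and time together with one explicit format
stamp = raw['date'] + ' ' + raw['time'].astype(str).str.zfill(4)
raw['datetime'] = pd.to_datetime(stamp, format='%m/%d/%Y %H%M')
print(raw['datetime'])
```

If `time` contains missing values, pandas reads it as float and `astype(str)` yields strings like `'930.0'`, which this format rejects, so missing times need handling before the parse.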
# specialized_on: plant genus the bee species specializes on; fill missing values with 'unknown'
df_bees_clean['specialized_on'] = df_bees_clean['specialized_on'].fillna('unknown')
# parasitic: 0 = no, 1 = yes; treat missing as 0, then map to readable labels
df_bees_clean['parasitic'] = df_bees_clean['parasitic'].fillna(0).replace({0: 'No', 1: 'Yes'})
# nesting: the bee species' nesting method; fill missing values with 'unknown'
df_bees_clean['nesting'] = df_bees_clean['nesting'].fillna('unknown')
# status: the status of the bee species; fill missing values with 'unknown'
df_bees_clean['status'] = df_bees_clean['status'].fillna('unknown')
# nonnative_bee: 0 = native, 1 = non-native; treat missing as 0, then map to readable labels
df_bees_clean['nonnative_bee'] = df_bees_clean['nonnative_bee'].fillna(0).replace({0: 'No', 1: 'Yes'})
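The cell above uses `replace` to turn the 0/1 flags into labels. A sketch of an alternative, `map`, on a made-up series: `replace` leaves codes it does not know untouched, while `map` turns them into `NaN`, so an unexpected code shows up as a missing value instead of slipping through.

```python
import pandas as pd

# Made-up 0/1 flag column with one unexpected code (2)
flags = pd.Series([0, 1, 1, 2])
labels = {0: 'No', 1: 'Yes'}

# replace() keeps values that are not in the mapping, so the stray 2 passes through
replaced = flags.replace(labels)

# map() turns values that are not in the mapping into NaN, exposing the bad code
mapped = flags.map(labels)

print(replaced.tolist())
print(mapped.tolist())
```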
# Category levels for the categorical columns ('None' in plant_species marks an air sample)
ordered_cats = {"season":['early.season', 'late.season'],
"site": ['A', 'B', 'C'],
"native_or_non": ['native', 'non-native'],
"sampling": ['pan traps', 'hand netting'],
"plant_species": ['None', 'Trifolium incarnatum', 'Viola cornuta',
'Trifolium repens', 'Leucanthemum vulgare',
'Melilotus officinalis', 'Tradescantia virginiana',
'Penstemon digitalis', 'Trifolium pratense', 'Monarda punctata',
'Asclepias tuberosa', 'Rudbeckia hirta', 'Coronilla varia',
'Lobularia maritima', 'Daucus carota', 'Chamaecrista fasciculata',
'Pycnanthemum tenuifolium', 'Agastache foeniculum',
'Cosmos bipinnatus', 'Helenium flexuosum', 'Origanum vulgare',
'Lotus corniculatus', 'Cichorium intybus', 'Rudbeckia triloba'],
"bee_species": ['Augochlorella aurata', 'Agapostemon texanus', 'Andrena carlini',
'Andrena perplexa', 'Apis mellifera', 'Lasioglossum tegulare',
'Lasioglossum pectorale', 'Lasioglossum pilosum',
'Lasioglossum cressonii', 'Lasioglossum trigeminum',
'Osmia pumila', 'Andrena miserabilis', 'Lasioglossum versatum',
'Halictus poeyi/ligatus', 'Osmia atriventris',
'Nomada bidentate_group', 'Osmia bucephala',
'Lasioglossum callidum', 'Ceratina calcarata',
'Agapostemon splendens', 'Lasioglossum coreopsis',
'Nomada australis', 'Ceratina', 'Megachile brevis',
'Halictus parallelus', 'Ceratina strenua',
'Andrena (Trachandrena)', 'Andrena nasonii', 'Ceratina mikmaqi',
'Agapostemon virescens', 'Osmia subfasciata',
'Lasioglossum coriaceum', 'Lasioglossum vierecki',
'Nomada pygmaea', 'Nomada articulata', 'Osmia taurus',
'Andrena banksi', 'Osmia distincta', 'Eucera hamata',
'Hoplitis producta', 'Augochloropsis metallica_metallica',
'Halictus confusus', 'Ceratina dupla', 'Andrena barbara',
'Osmia georgica', 'Lasioglossum oblongum',
'Lasioglossum floridanum', 'Nomada parva', 'Osmia sandhouseae',
'Lasioglossum bruneri', 'Megachile mendica', 'Lasioglossum weemsi',
'Hoplitis pilosifrons', 'Bombus bimaculatus', 'Lasioglossum',
'Lasioglossum subviridatum', 'Bombus impatiens',
'Bombus griseocollis', 'Lasioglossum hitchensi',
'Agapostemon sericeus', 'Andrena wilkella', 'Andrena macra',
'Hoplitis truncata', 'Augochloropsis metallica_fulgida',
'Andrena atlantica', 'Calliopsis andreniformis',
'Melissodes subillatus', 'Anthidiellum notatum',
'Megachile exilis', 'Heriades carinata', 'Lasioglossum ephialtum',
'Megachile georgica', 'Lasioglossum admirandum',
'Lasioglossum gotham', 'Lasioglossum abanci', 'Megachile texana',
'Triepeolus lunatus', 'Melissodes', 'Melissodes bimaculatus',
'Melissodes comptoides', 'Melissodes trinodis',
'Bombus fervidus/pensylvanicus', 'Nomada texana',
'Augochlora pura', 'Bombus citrinus', 'Hylaeus affinis/modestus',
'Hylaeus modestus', 'Melitoma taurea', 'Triepeolus remigatus',
'Anthidium manicatum', 'Bombus pensylvanicus', 'Bombus fervidus',
'Nomada vegana'],
"sex": ['f', 'm'],
"specialized_on": ['Penstemon', 'Ipomoea','unknown'],
"parasitic": ['No', 'Yes'],
"nesting": ['ground', 'hive', 'wood', 'parasite [ground]', 'wood/shell','wood/cavities', 'unknown'],
"status": ['uncommon', 'vulnerable (IUCN)', 'common', 'unknown'],
"nonnative_bee": ['No', 'Yes']}
# Loop through the columns and switch to more memory-efficient data types
for col in df_bees_clean.columns:
    # Downcast integer columns to the smallest integer type that holds their values
    if pd.api.types.is_integer_dtype(df_bees_clean[col]):
        df_bees_clean[col] = pd.to_numeric(df_bees_clean[col], downcast='integer')
    # Downcast float columns (to float32 at most; float16 loses too much precision)
    elif pd.api.types.is_float_dtype(df_bees_clean[col]):
        df_bees_clean[col] = pd.to_numeric(df_bees_clean[col], downcast='float')
    # Convert columns listed in ordered_cats to ordered categoricals
    elif col in ordered_cats:
        category = pd.CategoricalDtype(ordered_cats[col], ordered=True)
        df_bees_clean[col] = df_bees_clean[col].astype(category)
    # 'datetime' is already datetime64 and is left unchanged
    # Convert remaining columns to standard categories
    # else:
    #     df_bees_clean[col] = df_bees_clean[col].astype('category')
df_bees_clean.dtypes
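Converting repeated labels to categories is what saves memory here: a categorical column stores small integer codes plus one copy of each label. A minimal sketch with a made-up 10,000-row column measures the difference with `memory_usage(deep=True)`:

```python
import pandas as pd

# Made-up column: 10,000 rows that repeat just two labels
season = pd.Series(['early.season', 'late.season'] * 5000)
season_cat = season.astype(pd.CategoricalDtype(['early.season', 'late.season'], ordered=True))

# deep=True counts the string contents, not only the object pointers
object_bytes = season.memory_usage(deep=True)
category_bytes = season_cat.memory_usage(deep=True)
print(f'object: {object_bytes:,} bytes, category: {category_bytes:,} bytes')
```

`ordered=True` also enables `<`, `>`, `min` and `max` on a column. That is meaningful for `season`, but for nominal columns such as `site` or `plant_species` the order in `ordered_cats` is arbitrary, so plain (unordered) categories may be the safer choice there.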
df_bees_clean
Which plants are preferred by native vs non-native bee species?
# Use a colorblind-friendly default palette for later plots
sns.set_palette('colorblind')
# Exclude air samples (plant_species 'None'), which are not tied to any plant
selected_bees = df_bees_clean[df_bees_clean['plant_species'] != 'None']
# Count records per plant species, split by bee origin (nonnative_bee: 'No' = native bee, 'Yes' = non-native bee)
plant_counts_by_bee = (selected_bees.groupby(['plant_species', 'nonnative_bee'], observed=True)
                       .size()
                       .unstack(fill_value=0)
                       .reindex(columns=['No', 'Yes'], fill_value=0))
# Total records per plant species across native and non-native bees
total_counts = plant_counts_by_bee.sum(axis=1)
# Keep the 20 plant species with the most bee records
top_plants = total_counts.nlargest(20)
# Positive difference: more native-bee records; negative: more non-native-bee records
difference_counts = plant_counts_by_bee['No'] - plant_counts_by_bee['Yes']
# Absolute differences for the top plants, used as bar heights
top_plants_difference_abs = difference_counts[top_plants.index].abs()
# Colors for bars where native or non-native bees dominate
native_color = 'skyblue'
non_native_color = 'orange'
# Bar plot of the absolute difference for the top plants, colored by the dominant group
# (ties, where the difference is 0, are drawn in the non-native color)
plt.figure(figsize=(10, 6))
bar_colors = [native_color if d > 0 else non_native_color for d in difference_counts[top_plants.index]]
bars = plt.bar(top_plants_difference_abs.index.astype(str), top_plants_difference_abs.values, color=bar_colors)
plt.xlabel('Plant Species')
plt.ylabel('Difference in Preference (Absolute)')
plt.title('Difference in Preference for Top Plants')
plt.xticks(rotation=45, ha='right')
# Label each bar with its value
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2, height, f'{int(height):d}', ha='center', va='bottom')
# Custom legend with one square marker per group
legend_native = plt.Line2D([], [], color=native_color, marker='s', linestyle='none', markersize=10, label='Native Preference')
legend_non_native = plt.Line2D([], [], color=non_native_color, marker='s', linestyle='none', markersize=10, label='Non-Native Preference')
plt.legend(handles=[legend_native, legend_non_native], loc='upper right')
# Keep the rotated tick labels inside the figure, then show the plot
plt.tight_layout()
plt.show()
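A raw difference in counts favours plants that were sampled often. A sketch comparing it with the share of each plant's records that come from native bees, using two made-up plants (`plant_a`, `plant_b`) and the `'No'`/`'Yes'` coding of `nonnative_bee`:

```python
import pandas as pd

# Made-up per-plant record counts split by bee origin ('No' = native bee, 'Yes' = non-native bee)
counts = pd.DataFrame({'No': [40, 9], 'Yes': [20, 1]}, index=['plant_a', 'plant_b'])

# Raw difference, the quantity behind the bar chart
raw_difference = counts['No'] - counts['Yes']

# Share of each plant's records that come from native bees (between 0 and 1)
native_share = counts['No'] / counts.sum(axis=1)

print(pd.DataFrame({'raw_difference': raw_difference, 'native_share': native_share}))
```

Here `plant_a` wins on the raw difference while `plant_b` wins on the share. With very few records a share is noisy, so it may be worth requiring a minimum number of records per plant before ranking by share.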
A visualization of the distribution of bee and plant species across one of the samples.
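One way to show how bee and plant species are distributed within a single sample is a bee × plant contingency table. This sketch uses made-up records standing in for one site; selecting a site (for example `df_bees_clean[df_bees_clean['site'] == 'A']`) is an assumption about what "one of the samples" means here.

```python
import pandas as pd

# Made-up records standing in for one site
one_site = pd.DataFrame({
    'plant_species': ['Daucus carota', 'Daucus carota', 'Rudbeckia hirta',
                      'Rudbeckia hirta', 'None'],
    'bee_species': ['Apis mellifera', 'Bombus impatiens', 'Bombus impatiens',
                    'Bombus impatiens', 'Apis mellifera'],
})

# Rows: bee species, columns: plant species ('None' = air sample), cells: number of records
distribution = pd.crosstab(one_site['bee_species'], one_site['plant_species'])

# Drop any all-zero rows or columns (for example, unused categories in a categorical column)
distribution = distribution.loc[distribution.sum(axis=1) > 0, distribution.sum(axis=0) > 0]
print(distribution)
```

The table can then be drawn with `sns.heatmap(distribution, annot=True)`.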