Buzzing Discoveries: Exploring the Plant Preference of Bees


This notebook was constructed as part of a DataCamp Community Competition. The analysis is done in Python, and the visualisations were built with Plotly, which provides interactive graphs (including hover-over info and a zoom function for more detail if desired). Code is hidden in the publication for readability, but can be accessed in the notebook. As a beginning beekeeper myself, I was very excited to work with this dataset!

📖 Background

We have taken on a project about creating pollinator bee-friendly spaces for a local government environment agency. Bee-friendly spaces can be created using both native and non-native plants and therefore we need to ensure that the correct plants are used to optimize the environment for these bees.

The team has collected data on native and non-native plants and their effects on pollinator bees. Our task is to analyze this data and provide insights and recommendations on which plants create an optimized environment for pollinator bees.

🔎 Analysis Objectives

Exploratory Analysis

We will perform an exploratory analysis of the available variables, providing some general insights and context for the study and for the research questions listed below.

Research questions

We will provide the environment agency with answers to the following questions:

Which plants are preferred by native vs non-native bee species?
Which three plant species would we recommend to the agency to support native bees and non-native bees?
A visualization of the distribution of bee and plant species across one of the samples.
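As a rough starting point for the first two questions, one could count bee observations per plant species, split by whether the bee is native. The sketch below is not the analysis used in this notebook: it assumes the cleaned column names `plant_species` and `native_bee`, assumes air samples are stored as the string "None", and runs on hypothetical toy data. The helper `top_plants` is hypothetical.

```python
import pandas as pd

# Hypothetical toy data shaped like the cleaned table: one row per bee observation.
toy = pd.DataFrame({
    "plant_species": ["plant_a", "plant_a", "plant_b", "plant_b", "plant_b", "None"],
    "native_bee": ["native", "native", "non-native", "native", "non-native", "native"],
})


def top_plants(df: pd.DataFrame, n: int = 3) -> dict:
    """Return up to n plants with the most bee observations for each bee group."""
    # Exclude air samples, assumed to be stored as the string "None".
    on_plants = df[df["plant_species"] != "None"]
    # Rows: plant species; columns: bee group; cells: number of observations.
    counts = pd.crosstab(on_plants["plant_species"], on_plants["native_bee"])
    result = {}
    for group in counts.columns:
        # Keep only plants this group was actually observed on.
        visited = counts[group][counts[group] > 0]
        result[group] = visited.nlargest(n).index.tolist()
    return result


print(top_plants(toy))
```

Ranking by raw counts favours plants that were sampled more often, so a recommendation built on this may need to account for sampling effort.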

💾 Data Validation

Original Data

The description of the data is shown below, accompanied by the original DataFrame.

We have assembled information on the plants and bees research in a file called plants_and_bees.csv. Each row represents a sample that was taken from a patch of land where the plant species were being studied.

Column	Description
`sample_id`	The ID number of the sample taken.
`species_num`	The number of different bee species in the sample.
`date`	Date the sample was taken (format: MM-DD-YY).
`season`	Season during sample collection ("early.season" or "late.season").
`site`	Name of collection site.
`native_or_non`	Whether the sample was from a native or non-native plant.
`sampling`	The sampling method.
`plant_species`	The name of the plant species the sample was taken from. None indicates the sample was taken from the air.
`time`	The time the sample was taken.
`bee_species`	The bee species in the sample.
`sex`	The sex of the bee.
`specialized_on`	The plant genus the bee species preferred.
`parasitic`	Whether or not the bee is parasitic (0:no, 1:yes).
`nesting`	The bee's nesting method.
`status`	The status of the bee species.
`nonnative_bee`	Whether the bee species is native or not (0:no, 1:yes).

Data is courtesy of Dryad - Source (data has been modified)

Data Exploration

We will import the raw data and appropriate packages for analysis.


raw_data.select_dtypes(include=['int']).describe()


raw_data.select_dtypes(include='float').describe()


raw_data.select_dtypes(include='object').describe()


Findings & Actions needed

The following summaries for each column discuss:

Whether the values match the description given in the table above,
Whether there were missing values, and how they were represented,
Whether there were other inconsistencies that need fixing,
Which actions are needed to make the values match the description.
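A minimal sketch of how per-column missing-value counts and percentages can be computed with pandas, shown on a hypothetical toy frame (in the notebook the input would be `raw_data`). The helper `missing_summary` is hypothetical. The percentages reported for this dataset are consistent with a table of 1,250 rows: for example, 1243 / 1250 = 0.9944, or 99.44%.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count missing values per column and express them as a percentage of all rows."""
    counts = df.isna().sum()
    return pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })


# Hypothetical toy frame; not taken from plants_and_bees.csv.
toy = pd.DataFrame({
    "status": [np.nan, np.nan, np.nan, "rare"],
    "nesting": ["ground", np.nan, "wood", "ground"],
    "site": ["A", "A", "B", "B"],
})
print(missing_summary(toy))
```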

sample_id

Duplicates were present, but will not be dropped: a sample can contain multiple identical observations (entries are not uniquely identified by sample_id).
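One way to tell these two situations apart, sketched on hypothetical toy data: count rows whose `sample_id` repeats, and separately count rows that are identical in every column. Repeated IDs are expected when a sample holds several bees; fully identical rows may still be distinct bees with the same attributes, which is why they are kept.

```python
import pandas as pd

# Hypothetical toy rows: sample 1 holds three bees, two of them with identical attributes.
toy = pd.DataFrame({
    "sample_id": [1, 1, 1, 2],
    "bee_species": ["bee_x", "bee_x", "bee_y", "bee_y"],
    "sex": ["f", "f", "m", "f"],
})

# Rows whose sample_id already appeared in an earlier row.
repeated_ids = toy["sample_id"].duplicated().sum()
# Rows identical to an earlier row in every column.
repeated_rows = toy.duplicated().sum()
print(repeated_ids, repeated_rows)
```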

species_num

This column does not have any missing values and does not need alterations.

date

The column can be used in its current form and will not be altered.

season

This column does not have any missing values,
The original notation style will be altered, removing '.season' from the entries.

site

This column does not have any missing values and does not need any alterations.

native_or_non

The name of the column is changed to 'native_plant' for clarity during analysis,
There are no missing values,
Original notation for the findings - values (0:no, 1:yes) - is changed to Yes and No.

sampling

This column does not have any missing values and does not need alterations.

plant_species

This column does not have any missing values and does not need alterations.

time

This column has the wrong datatype and notation (HHMM numbers); both will be adjusted, converting the values to HH:MM time strings.
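A vectorized alternative to a row-by-row `datetime.strptime` conversion, assuming the times are stored as whole numbers in HHMM form (e.g. 930 for 9:30). Zero-padding to four digits makes the format unambiguous before parsing. The function name `hhmm_to_clock` is hypothetical, and the sketch assumes the column has no missing values.

```python
import pandas as pd


def hhmm_to_clock(times: pd.Series) -> pd.Series:
    """Convert whole-number HHMM times (e.g. 930, 1345) to 'HH:MM' strings."""
    # Pad to four digits so that 930 becomes "0930" before parsing.
    padded = times.astype(int).astype(str).str.zfill(4)
    return pd.to_datetime(padded, format="%H%M").dt.strftime("%H:%M")


print(hhmm_to_clock(pd.Series([930, 1345, 800])).tolist())
```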

bee_species

This column does not have any missing values and does not need alterations.

sex

This column does not have any missing values and does not need alterations.

specialized_on

This column contains 1243 missing values, meaning about 99.44% of the values are missing.
This variable doesn't play a crucial role in answering the research questions and will not be taken into account during the analysis.

parasitic

This column contains 63 missing values, which will be replaced by 'Not Specified' for clarity during analysis,
Original notation for the findings - values (0:no, 1:yes) - is changed to No and Yes.

nesting

This column contains 54 missing values, which will be replaced by 'Not Specified' for clarity during analysis.

status

This column contains 1235 missing values, meaning about 98.8% of the values are missing.
This variable doesn't play a crucial role in answering the research questions and will not be taken into account during the analysis.

nonnative_bee

The name of the column is changed to 'native_bee' for clarity during analysis,
There are 61 missing values, which are replaced by 'Not Specified',
Original notation for the findings - values (0:no, 1:yes) - is changed to 'non-native' and 'native' respectively.

Result of Data Cleaning

All actions described above are executed below to produce analysis-ready data. In addition, several variables are converted to the categorical data type, since they take only a limited number of distinct values and storing them as categories saves memory.
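The conversion code itself is not shown in this publication. As an illustration only, the sketch below converts a hypothetical toy frame with a few repeated labels and compares memory use with `memory_usage(deep=True)`. The helper `to_categorical` and the column choice are assumptions, not the notebook's actual code.

```python
import pandas as pd


def to_categorical(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of df with the given columns stored as pandas categoricals."""
    out = df.copy()
    for column in columns:
        out[column] = out[column].astype("category")
    return out


# Hypothetical toy frame: few distinct labels repeated many times.
toy = pd.DataFrame({
    "season": ["early", "late"] * 500,
    "site": ["site_1", "site_2", "site_3", "site_4"] * 250,
})
converted = to_categorical(toy, ["season", "site"])

# deep=True counts the string payloads, not just the object pointers.
before = toy.memory_usage(deep=True).sum()
after = converted.memory_usage(deep=True).sum()
print(before, after)
```

Categoricals store each distinct label once plus a small integer code per row, which is where the saving comes from when labels repeat heavily.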

#import needed for the time conversion (raw_data is loaded in an earlier cell)
from datetime import datetime

#select data to be analyzed - discard columns 'specialized_on' and 'status'
data = raw_data.drop(columns=['specialized_on', 'status'])

#rename columns 'native_or_non' and 'nonnative_bee'
data = data.rename(columns={'native_or_non': 'native_plant', 'nonnative_bee': 'native_bee'})

#replace nulls with a description where advised
#(assign the result back: fillna(inplace=True) on a selected column is unreliable in recent pandas)
columns_to_fill = ['parasitic', 'nesting', 'native_bee']

for column in columns_to_fill:
    data[column] = data[column].fillna("Not Specified")

#values for 'parasitic': replace 0/1 with "No"/"Yes"
data['parasitic'] = data['parasitic'].replace({0: "No", 1: "Yes"})

#values for 'native_bee' (originally 'nonnative_bee'): replace 0/1 with "non-native"/"native"
data['native_bee'] = data['native_bee'].replace({0: "non-native", 1: "native"})

#season notation: remove '.season' by splitting at the dot and keeping only the first part
data['season'] = data['season'].str.split('.').str[0]

#convert time from HHMM to HH:MM
data['time'] = data['time'].apply(lambda x: datetime.strptime(str(x), "%H%M").strftime("%H:%M"))
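After cleaning, a quick sanity check can confirm that the recoded columns only contain the expected labels. The sketch below uses a hypothetical helper `check_labels` and toy data with one deliberately unmapped value; the label sets follow from the cleaning steps above.

```python
import pandas as pd


def check_labels(df: pd.DataFrame, allowed: dict) -> dict:
    """Return, per column, any values that fall outside the allowed label set."""
    problems = {}
    for column, labels in allowed.items():
        unexpected = set(df[column].unique()) - set(labels)
        if unexpected:
            problems[column] = unexpected
    return problems


# Label sets implied by the cleaning steps.
expected = {
    "parasitic": {"No", "Yes", "Not Specified"},
    "native_bee": {"native", "non-native", "Not Specified"},
    "season": {"early", "late"},
}

# Hypothetical toy data: the 1.0 in native_bee stands in for a value the recoding missed.
toy = pd.DataFrame({
    "parasitic": ["No", "Yes", "Not Specified"],
    "native_bee": ["native", "non-native", 1.0],
    "season": ["early", "late", "late"],
})
print(check_labels(toy, expected))
```

On the real cleaned `data`, an empty dictionary would indicate that every recoded column holds only the intended labels.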
