Competition - Bee friendly plants

    Which plants are better for bees: native or non-native?

    📖 Background

    You work for the local government's environment agency and have taken on a project to create bee-friendly spaces for pollinators. You can use both native and non-native plants in these spaces, so you need to choose the plants that best support these bees.

    The team has collected data on native and non-native plants and their effects on pollinator bees. Your task will be to analyze this data and provide recommendations on which plants create an optimized environment for pollinator bees.

    💾 The Data

    You have assembled information on the plants and bees research in a file called plants_and_bees.csv. Each row represents a sample that was taken from a patch of land where the plant species were being studied.

    | Column         | Description |
    |----------------|-------------|
    | sample_id      | The ID number of the sample taken. |
    | bees_num       | The total number of bee individuals in the sample. |
    | date           | Date the sample was taken. |
    | season         | Season during sample collection ("early.season" or "late.season"). |
    | site           | Name of the collection site. |
    | native_or_non  | Whether the sample was from a native or non-native plot. |
    | sampling       | The sampling method. |
    | plant_species  | The name of the plant species the sample was taken from. None indicates the sample was taken from the air. |
    | time           | The time the sample was taken. |
    | bee_species    | The bee species in the sample. |
    | sex            | The sex of the bee. |
    | specialized_on | The plant genus the bee species prefers. |
    | parasitic      | Whether or not the bee is parasitic (0: no, 1: yes). |
    | nesting        | The bee's nesting method. |
    | status         | The status of the bee species. |
    | nonnative_bee  | Whether the bee species is non-native (0: no, 1: yes). |
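    Before cleaning, it helps to confirm that the file matches this data dictionary and to see where values are missing. The sketch below shows one way to do that. `check_schema` and `EXPECTED_COLUMNS` are hypothetical helpers, and the toy frame stands in for `pd.read_csv("data/plants_and_bees.csv")`.

```python
import pandas as pd

# Column names taken from the data dictionary above
EXPECTED_COLUMNS = [
    "sample_id", "bees_num", "date", "season", "site", "native_or_non",
    "sampling", "plant_species", "time", "bee_species", "sex",
    "specialized_on", "parasitic", "nesting", "status", "nonnative_bee",
]


def check_schema(df: pd.DataFrame) -> dict:
    """Compare df's columns with the data dictionary and count missing values."""
    missing_columns = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    extra_columns = [c for c in df.columns if c not in EXPECTED_COLUMNS]
    na_counts = df.isna().sum()
    return {
        "missing_columns": missing_columns,
        "extra_columns": extra_columns,
        # Only report columns that actually contain missing values
        "na_counts": {col: int(n) for col, n in na_counts.items() if n > 0},
    }


# Hypothetical toy frame standing in for the real CSV
toy = pd.DataFrame({"sample_id": [1, 2], "bees_num": [3, None], "comment": ["a", "b"]})
report = check_schema(toy)
print(report["extra_columns"])  # ['comment']
print(report["na_counts"])      # {'bees_num': 1}
```

    On the real data, an empty `missing_columns` list confirms the expected columns are present. `na_counts` then shows which columns need the fills applied during cleaning.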

    Source (data has been modified)

    💪 Challenge

    Provide your agency with a report that covers the following:

    • Which plants are preferred by native vs non-native bee species?
    • A visualization of the distribution of bee and plant species across one of the samples.
    • Select the top three plant species you would recommend to the agency to support native bees.
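    For the third task, one possible heuristic is to rank plants by how many native-bee records they host and to break ties by how many distinct native bee species visit them. This is an assumption, not the agency's criterion. The helper name is hypothetical, the toy rows are made up, and the 'No'/'Yes' coding of `nonnative_bee` follows the cleaning steps in this notebook.

```python
import pandas as pd


def top_plants_for_native_bees(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Rank plant species by native-bee records, then by distinct native species.

    Assumes nonnative_bee is coded 'No' (native) / 'Yes' (non-native) and that
    plant_species 'None' marks samples taken from the air.
    """
    # Native bees caught on plants only
    native = df[(df["nonnative_bee"] == "No") & (df["plant_species"] != "None")]
    summary = native.groupby("plant_species", observed=True).agg(
        native_records=("bee_species", "count"),
        native_species=("bee_species", "nunique"),
    )
    # Most records first; more distinct species ranks higher on ties
    ranked = summary.sort_values(["native_records", "native_species"], ascending=False)
    return ranked.head(n)


# Hypothetical toy rows, not taken from plants_and_bees.csv
toy = pd.DataFrame({
    "plant_species": ["Plant A", "Plant A", "Plant A", "Plant B", "Plant B", "Plant C", "None"],
    "bee_species":   ["bee x", "bee y", "bee x", "bee x", "bee z", "bee z", "bee x"],
    "nonnative_bee": ["No", "No", "No", "No", "Yes", "No", "No"],
})
result = top_plants_for_native_bees(toy)
print(result)
```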

    🧑‍⚖️ Judging criteria

    This is a community-based competition. The top 5 most upvoted entries will win.

    The winners will receive DataCamp merchandise.

    ✅ Checklist before publishing

    • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
    • Remove redundant cells like the judging criteria, so the workbook is focused on your work.
    • Check that all the cells run without error.

    ⌛️ Time is ticking. Good luck!

    # Importing libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.style as style
    import seaborn as sns
    import numpy as np
    from datetime import datetime
    
    data = pd.read_csv("data/plants_and_bees.csv")
    
    df_bees_clean = data.copy()
    
    df_bees_clean
    # Check that the values match the data dictionary above and
    # count the missing values in each column
    
    # Check the data types in the columns
    df_bees_clean.dtypes
    # Check the missing values in the columns
    df_bees_clean.isna().sum()
    # Sort the samples by the number of bees
    df_bees_clean.sort_values(by='bees_num', inplace=True)
    
    # date: date the sample was taken; time: time the sample was taken
    # Convert 'time' to a zero-padded four-digit string (e.g. 930 -> '0930')
    df_bees_clean['time'] = df_bees_clean['time'].astype(str).str.zfill(4)
    # Convert 'date' to datetime
    df_bees_clean['date'] = pd.to_datetime(df_bees_clean['date'], format='%m/%d/%Y')
    # Concatenate 'date' and 'time' into a new 'datetime' column
    df_bees_clean['datetime'] = df_bees_clean['date'].dt.strftime('%Y-%m-%d') + ' ' + df_bees_clean['time']
    # Parse 'datetime' with an explicit format so the 'HHMM' time is not misread
    df_bees_clean['datetime'] = pd.to_datetime(df_bees_clean['datetime'], format='%Y-%m-%d %H%M')
    # Drop the original 'time' and 'date' columns
    df_bees_clean.drop(columns=['time', 'date'], inplace=True)
    # Truncate to whole minutes (a safeguard; 'HHMM' times carry no seconds)
    df_bees_clean['datetime'] = df_bees_clean['datetime'].dt.floor('min')
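    The date/time handling can be checked on its own. This sketch assumes `time` is stored as an integer in HHMM form, which the `zfill(4)` call implies, and that dates use M/D/YYYY. The two rows are hypothetical. If `time` contains missing values, pandas reads it as float (e.g. 930.0), so `astype(str)` would produce '930.0' and those rows would need separate handling.

```python
import pandas as pd

# Hypothetical raw values: dates as M/D/YYYY strings, times as HHMM integers
raw = pd.DataFrame({"date": ["4/18/2017", "7/2/2017"], "time": [930, 1405]})

# Zero-pad the time to four digits ("930" -> "0930") and parse date and time
# together with one explicit format, so pandas never has to guess the layout
stamp = raw["date"] + " " + raw["time"].astype(str).str.zfill(4)
raw["datetime"] = pd.to_datetime(stamp, format="%m/%d/%Y %H%M")

print(raw["datetime"].tolist())
```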
    
    # specialized_on: the plant genus the bee species prefers
    df_bees_clean['specialized_on'] = df_bees_clean['specialized_on'].fillna('unknown')
    
    # parasitic: whether or not the bee is parasitic (0: no, 1: yes); missing values are treated as 0
    df_bees_clean['parasitic'] = df_bees_clean['parasitic'].fillna(0).replace({0: 'No', 1: 'Yes'})
    
    # nesting: the bee's nesting method
    df_bees_clean['nesting'] = df_bees_clean['nesting'].fillna('unknown')
    
    # status: the status of the bee species
    df_bees_clean['status'] = df_bees_clean['status'].fillna('unknown')
    
    # nonnative_bee: whether the bee species is non-native (0: no, 1: yes); missing values are treated as 0
    df_bees_clean['nonnative_bee'] = df_bees_clean['nonnative_bee'].fillna(0).replace({0: 'No', 1: 'Yes'})
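    The per-column fills above can also be written as a single `fillna` call with a column-to-value dictionary, followed by mapping the 0/1 flags to labels. Below is a sketch on hypothetical rows. It keeps the same assumption that a missing flag means 0.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with gaps in the columns cleaned above
raw = pd.DataFrame({
    "specialized_on": ["Penstemon", np.nan],
    "parasitic": [1.0, np.nan],
    "nesting": [np.nan, "ground"],
    "status": [np.nan, "common"],
    "nonnative_bee": [np.nan, 1.0],
})

# One fillna call with a per-column mapping of replacement values
filled = raw.fillna({
    "specialized_on": "unknown",
    "parasitic": 0,      # assumption: missing means "not parasitic"
    "nesting": "unknown",
    "status": "unknown",
    "nonnative_bee": 0,  # assumption: missing means "native"
})

# Recode the 0/1 flags as readable labels (cast to int first so keys match)
for flag in ["parasitic", "nonnative_bee"]:
    filled[flag] = filled[flag].astype(int).map({0: "No", 1: "Yes"})

print(filled)
```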
    # Order Categories
    ordered_cats = {"season":['early.season', 'late.season'], 
                    "site": ['A', 'B', 'C'], 
                    "native_or_non": ['native', 'non-native'],
                    "sampling": ['pan traps', 'hand netting'],
                    
                    "plant_species": ['None', 'Trifolium incarnatum', 'Viola cornuta',
           'Trifolium repens', 'Leucanthemum vulgare',
           'Melilotus officinalis', 'Tradescantia virginiana',
           'Penstemon digitalis', 'Trifolium pratense', 'Monarda punctata',
           'Asclepias tuberosa', 'Rudbeckia hirta', 'Coronilla varia',
           'Lobularia maritima', 'Daucus carota', 'Chamaecrista fasciculata',
           'Pycnanthemum tenuifolium', 'Agastache foeniculum',
           'Cosmos bipinnatus', 'Helenium flexuosum', 'Origanum vulgare',
           'Lotus corniculatus', 'Cichorium intybus', 'Rudbeckia triloba'],
                    
                    "bee_species": ['Augochlorella aurata', 'Agapostemon texanus', 'Andrena carlini',
           'Andrena perplexa', 'Apis mellifera', 'Lasioglossum tegulare',
           'Lasioglossum pectorale', 'Lasioglossum pilosum',
           'Lasioglossum cressonii', 'Lasioglossum trigeminum',
           'Osmia pumila', 'Andrena miserabilis', 'Lasioglossum versatum',
           'Halictus poeyi/ligatus', 'Osmia atriventris',
           'Nomada bidentate_group', 'Osmia bucephala',
           'Lasioglossum callidum', 'Ceratina calcarata',
           'Agapostemon splendens', 'Lasioglossum coreopsis',
           'Nomada australis', 'Ceratina', 'Megachile brevis',
           'Halictus parallelus', 'Ceratina strenua',
           'Andrena (Trachandrena)', 'Andrena nasonii', 'Ceratina mikmaqi',
           'Agapostemon virescens', 'Osmia subfasciata',
           'Lasioglossum coriaceum', 'Lasioglossum vierecki',
           'Nomada pygmaea', 'Nomada articulata', 'Osmia taurus',
           'Andrena banksi', 'Osmia distincta', 'Eucera hamata',
           'Hoplitis producta', 'Augochloropsis metallica_metallica',
           'Halictus confusus', 'Ceratina dupla', 'Andrena barbara',
           'Osmia georgica', 'Lasioglossum oblongum',
           'Lasioglossum floridanum', 'Nomada parva', 'Osmia sandhouseae',
           'Lasioglossum bruneri', 'Megachile mendica', 'Lasioglossum weemsi',
           'Hoplitis pilosifrons', 'Bombus bimaculatus', 'Lasioglossum',
           'Lasioglossum subviridatum', 'Bombus impatiens',
           'Bombus griseocollis', 'Lasioglossum hitchensi',
           'Agapostemon sericeus', 'Andrena wilkella', 'Andrena macra',
           'Hoplitis truncata', 'Augochloropsis metallica_fulgida',
           'Andrena atlantica', 'Calliopsis andreniformis',
           'Melissodes subillatus', 'Anthidiellum notatum',
           'Megachile exilis', 'Heriades carinata', 'Lasioglossum ephialtum',
           'Megachile georgica', 'Lasioglossum admirandum',
           'Lasioglossum gotham', 'Lasioglossum abanci', 'Megachile texana',
           'Triepeolus lunatus', 'Melissodes', 'Melissodes bimaculatus',
           'Melissodes comptoides', 'Melissodes trinodis',
           'Bombus fervidus/pensylvanicus', 'Nomada texana',
           'Augochlora pura', 'Bombus citrinus', 'Hylaeus affinis/modestus',
           'Hylaeus modestus', 'Melitoma taurea', 'Triepeolus remigatus',
           'Anthidium manicatum', 'Bombus pensylvanicus', 'Bombus fervidus',
           'Nomada vegana'],
                   "sex": ['f', 'm'],
                   "specialized_on": ['Penstemon', 'Ipomoea','unknown'],
                    "parasitic": ['No', 'Yes'],
                   "nesting": ['ground', 'hive', 'wood', 'parasite [ground]', 'wood/shell','wood/cavities', 'unknown'],
                   "status": ['uncommon', 'vulnerable (IUCN)', 'common', 'unknown'],
                   "nonnative_bee": ['No', 'Yes']}
    
    
    # Loop through the DataFrame columns to shrink or convert their data types
    for col in df_bees_clean.columns:
    
        # Downcast integer columns to the smallest integer type that holds their values
        if pd.api.types.is_integer_dtype(df_bees_clean[col]):
            df_bees_clean[col] = pd.to_numeric(df_bees_clean[col], downcast='integer')
    
        # Downcast float columns (pandas stops at float32, avoiding float16 precision loss)
        elif pd.api.types.is_float_dtype(df_bees_clean[col]):
            df_bees_clean[col] = pd.to_numeric(df_bees_clean[col], downcast='float')
    
        # Convert columns listed in ordered_cats to ordered categories
        # (values missing from a category list become NaN)
        elif col in ordered_cats:
            category = pd.CategoricalDtype(ordered_cats[col], ordered=True)
            df_bees_clean[col] = df_bees_clean[col].astype(category)
    
        # Remaining columns, such as 'datetime', are left unchanged
    
    df_bees_clean.dtypes
    
    df_bees_clean
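    One caveat with `astype(CategoricalDtype(...))`: any value that is not in the category list is silently converted to NaN. The hypothetical helper below lists such values so they can be caught before the conversion. 'mid.season' is an invented value used only to trigger the check.

```python
import pandas as pd


def unlisted_values(series: pd.Series, categories: list) -> list:
    """Return distinct non-null values of `series` that are absent from `categories`."""
    present = set(series.dropna().unique())
    return sorted(present - set(categories))


# Hypothetical column with one value missing from the category list
season = pd.Series(["early.season", "late.season", "mid.season"])
season_cats = ["early.season", "late.season"]

print(unlisted_values(season, season_cats))  # ['mid.season']

# astype() turns the unlisted value into NaN without any warning
converted = season.astype(pd.CategoricalDtype(season_cats, ordered=True))
print(converted.isna().sum())  # 1
```

    Against the real data, running `{col: unlisted_values(df_bees_clean[col], cats) for col, cats in ordered_cats.items()}` before the conversion loop should return only empty lists if every category was spelled out.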

    Which plants are preferred by native vs non-native bee species?

    # Use colorblind-friendly colors for native and non-native bees
    palette = sns.color_palette('colorblind')
    native_color = palette[0]
    non_native_color = palette[1]
    
    # Keep only samples taken from plants ('None' marks samples taken from the air)
    selected_bees = df_bees_clean[df_bees_clean['plant_species'] != 'None']
    
    # Count records per plant species for native ('No') and non-native ('Yes') bee species
    plant_counts_by_bee = (
        selected_bees.groupby(['plant_species', 'nonnative_bee'], observed=True)
        .size()
        .unstack(fill_value=0)
        .reindex(columns=['No', 'Yes'], fill_value=0)
        .rename(columns={'No': 'native', 'Yes': 'non-native'})
    )
    
    # Total records per plant species across both bee groups
    total_counts = plant_counts_by_bee.sum(axis=1)
    
    # Keep the 20 plant species with the most records
    top_plants = total_counts.nlargest(20)
    
    # Difference in record counts (native minus non-native) for the top plants
    difference_counts = plant_counts_by_bee['native'] - plant_counts_by_bee['non-native']
    top_difference = difference_counts[top_plants.index]
    
    # Color each bar by the group with more records (grey for ties)
    bar_colors = [native_color if d > 0 else non_native_color if d < 0 else 'grey' for d in top_difference]
    
    # Bar plot of the absolute difference for the top plants
    plt.figure(figsize=(10, 6))
    bars = plt.bar(top_difference.index.astype(str), top_difference.abs().values, color=bar_colors)
    plt.xlabel('Plant Species')
    plt.ylabel('|Native - non-native| bee records')
    plt.title('Difference in native vs non-native bee records for the top 20 plants')
    plt.xticks(rotation=45, ha='right')
    
    # Add count labels above the bars
    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width() / 2, height, f'{int(height):d}', ha='center', va='bottom')
    
    # Add a custom legend (square markers without connecting lines)
    legend_native = plt.Line2D([], [], color=native_color, marker='s', linestyle='', markersize=10, label='More native-bee records')
    legend_non_native = plt.Line2D([], [], color=non_native_color, marker='s', linestyle='', markersize=10, label='More non-native-bee records')
    plt.legend(handles=[legend_native, legend_non_native], loc='upper right')
    
    # Show the plot
    plt.tight_layout()
    plt.show()
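    Raw count differences depend on how many records each bee group has overall. If native-bee records far outnumber non-native ones, most bars will favor native bees almost by construction; `df_bees_clean['nonnative_bee'].value_counts()` is worth checking for this. The sketch below normalizes each group's counts to shares before comparing. The function name and toy rows are hypothetical, and the 'No'/'Yes' coding follows the cleaning steps above.

```python
import pandas as pd


def preference_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Share of each bee group's plant records that fall on each plant species.

    Assumes nonnative_bee is coded 'No'/'Yes' and that plant_species 'None'
    marks samples taken from the air.
    """
    on_plants = df[df["plant_species"] != "None"]
    counts = pd.crosstab(on_plants["plant_species"], on_plants["nonnative_bee"])
    counts = counts.rename(columns={"No": "native", "Yes": "non-native"})
    # Divide each column by its total so each group's shares sum to 1
    shares = counts / counts.sum()
    # Positive values: the plant takes a larger share of native-bee records
    shares["share_difference"] = shares["native"] - shares["non-native"]
    return shares.sort_values("share_difference", ascending=False)


# Hypothetical toy rows, not taken from plants_and_bees.csv
toy = pd.DataFrame({
    "plant_species": ["P1", "P1", "P1", "P2", "P2", "P3", "None"],
    "nonnative_bee": ["No", "No", "No", "No", "Yes", "Yes", "No"],
})
result = preference_shares(toy)
print(result)
```

    On the cleaned data this would be `preference_shares(df_bees_clean)`. Plants near the top account for a larger share of native-bee records than of non-native-bee records, regardless of how large each group is.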

    A visualization of the distribution of bee and plant species across one of the samples.