Raise awareness about climate change with ggplot2

    Learn where to find and how to show historical weather data with customized ggplot2 graphs

    Introduction

    Global warming isn’t a prediction. It is happening.
    James Hansen

    There is solid evidence that temperatures are rising on our planet. To thrive in today's competitive job market, whether your career goals involve the public sector, non-governmental organizations, or private businesses, it is essential that you demonstrate your commitment to addressing pressing global problems. Since climate change is threatening the very existence of humankind, you should be able to understand, research, and create awareness about it among your colleagues.

    In this tutorial, you will learn where to find reliable and curated historical temperature data and visualize it with ggplot2. After you finish this tutorial, you will:

    • know where to find curated datasets with historical weather data;
    • feel comfortable plotting historical weather data with ggplot2;
    • be able to customize your ggplot2 graphs to better tell your story.

    Step 1: Finding and loading the data

    The data for this tutorial comes from the National Centers for Environmental Information (NCEI). The NCEI is the leading authority for environmental data in the USA and provides high-quality data about climate, ecosystems, and water resources. Its Global Summary of the Year (GSOY) dataset offers historical weather data by city and station. This tutorial uses data from Berkeley, CA, but you can choose another city if you prefer.

    You will load the data with read_csv() from the readr package. The first argument is the file path; the second, col_select, tells R which columns to load. The dataset contains several variables, but we are only interested in "DATE" and "TAVG". "DATE" contains the year in which the temperature was observed, and "TAVG" is the average annual temperature in degrees Celsius. To learn more about the available variables, please consult the dataset codebook.

    library(readr)
    
    df <- read_csv('USC00040693.csv',
                col_select = c("DATE", "TAVG"))
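    If you want to see the effect of column selection without downloading the file, the sketch below builds a tiny CSV and keeps only the two columns of interest. It uses base R's read.csv() instead of readr so that it runs without extra packages; the station label and all values are made up for illustration and are not taken from the NCEI file.

```r
# Made-up rows that mimic the general layout of a yearly weather export.
# The station label and all numbers are illustrative only.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "STATION,DATE,PRCP,TAVG",
  "TOY0001,1893,610.1,13.4",
  "TOY0001,1894,,13.1"
), csv_path)

# Read every column, then keep only DATE and TAVG.
toy <- read.csv(csv_path)[, c("DATE", "TAVG")]
names(toy)
```

    With readr, col_select achieves the same result while skipping the unwanted columns at read time.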
    

    Below, you will print the dataset and its summary statistics to get an initial idea of your data. The summary() function tells us that the data ranges from 1893 to 2019 and that, over this period, the minimum average annual temperature observed in Berkeley, CA, was 12.9 °C and the maximum was 15.93 °C.

    df
    summary(df)
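    The figures that summary() reports can also be extracted one by one, which is handy when you want to reuse them in text or plot labels. The sketch below does this on a small made-up data frame (the name toy and its values are assumptions, not the Berkeley data). Note that min() and max() return NA unless missing values are dropped with na.rm = TRUE.

```r
# Toy data frame with the same column names as the tutorial dataset.
# The values are invented for illustration.
toy <- data.frame(
  DATE = 2000:2004,
  TAVG = c(13.2, NA, 14.1, 13.8, NA)
)

year_range <- range(toy$DATE)            # first and last year
tavg_min <- min(toy$TAVG, na.rm = TRUE)  # ignore NAs when taking the minimum
tavg_max <- max(toy$TAVG, na.rm = TRUE)  # ignore NAs when taking the maximum

year_range
tavg_min
tavg_max
```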

    Step 2: Treating missing values

    The summary() function also revealed that 33 temperature values are missing. You can count the NAs of a specific variable with is.na(), which returns TRUE for each observation that is NA, and then add up the results with sum(), which counts each TRUE as 1 and each FALSE as 0.

    print(paste("There are", sum(is.na(df$TAVG)), "missing values in this dataset."))
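    To see what is.na() and sum() do step by step, here is a minimal sketch on a made-up vector (the name tavg and its values are assumptions for illustration). It also shows two related base R helpers: mean() for the share of missing values and which() for their positions.

```r
# Invented temperatures with three gaps.
tavg <- c(13.2, NA, 14.1, NA, NA)

missing_flags <- is.na(tavg)          # logical vector, TRUE where a value is missing
n_missing <- sum(missing_flags)       # TRUE counts as 1, FALSE as 0
share_missing <- mean(missing_flags)  # proportion of missing values
missing_positions <- which(missing_flags)  # indices of the missing values

n_missing
share_missing
missing_positions
```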

    Given that we are working with a time series, we will fill in missing values with linear interpolation. This method assumes that the data varied linearly during the missing period. In fact, a line plot of a time series makes the same assumption: the gap between any two consecutive observations, even when no data is missing, is drawn as a straight line connecting the two points.
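    To make the linearity assumption concrete, the sketch below fills a two-year gap by hand with base R's approx(), which interpolates linearly by default. This is only an illustration of the idea on made-up numbers, not the method used in the rest of the tutorial; the names years, temps, and filled are assumptions.

```r
# Invented series: values are known for 2000 and 2003, missing in between.
years <- 2000:2003
temps <- c(13.0, NA, NA, 14.5)

# Interpolate along the straight line between the known points.
known <- !is.na(temps)
filled <- approx(x = years[known], y = temps[known], xout = years)$y

filled
```

    The gap spans three yearly steps and a rise of 1.5 degrees, so each step adds 0.5: the two missing years become 13.5 and 14.0.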

    To perform linear interpolation, we will use the imputeTS package. After installing and loading the library, you can use na_interpolation() to fill in the missing values. It takes two arguments: first, the data frame column you would like to treat and, second, the method you wish to use for the imputation.

    install.packages("imputeTS", quiet = TRUE)