Pipes in R Tutorial For Beginners

    Learn more about the famous pipe operator %>% and other pipes in R, why and how you should use them and what alternatives you can consider!

    You might have already seen or used the pipe operator when working with packages such as dplyr or magrittr. But do you know where pipes and the famous %>% operator come from, what exactly they are, or how, when and why you should use them? Can you also come up with some alternatives?

    This tutorial will give you an introduction to pipes in R and will cover the following topics:

    • Pipe Operator in R: Introduction
      • Short History of the Pipe Operator in R
      • What Is It?
      • Why Use It?
      • Additional Pipes
    • How To Use Pipes in R
      • Basic Piping
      • Argument Placeholder
      • Re-using Placeholder for Attributes
      • Building Unary Functions
      • Compound Assignment Pipe Operations
      • Tee Operations with the Tee Operator
      • Exposing Data Variables with the Exposition Operator
    • dplyr and magrittr
    • RStudio Keyboard Shortcuts
    • When Not To Use the Pipe Operator in R
    • Alternatives to Pipes in R

    Are you interested in learning more about manipulating data in R with dplyr? Take a look at DataCamp's Data Manipulation in R with dplyr course.

    Pipe Operator in R: Introduction

    To understand what the pipe operator in R is and what you can do with it, it helps to consider the full picture and learn the history behind it. Questions such as "where does this weird combination of symbols come from and why was it made like this?" might be at the top of your mind. You'll discover the answers to these and more questions in this section.

    Now, you can look at the history from three perspectives: from a mathematical point of view, from a holistic point of view of programming languages, and from the point of view of the R language itself. You'll cover all three in what follows!

    History of the Pipe Operator in R

    Mathematical History

    If you have two functions, let's say f and g, you can chain these functions together by taking the output of one function and inserting it into the next. In short, "chaining" means that you pass an intermediate result on to the next function, but you'll see more about that later.

    For example, you can write f(g(x)): g(x) serves as the input for f, while x, of course, serves as the input to g.

    If you want to note this down, you use the notation f ∘ g, which reads as "f follows g". Alternatively, you can visually represent this as:

    Image Credit: James Balamuta, ["Piping Data"](http://stat385.thecoatlessprofessor.com/assets/lectures/07-PipingData/lec07-piping-data.pdf)

    Pipe Operators in Other Programming Languages

    As mentioned in the introduction to this section, this operator is not new in programming: in the Shell or Terminal, you can pass the output of one command to the next with the pipeline character |. Similarly, F# has a forward pipe operator, which will prove to be important later on! Lastly, it's also good to know that Haskell contains many piping operations that are derived from the Shell or Terminal.

    Pipes in R

    Now that you have seen some history of the pipe operator in other programming languages, it's time to focus on R. The history of this operator in R starts, according to this fantastic blog post written by Adolfo Álvarez, on January 17th, 2012, when an anonymous user asked the following question in this Stack Overflow post:

    How can you implement F#'s forward pipe operator in R? The operator makes it possible to easily chain a sequence of calculations. For example, when you have an input data and want to call functions foo and bar in sequence, you can write data |> foo |> bar.

    The answer came from Ben Bolker, professor at McMaster University, who replied:

    I don't know how well it would hold up to any real use, but this seems (?) to do what you want, at least for single-argument functions ...

    "%>%" <- function(x,f) do.call(f,list(x))
    pi %>% sin
    [1] 1.224606e-16
    pi %>% sin %>% cos
    [1] 1
    cos(sin(pi))
    [1] 1

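    The "single-argument functions" caveat in the quoted answer can be illustrated with a small sketch. The operator name %then% is hypothetical and is used here only so that the toy definition does not mask the real %>%; the example assumes nothing beyond base R.

```r
# Toy pipe modelled on the answer quoted above.
# `%then%` is a hypothetical name chosen to avoid masking magrittr's %>%.
`%then%` <- function(x, f) do.call(f, list(x))

# Works for functions that take a single argument
chained <- pi %then% sin %then% cos
identical(chained, cos(sin(pi)))

# A right-hand side with extra arguments is a call, not a function:
# round(1) is evaluated first, so do.call() receives a number and fails.
outcome <- tryCatch(
  4.567 %then% round(1),
  error = function(e) "error"
)
outcome
```

    This limitation is one reason the magrittr operator inspects its right-hand side instead of simply calling it.
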
    About nine months later, Hadley Wickham started the dplyr package on GitHub. You might now know Hadley, Chief Scientist at RStudio, as the author of many popular R packages (such as this last package!) and as the instructor for DataCamp's Writing Functions in R course.

    Be that as it may, it wasn't until 2013 that the first pipe, %.%, appeared in this package. As Adolfo Álvarez rightfully mentions in his blog post, the function was named chain(), and its purpose was to simplify the notation for applying several functions to a single data frame in R.

    The %.% pipe would not be around for long, as Stefan Bache proposed an alternative on the 29th of December 2013, that included the operator as you might now know it:

    iris %>% subset(Sepal.Length > 5) %>% aggregate(. ~ Species, ., mean)

    Bache continued to work with this pipe operation and at the end of 2013, the magrittr package came into being. In the meantime, Hadley Wickham continued to work on dplyr and in April 2014, the %.% operator got replaced with the one that you now know, %>%.

    Later that year, Kun Ren published the pipeR package on GitHub, which incorporated a different pipe operator, %>>%, which was designed to add more flexibility to the piping process. However, it's safe to say that the %>% is now established in the R language, especially with the recent popularity of the Tidyverse.

    What Is It?

    Knowing the history is one thing, but that still doesn't give you an idea of what F#'s forward pipe operator is or what it actually does in R.

    In F#, the pipe-forward operator |> is syntactic sugar for chained method calls. Or, stated more simply, it lets you pass an intermediate result onto the next function.

    Remember that "chaining" means that you invoke multiple method calls. As each method returns an object, you can actually allow the calls to be chained together in a single statement, without needing variables to store the intermediate results.

    In R, the pipe operator is, as you have already seen, %>%. If you're not familiar with F#, you can think of this operator as being similar to the + in a ggplot2 statement. Its function is very similar to that of the F# operator: it takes the output of one statement and makes it the input of the next statement. When describing it, you can think of it as a "THEN".

    Take, for example, the following code chunk and read it aloud:

    # Install the necessary packages 
    install.packages("babynames")
    install.packages("hflights")
    library(magrittr)
    
    iris %>%
      subset(Sepal.Length > 5) %>%
      aggregate(. ~ Species, ., mean)

    You're right, the code chunk above will translate to something like "you take the Iris data, then you subset the data and then you aggregate the data".
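
    To make the "take, then subset, then aggregate" reading concrete, here is a minimal sketch that compares the piped version with the same steps written using an intermediate variable. The variable names piped, big_sepals and stepwise are illustrative and not part of the tutorial.

```r
library(magrittr)

# Piped version, as in the tutorial
piped <- iris %>%
  subset(Sepal.Length > 5) %>%
  aggregate(. ~ Species, ., mean)

# The same steps with an intermediate variable instead of a pipe
big_sepals <- subset(iris, Sepal.Length > 5)
stepwise <- aggregate(. ~ Species, big_sepals, mean)

# Compare the two results
all.equal(piped, stepwise)
```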

    This is one of the most powerful things about the Tidyverse. In fact, having a standardized chain of processing actions is called "a pipeline". Making pipelines for a data format is great, because you can apply that pipeline to incoming data that has the same formatting and have it output in a ggplot2 friendly format, for example.
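
    One way to reuse a pipeline on incoming data with the same format is to wrap it in a function. The sketch below assumes a hypothetical helper name, summarise_big_sepals, and uses the first 100 rows of iris as a stand-in for "new" data.

```r
library(magrittr)

# Hypothetical helper that packages the pipeline for reuse
summarise_big_sepals <- function(data) {
  data %>%
    subset(Sepal.Length > 5) %>%
    aggregate(. ~ Species, ., mean)
}

# Apply the same pipeline to two data sets with identical columns
full_result <- summarise_big_sepals(iris)
partial_result <- summarise_big_sepals(head(iris, 100))

full_result
partial_result
```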

    Why Use It?

    R is a functional language, which means that your code often contains a lot of parentheses, ( and ). When you have complex code, this often means that you have to nest those parentheses. This makes your R code hard to read and understand. Here's where %>% comes to the rescue!

    Take a look at the following example, which is a typical example of nested code:

    # Initialize `x`
    x <- c(0.109, 0.359, 0.63, 0.996, 0.515, 0.142, 0.017, 0.829, 0.907)
    
    # Compute the logarithm of `x`, return suitably lagged and iterated differences, 
    # compute the exponential function and round the result
    round(exp(diff(log(x))), 1)

    With the help of %>%, you can rewrite the above code as follows:

    # Import `magrittr`
    library(magrittr)
    
    # Perform the same computations on `x` as above
    x %>% log() %>%
        diff() %>%
        exp() %>%
        round(1)

    Does this seem difficult to you? No worries! You'll learn more on how to go about this later on in this tutorial.

    Note that you need to import the magrittr library to get the above code to work. That's because the pipe operator is, as you read above, part of the magrittr library and is, since 2014, also a part of dplyr. If you forget to import the library, you'll get an error like Error in eval(expr, envir, enclos): could not find function "%>%".

    Also note that it isn't a formal requirement to add the parentheses after log, diff and exp, but within the R community, some add them to increase the readability of the code.
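
    As a quick check of the two notes above, the following sketch compares the nested call with the piped version, both with and without the optional parentheses. The variable names are illustrative only.

```r
library(magrittr)

x <- c(0.109, 0.359, 0.63, 0.996, 0.515, 0.142, 0.017, 0.829, 0.907)

# Nested version
nested <- round(exp(diff(log(x))), 1)

# Piped version with parentheses after each function
with_parens <- x %>% log() %>% diff() %>% exp() %>% round(1)

# Piped version without parentheses where no extra argument is needed
without_parens <- x %>% log %>% diff %>% exp %>% round(1)

identical(nested, with_parens)          # TRUE
identical(with_parens, without_parens)  # TRUE
```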

    In short, here are four reasons why you should be using pipes in R:

    • You'll structure the sequence of your data operations from left to right, as opposed to from the inside out;
    • You'll avoid nested function calls;
    • You'll minimize the need for local variables and function definitions; and
    • You'll make it easy to add steps anywhere in the sequence of operations.

    These reasons are taken from the magrittr documentation itself. Implicitly, you see the arguments of readability and flexibility returning.
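
    The last reason can be sketched with the earlier example: adding a step (here abs(), chosen purely for illustration) means adding one line to the pipeline, whereas the nested version has to be edited from the inside.

```r
library(magrittr)

x <- c(0.109, 0.359, 0.63, 0.996, 0.515, 0.142, 0.017, 0.829, 0.907)

# Pipeline with one extra step inserted in the middle
extended <- x %>%
  log() %>%
  diff() %>%
  abs() %>%   # the newly added step
  exp() %>%
  round(1)

# Nested equivalent: the new call has to be wrapped around diff(...)
nested <- round(exp(abs(diff(log(x)))), 1)

identical(extended, nested)  # TRUE
```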

    Additional Pipes

    Even though %>% is the (main) pipe operator of the magrittr package, there are a couple of other operators that you should know and that are part of the same package:

    • The compound assignment operator %<>%;

    # Initialize `x`
    x <- rnorm(100)

    # Update value of `x` and assign it to `x`
    x %<>% abs %>% sort

    • The tee operator %T>%;

    rnorm(200) %>%
      matrix(ncol = 2) %T>%
      plot %>%
      colSums

    For now, it's good to know that the tee step in the code chunk above is a shortcut for wrapping the side-effect call in braces and returning the input:

    rnorm(200) %>%
      matrix(ncol = 2) %>%
      { plot(.); . } %>%
      colSums

    But you'll see more about that later on!

    • The exposition pipe operator %$%.

    data.frame(z = rnorm(100)) %$%
      ts.plot(z)

    Of course, these three operators work slightly differently than the main %>% operator. You'll see more about their functionalities and their usage later on in this tutorial!
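
    As a rough sketch of how the three operators differ from %>%, the example below pairs each one with an approximate equivalent. The seed, the small matrix and data frame, and the use of str() instead of plot() (to avoid needing a graphics device) are assumptions made for illustration.

```r
library(magrittr)

# Compound assignment: `x %<>% f` behaves like `x <- x %>% f`
set.seed(1)  # assumed seed, only to make the sketch reproducible
x <- rnorm(100)
y <- x
x %<>% abs %>% sort
y <- y %>% abs %>% sort
identical(x, y)

# Tee: %T>% runs the right-hand side for its side effect and
# passes the left-hand side on. str() returns NULL, so a plain
# %>% would hand NULL to colSums() instead of the matrix.
m <- matrix(1:6, ncol = 2)
tee_sums <- m %T>% str %>% colSums
identical(tee_sums, colSums(m))

# Exposition: %$% exposes the columns of a data frame by name,
# much like with()
df <- data.frame(z = c(1, 2, 3))
exposed <- df %$% sum(z)
identical(exposed, with(df, sum(z)))
```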

    Note that, even though you'll most often see the magrittr pipes, you might also encounter other pipes as you go along! Some examples are wrapr's dot arrow pipe %.>% (also written as the "to dot" pipe %>.%) and the Bizarro pipe ->.;.
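
    The Bizarro pipe is not an operator from a package but plain base R syntax: a right assignment to a variable named ., followed by a semicolon. A minimal sketch, with illustrative numbers and the illustrative variable name total:

```r
# Bizarro pipe: each step stores its result in `.`,
# and the next step reads `.` as its input
c(1, 4, 9) ->.; sqrt(.) ->.; sum(.) -> total

total
```

    Note that this leaves a variable named . behind in the current environment, which is one of the ways it differs from the magrittr operators.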

    How to Use Pipes in R

    Now that you know how the %>% operator originated, what it actually is and why you should use it, it's time for you to discover how you can actually use it to your advantage. You will see that there are quite a few ways in which you can use it!

    Basic Piping

    Before you go into the more advanced usages of the operator, it's good to first take a look at the most basic examples that use the operator. In essence, you'll see that there are 3 rules that you can follow when you're first starting out:

    • f(x) can be rewritten as x %>% f

    In short, this means that functions that take one argument, function(argument), can be rewritten as follows: argument %>% function(). Take a look at the following, more practical example to understand how these two are equivalent:

    # Compute the logarithm of `x` 
    log(x)
    
    # Compute the logarithm of `x` 
    x %>% log()