Pipes in R Tutorial For Beginners
Learn more about the famous pipe operator %>% and other pipes in R, why and how you should use them and what alternatives you can consider!
You might have already seen or used the pipe operator when working with packages such as dplyr, magrittr, and others. But do you know where pipes and the famous %>% operator come from, what exactly they are, or how, when and why you should use them? Can you also come up with some alternatives?
This tutorial will give you an introduction to pipes in R and will cover the following topics:
- Pipe Operator in R: Introduction
- Short History of the Pipe Operator in R
- What Is It?
- Why Use It?
- Additional Pipes
- How To Use Pipes in R
- Basic Piping
- Argument Placeholder
- Re-using Placeholder for Attributes
- Building Unary Functions
- Compound Assignment Pipe Operations
- Tee Operations with the Tee Operator
- Exposing Data Variables with the Exposition Operator
- dplyr and magrittr
- RStudio Keyboard Shortcuts
- When Not To Use the Pipe Operator in R
- Alternatives to Pipes in R
Are you interested in learning more about manipulating data in R with dplyr? Take a look at DataCamp's Data Manipulation in R with dplyr course.
Pipe Operator in R: Introduction
To understand what the pipe operator in R is and what you can do with it, it helps to consider the full picture and learn the history behind it. Questions such as "where does this weird combination of symbols come from and why was it made like this?" might be at the top of your mind. You'll discover the answers to these and more questions in this section.
Now, you can look at the history from three perspectives: from a mathematical point of view, from a holistic point of view of programming languages, and from the point of view of the R language itself. You'll cover all three in what follows!
History of the Pipe Operator in R
Mathematical History
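If you have two functions, let's say f : B → C and g : A → B, you can chain them together by taking the output of one function and inserting it into the next.
For example, you can say f(g(x)): g(x) serves as the input for f(), while x serves as the input for g().
If you would want to note this down, you will use the notation f ∘ g, which reads as "f follows g".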
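In R terms, this mathematical idea of composing two functions can be sketched as follows. The functions f and g and the helper compose() are made up for illustration; they are not part of any package discussed in this tutorial.

```r
# Two small illustrative functions (hypothetical, for this sketch only)
g <- function(x) x + 1   # applied first
f <- function(x) x * 2   # applied second

# Hypothetical helper that builds the composition "f follows g"
compose <- function(f, g) function(x) f(g(x))

f_after_g <- compose(f, g)

# Nesting the calls by hand and calling the composed function
# are two ways of writing the same computation
nested   <- f(g(3))
composed <- f_after_g(3)
```

A pipe expresses the same chain in reading order: the value goes into g first, and the result goes into f.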
Pipe Operators in Other Programming Languages
As mentioned in the introduction to this section, this operator is not new in programming: in the Shell or Terminal, you can pass the output of one command to the next with the pipeline character |. Similarly, F# has a forward pipe operator, which will prove to be important later on! Lastly, it's also good to know that Haskell contains many piping operations that are derived from the Shell or Terminal.
Pipes in R
Now that you have seen some history of the pipe operator in other programming languages, it's time to focus on R. The history of this operator in R starts, according to this fantastic blog post written by Adolfo Álvarez, on January 17th, 2012, when an anonymous user asked the following question in this Stack Overflow post:
How can you implement F#'s forward pipe operator in R? The operator makes it possible to easily chain a sequence of calculations. For example, when you have an input data and want to call functions foo and bar in sequence, you can write data |> foo |> bar?
The answer came from Ben Bolker, professor at McMaster University, who replied:
I don't know how well it would hold up to any real use, but this seems (?) to do what you want, at least for single-argument functions ...
"%>%" <- function(x,f) do.call(f,list(x))
pi %>% sin
[1] 1.224606e-16
pi %>% sin %>% cos
[1] 1
cos(sin(pi))
[1] 1
About nine months later, Hadley Wickham started the dplyr package on GitHub. You might know Hadley, Chief Scientist at RStudio, as the author of many popular R packages (such as this last package!) and as the instructor for DataCamp's Writing Functions in R course.
Be that as it may, it wasn't until 2013 that the first pipe, %.%, appeared in this package. As Adolfo Álvarez rightfully mentions in his blog post, the function was named chain(), and its purpose was to simplify the notation for applying several functions to a single data frame in R.
The %.% pipe would not be around for long, as Stefan Bache proposed an alternative on the 29th of December 2013 that included the operator as you might now know it:
iris %>% subset(Sepal.Length > 5) %>% aggregate(. ~ Species, ., mean)
Bache continued to work with this pipe operation and at the end of 2013, the magrittr package came into being. In the meantime, Hadley Wickham continued to work on dplyr and in April 2014, the %.% operator got replaced with the one that you now know, %>%.
Later that year, Kun Ren published the pipeR package on GitHub, which incorporated a different pipe operator, %>>%, which was designed to add more flexibility to the piping process. However, it's safe to say that the %>% is now established in the R language, especially with the recent popularity of the Tidyverse.
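Ben Bolker's caveat, "at least for single-argument functions", is easy to see in practice. The sketch below repeats his one-line definition under a hypothetical name, %then% (picked here so that it does not mask magrittr's operator), and tries it with a bare function and with a call that carries an extra argument.

```r
# Hypothetical operator name; the body is the one-line definition quoted earlier
`%then%` <- function(x, f) do.call(f, list(x))

# Works when the right-hand side is a bare one-argument function
one_step  <- c(1, 4, 9) %then% sqrt
two_steps <- c(1, 4, 9) %then% sqrt %then% sum

# Fails when the right-hand side is a call with an extra argument:
# `round(1)` is evaluated first and yields the number 1, which is not a function
attempt <- tryCatch(pi %then% round(1), error = function(e) e)
```

Here attempt holds an error condition rather than a rounded number. Handling right-hand sides such as round(1) is one of the things the later magrittr operator does for you.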
What Is It?
Knowing the history is one thing, but that still doesn't give you an idea of what F#'s forward pipe operator is nor what it actually does in R.
In F#, the pipe-forward operator |> is syntactic sugar for chained method calls. Or, stated more simply, it lets you pass an intermediate result onto the next function.
Remember that "chaining" means that you invoke multiple method calls. As each method returns an object, you can actually allow the calls to be chained together in a single statement, without needing variables to store the intermediate results.
In R, the pipe operator is, as you have already seen, %>%. If you're not familiar with F#, you can think of this operator as being similar to the + in a ggplot2 statement. Its function is very similar to what you have seen of the F# operator: it takes the output of one statement and makes it the input of the next statement. When describing it, you can think of it as a "THEN".
Take, for example, the following code chunk and read it aloud:
# Install the necessary packages
install.packages("babynames")
install.packages("hflights")
library(magrittr)
iris %>%
  subset(Sepal.Length > 5) %>%
  aggregate(. ~ Species, ., mean)
You're right, the code chunk above will translate to something like "you take the Iris data, then you subset the data and then you aggregate the data".
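To see what the "THEN" reading stands for, it can help to put the piped version next to the nested call it replaces. The sketch below assumes magrittr is installed; the names piped and nested are made up for the comparison.

```r
library(magrittr)

# Read left to right: take iris, THEN subset, THEN aggregate
piped <- iris %>%
  subset(Sepal.Length > 5) %>%
  aggregate(. ~ Species, ., mean)

# The same steps as one nested call, read from the inside out
nested <- aggregate(. ~ Species, subset(iris, Sepal.Length > 5), mean)

# Both versions should describe the same per-species means
same_result <- isTRUE(all.equal(piped, nested))
```

Note that the standalone dot in the second argument is where the pipe places the subsetted data, while the dot inside the formula keeps its usual meaning of "all other columns".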
This is one of the most powerful things about the Tidyverse. In fact, having a standardized chain of processing actions is called "a pipeline". Making pipelines for a data format is great, because you can apply that pipeline to incoming data that has the same formatting and have it output in a ggplot2-friendly format, for example.
Why Use It?
R is a functional language, which means that your code often contains a lot of parentheses, ( and ). When you have complex code, this often means that you have to nest those parentheses. This makes your R code hard to read and understand. Here's where %>% comes to the rescue!
Take a look at the following example, which is a typical example of nested code:
# Initialize `x`
x <- c(0.109, 0.359, 0.63, 0.996, 0.515, 0.142, 0.017, 0.829, 0.907)
# Compute the logarithm of `x`, return suitably lagged and iterated differences,
# compute the exponential function and round the result
round(exp(diff(log(x))), 1)
With the help of %>%, you can rewrite the above code as follows:
# Import `magrittr`
library(magrittr)
# Perform the same computations on `x` as above
x %>% log() %>%
  diff() %>%
  exp() %>%
  round(1)
Does this seem difficult to you? No worries! You'll learn more on how to go about this later on in this tutorial.
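As a quick check that the rewrite changes only the notation, the sketch below computes the same result in three styles: nested calls, intermediate variables, and a pipe. It assumes magrittr is installed; the variable names are illustrative.

```r
library(magrittr)

x <- c(0.109, 0.359, 0.63, 0.996, 0.515, 0.142, 0.017, 0.829, 0.907)

# Style 1: nested calls, read from the inside out
nested <- round(exp(diff(log(x))), 1)

# Style 2: one intermediate variable per step
x_log    <- log(x)
x_diff   <- diff(x_log)
x_exp    <- exp(x_diff)
stepwise <- round(x_exp, 1)

# Style 3: a pipe, read from left to right
piped <- x %>% log() %>% diff() %>% exp() %>% round(1)
```

All three hold the same eight values; the pipe simply avoids both the nesting of style 1 and the extra variables of style 2.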
Note that you need to load the magrittr package to get the above code to work. That's because the pipe operator is, as you read above, part of the magrittr package and is, since 2014, also a part of dplyr. If you forget to load the package, you'll get an error like Error in eval(expr, envir, enclos): could not find function "%>%".
Also note that it isn't a formal requirement to add the parentheses after log, diff and exp, but that, within the R community, some will use them to increase the readability of the code.
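The point about optional parentheses can be checked directly. This is a small sketch with a made-up input vector, assuming magrittr is installed.

```r
library(magrittr)

x <- c(1, 10, 100)

# A bare function name and an empty call behave the same on the right-hand side
without_parens <- x %>% log10
with_parens    <- x %>% log10()

# Parentheses become necessary as soon as you pass extra arguments
rounded <- c(1.234, 5.678) %>% round(1)
```

Writing the parentheses everywhere keeps one-argument and multi-argument steps visually consistent, which is the readability argument mentioned above.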
In short, here are four reasons why you should be using pipes in R:
- You'll structure the sequence of your data operations from left to right, as opposed to from the inside out;
- You'll avoid nested function calls;
- You'll minimize the need for local variables and function definitions; and
- You'll make it easy to add steps anywhere in the sequence of operations.
These reasons are taken from the magrittr documentation itself. Implicitly, you see the arguments of readability and flexibility returning.
Additional Pipes
Even though %>% is the (main) pipe operator of the magrittr package, there are a couple of other operators that you should know and that are part of the same package:
- The compound assignment operator %<>%;
# Initialize `x`
x <- rnorm(100)
# Update value of `x` and assign it to `x`
x %<>% abs %>% sort
- The tee operator %T>%;
rnorm(200) %>%
  matrix(ncol = 2) %T>%
  plot %>%
  colSums
Note that the above code chunk is actually a shortcut for:
rnorm(200) %>% matrix(ncol = 2) %>% { plot(.); . } %>% colSums
But you'll see more about that later on!
- The exposition pipe operator %$%.
data.frame(z = rnorm(100)) %$%
  ts.plot(z)
Of course, these three operators work slightly differently than the main %>% operator. You'll see more about their functionalities and their usage later on in this tutorial!
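To make the three operators a bit more concrete before the detailed sections, here is a minimal sketch with small made-up inputs. The names x, df, col_totals and total are illustrative, and print stands in for plot so that the side effect shows up in the console. It assumes magrittr is installed.

```r
library(magrittr)

# Compound assignment: pipe `x` through the chain and write the result back to `x`
x <- c(-3, 1, -2)
x %<>% abs %>% sort

# Tee: run a step for its side effect (here, printing the matrix)
# and pass the step's input, not its return value, to the next step
col_totals <- 1:6 %>%
  matrix(ncol = 2) %T>%
  print %>%
  colSums

# Exposition: make the columns of a data frame available by name
df <- data.frame(a = 1:3, b = 4:6)
total <- df %$% sum(a * b)
```

After running this, x holds the sorted absolute values, col_totals holds the two column sums of the matrix, and total is the sum of the element-wise products of the columns a and b.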
Note that, even though you'll most often see the magrittr pipes, you might also encounter other pipes as you go along! Some examples are wrapr's dot arrow pipe %.>% or to dot pipe %>.%, or the Bizarro pipe ->.;.
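The Bizarro pipe is the easiest of these to try, because it is not a separate operator at all: it is plain base R, combining right assignment into a variable named . with the statement separator ;. The sketch below uses a made-up input and a made-up result name, total.

```r
# Each statement stores its result in a variable named `.`,
# and the next statement reads that variable
c(1, 4, 9) ->.; sqrt(.) ->.; sum(.) -> total
```

No package is needed for this, but keep in mind that it leaves a variable called . behind in your workspace.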
How to Use Pipes in R
Now that you know how the %>% operator originated, what it actually is and why you should use it, it's time for you to discover how you can actually use it to your advantage. You will see that there are quite a few ways in which you can use it!
Basic Piping
Before you go into the more advanced usages of the operator, it's good to first take a look at the most basic examples that use the operator. In essence, you'll see that there are 3 rules that you can follow when you're first starting out:
- f(x) can be rewritten as x %>% f
In short, this means that functions that take one argument, function(argument), can be rewritten as follows: argument %>% function(). Take a look at the following, more practical example to understand how these two are equivalent:
# Compute the logarithm of `x`
log(x)
# Compute the logarithm of `x` with the pipe
x %>% log()