Mathematical History

Suppose you have two functions, f: B → C and g: A → B.

You can chain these functions together by taking the output of one function and feeding it into the next: f(g(x)). Here g(x) serves as the input to f(), while x, of course, serves as the input to g(). This composition is written f ∘ g, which reads as “f follows g”.
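The idea of “f follows g” can be sketched in a few lines of base R. The helper name compose and the two toy functions are assumptions made for illustration only; they are not part of any package discussed here.

```r
# Hypothetical helper: compose(f, g) returns the function "f follows g".
compose <- function(f, g) function(x) f(g(x))

g <- function(x) x + 1   # plays the role of g: A -> B
f <- function(x) x * 2   # plays the role of f: B -> C

f_after_g <- compose(f, g)

# Calling the composed function is the same as nesting the calls by hand.
f_after_g(3)   # (3 + 1) * 2 = 8
f(g(3))        # 8
```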

Pipe Operators in Other Programming Languages

The shell (terminal) uses the pipe character |. Similar operators also exist in F# and Haskell.

Pipes in R

On January 17th, 2012, an anonymous user asked a question on Stack Overflow: how can you implement F#’s forward pipe operator in R, so that you can write data |> foo |> bar?
The answer:

"%>%" <- function(x,f) do.call(f,list(x))
pi %>% sin
## [1] 1.224606e-16
pi %>% sin %>% cos
## [1] 1
cos(sin(pi))
## [1] 1

Pipes are now available in R through the dplyr package, the magrittr package, and the pipeR package.

library(magrittr)
iris %>%
  subset(Sepal.Length > 5) %>%
  aggregate(. ~ Species, ., mean)
##      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1     setosa     5.313636    3.713636     1.509091   0.2772727
## 2 versicolor     5.997872    2.804255     4.317021   1.3468085
## 3  virginica     6.622449    2.983673     5.573469   2.0326531

The pipe takes the output of one statement and makes it the input of the next statement. Think of it as a “THEN”. The above code translates to: “you take the iris data, then you subset the data, and then you aggregate the data”. Such a chain of processing actions is called a “pipeline”.

Why Use It?

R is a functional language, which means your code often contains a lot of parentheses. In complex code, you have to nest those parentheses, which makes your R code hard to read and understand. This is where %>% comes to the rescue!

x <- c(0.109, 0.359, 0.63, 0.996, 0.515, 0.142, 0.017, 0.829, 0.907)

round(exp(diff(log(x))), 1)
## [1]  3.3  1.8  1.6  0.5  0.3  0.1 48.8  1.1

With the help of %>%, you can rewrite the above code as follows:

library(magrittr)

x %>% log() %>%
    diff() %>%
    exp() %>%
    round(1)
## [1]  3.3  1.8  1.6  0.5  0.3  0.1 48.8  1.1

Four reasons why we use pipes in R:

1. You’ll structure the sequence of your data operations from left to right, as opposed to from the inside out;

2. You’ll avoid nested function calls;

3. You’ll minimize the need for local variables and function definitions;

4. You’ll make it easy to add steps anywhere in the sequence of operations.
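The fourth point can be illustrated with a small sketch, assuming the magrittr package is installed. The extra sort() step is only an example of a step one might want to add; the variable names are made up for this illustration.

```r
library(magrittr)

x <- c(0.109, 0.359, 0.63, 0.996, 0.515, 0.142, 0.017, 0.829, 0.907)

# The original pipeline.
ratios <- x %>%
  log() %>%
  diff() %>%
  exp() %>%
  round(1)

# The same pipeline with one extra step inserted in the middle.
# Only one line is added; no parentheses have to be re-nested.
sorted_ratios <- x %>%
  log() %>%
  diff() %>%
  exp() %>%
  sort() %>%   # new step
  round(1)
```

With nested calls, the same change would mean wrapping sort() around exp(diff(log(x))) inside the existing round() call.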

How to Use Pipes in R

f(x) can be rewritten as x %>% f. In other words, function(argument) can be rewritten as argument %>% function().

x <- c(0.109, 0.359, 0.63, 0.996, 0.515, 0.142, 0.017, 0.829, 0.907)

log(x)
## [1] -2.216407397 -1.024432890 -0.462035460 -0.004008021 -0.663588378
## [6] -1.951928221 -4.074541935 -0.187535124 -0.097612829
x %>% log()
## [1] -2.216407397 -1.024432890 -0.462035460 -0.004008021 -0.663588378
## [6] -1.951928221 -4.074541935 -0.187535124 -0.097612829

x %>% f %>% g %>% h can be rewritten as h(g(f(x))). Additional arguments stay inside the call: x %>% f(y) is equivalent to f(x, y).

round(pi, 6)
## [1] 3.141593
pi %>% round(6)
## [1] 3.141593

As a larger example, import the babynames package, load the data, and count how many boys named “Taylor” were born.

library(babynames)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data(babynames)

sum(select(filter(babynames,sex=="M",name=="Taylor"),n))
## [1] 108630

With the help of the pipe:

babynames %>%
  filter(sex == "M", name == "Taylor") %>%
  select(n) %>%
  sum
## [1] 108630

Pipe as Argument Placeholder

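By default, %>% passes the left-hand side as the first argument of the right-hand side. To place it elsewhere, use the dot (.) as a placeholder. For example, to replace “une” with “un” in “Ceci n’est pas une pipe”: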

"Ceci n'est pas une pipe" %>% gsub("une", "un", .)
## [1] "Ceci n'est pas un pipe"
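Two more uses of the dot placeholder, as a sketch assuming magrittr is installed. The built-in mtcars dataset and the variable names fit and label are chosen only for illustration.

```r
library(magrittr)

# lm() expects the formula first and the data second, so the piped
# data frame is routed to the `data` argument with the dot.
fit <- mtcars %>% lm(mpg ~ wt, data = .)

# The dot can sit in any argument position of an ordinary call.
label <- "pipe" %>% paste("Ceci n'est pas un", .)
label
```

When the dot appears as a direct argument of the call, magrittr does not also insert the left-hand side as the first argument.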

Different pipe versions

Compound Assignment Pipe Operations

iris <- read.csv(url("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"), header = FALSE)

# Add column names to the Iris data
names(iris) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")

# Compute the square root of `iris$Sepal.Length` and assign it to the variable
iris$Sepal.Length <- 
  iris$Sepal.Length %>%
  sqrt()
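The assignment above repeats iris$Sepal.Length on both sides. magrittr's compound assignment operator %<>% pipes the left-hand side through the expression and assigns the result back in one step. The sketch below works on a copy of the built-in iris dataset (the name iris_copy is an assumption for illustration) so that it runs without downloading anything.

```r
library(magrittr)

# Work on a copy of the built-in dataset.
iris_copy <- datasets::iris

# Long form, for comparison: pipe first, keep the result separately.
expected <- iris_copy$Sepal.Length %>% sqrt()

# Compound assignment: pipe Sepal.Length through sqrt() and
# overwrite the column with the result.
iris_copy$Sepal.Length %<>% sqrt()

identical(iris_copy$Sepal.Length, expected)
```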

Tee Operations with The Tee Operator %T>%

The tee operator works exactly like %>%, but it returns the left-hand side value rather than the potential result of the right-hand side operations. This is useful when a step in a pipeline is used for its side effect (printing, plotting, logging, etc.).

rnorm(200) %>%
  matrix(ncol = 2) %T>%
  plot %>%
  colSums

## [1]  -2.03522 -13.64960
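The difference between the two operators is easier to see with a deterministic side-effect function. In this sketch, str() stands in for plot(): it prints a summary and returns NULL. The variable names are made up for illustration, and magrittr is assumed to be installed.

```r
library(magrittr)

# With %>%, sum() receives the return value of str(), which is NULL,
# so the original data is lost and the sum is 0.
without_tee <- 1:3 %>% str() %>% sum()

# With %T>%, str() still prints, but the left-hand side (1:3)
# is passed on to sum().
with_tee <- 1:3 %T>% str() %>% sum()

without_tee
with_tee
```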

The “exposition” pipe operator %$%

Some functions, such as cor(), have no data argument, so it is handy to be able to expose the variables of a data frame to them directly. That’s where the %$% operator comes in. Consider the following example:

iris %>%
  subset(Sepal.Length > mean(Sepal.Length)) %$%
  cor(Sepal.Length, Sepal.Width)
## [1] 0.3365679
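A useful way to think about %$% is that it behaves much like base R's with(). The sketch below compares the two on the built-in mtcars dataset, which is used here purely for illustration; magrittr is assumed to be installed.

```r
library(magrittr)

# Expose the columns of mtcars to cor() with the exposition pipe.
via_pipe <- mtcars %$% cor(mpg, wt)

# The base R equivalent using with().
via_with <- with(mtcars, cor(mpg, wt))

# A plain %>% would not work here: it would call cor(mtcars, mpg, wt),
# and mpg would not be found outside the data frame.
all.equal(via_pipe, via_with)
```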

With the help of %$%, you make sure that Sepal.Length and Sepal.Width are exposed to cor().

When Not To Use the Pipe Operator in R

1. Your pipes are longer than (say) ten steps.

Create intermediate objects instead. This makes it easier for you to debug your code and easier for others to understand it.

2. You have multiple inputs or outputs.

3. You are starting to think about a directed graph with a complex dependency structure.

Pipes are fundamentally linear, and expressing complex relationships with them will only result in complex code that is hard to read and understand.

4. You’re doing internal package development.

Using pipes in internal package development is a no-go, as it makes the code harder to debug!

Alternatives to Pipes in R

1. Create intermediate variables with meaningful names.

2. Nest your code so that you read it from the inside out.
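Both alternatives can be sketched in base R with the vector used earlier. The intermediate variable names are assumptions chosen for readability.

```r
x <- c(0.109, 0.359, 0.63, 0.996, 0.515, 0.142, 0.017, 0.829, 0.907)

# Alternative 1: intermediate variables with meaningful names.
log_x     <- log(x)
log_diffs <- diff(log_x)
ratios    <- exp(log_diffs)
rounded   <- round(ratios, 1)

# Alternative 2: nested calls, read from the inside out.
nested <- round(exp(diff(log(x))), 1)

# Both give the same result as the piped version shown earlier.
identical(rounded, nested)
```

Intermediate variables make each step inspectable while debugging; nesting keeps everything in one expression at the cost of readability.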