What Is a Formula in R?

Many functions in R make use of formulas: packages such as ggplot2, stats, lattice, and dplyr all use them. We usually use formulas inside these function calls to generate “special behavior”.

They allow you to capture two things:

  1. An unevaluated expression
  2. The context or environment in which the expression was created

In R, the tilde operator ~ characterizes formulas. With this operator, you say: “capture the meaning of this code, without evaluating it”. You can think of a formula in R as a “quoting” operator.

# A formula
d <- y ~ x + b

The variable on the left-hand side of a tilde (~) is called the “dependent variable”, while the variables on the right-hand side are called the “independent variables” and are joined by plus signs +.

You can access the elements of a formula with the help of double square brackets: [[ and ]].

f <- y ~ x + b 

# Retrieve the elements at index 1, 2, and 3
f[[1]]
## `~`
f[[2]]
## y
f[[3]]
## x + b
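Besides the unevaluated expression, a formula also records the environment in which it was created. The following is a minimal sketch of that second point; `make_formula` and `z` are hypothetical names chosen only for this illustration.

```r
# A formula remembers the environment in which it was created.
# `make_formula` and `z` are hypothetical names used only for this sketch.
make_formula <- function() {
  z <- 10   # local variable, visible only inside the function
  ~ z       # one-sided formula created inside the function
}

g <- make_formula()

# The right-hand side is still an unevaluated symbol...
g[[2]]

# ...but it can be looked up later in the captured environment
eval(g[[2]], envir = environment(g))

# A one-sided formula has two elements, a two-sided one has three
length(g)
length(y ~ x + b)
```

The symbol `z` does not exist in the global environment here, yet the lookup succeeds because the formula carries the function's environment with it.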

Why Use Formulae in R?

Formulas are powerful, general-purpose tools that allow you to capture variable names and expressions without evaluating them, so that they can be interpreted by the function that receives the formula.

Also, you use these R objects to express a relationship between variables.

For example, with the formula in the code chunk below, you say “y is a function of x, a, and b”.

y ~ x + a + b
## y ~ x + a + b

More complex formulas are also possible, like the one in the code chunk below:

Sepal.Width ~ Petal.Width | Species
## Sepal.Width ~ Petal.Width | Species

Here you mean to say “the sepal width is a function of petal width, conditioned on species”.

Using Formulas in R

How To Create a Formula in R

  1. With the help of the ~ operator
  2. Sometimes you need or want to create a formula from an R object, such as a string. In such cases, you can use the formula() or as.formula() function

"y ~ x1 + x2"
## [1] "y ~ x1 + x2"
h <- as.formula("y ~ x1 + x2")

h <- formula("y ~ x1 + x2")
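Building a formula from strings is most useful when the variable names are only known at run time. A small sketch, assuming the names are held in character vectors (`response` and `predictors` are hypothetical names for this example); `reformulate()` from the stats package is an alternative to pasting a string together by hand.

```r
# Hypothetical variable names, used only for illustration
response   <- "y"
predictors <- c("x1", "x2")

# Option 1: paste the pieces into a string, then convert it
h1 <- as.formula(paste(response, "~", paste(predictors, collapse = " + ")))

# Option 2: let reformulate() assemble the formula directly
h2 <- reformulate(predictors, response = response)

h1
h2
```

Both objects should print as `y ~ x1 + x2`.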

How To Concatenate Formulae

To glue or bring multiple formulas together, you have two options:

  1. Create separate variables for each formula and then use list()
# Create variables
i <- y ~ x
j <- y ~ x + x1
k <- y ~ x + x1 + x2

# Concatenate
formulae <- list(as.formula(i),as.formula(j),as.formula(k))
  2. Use the lapply() function, where you pass in a list with all of your formulas as the first argument and as.formula as the function that you want to apply to each element of that list
# Join all with `c()`
l <- c(i, j, k)

# Apply `as.formula` to all elements of `l`
lapply(l, as.formula)
## [[1]]
## y ~ x
## 
## [[2]]
## y ~ x + x1
## 
## [[3]]
## y ~ x + x1 + x2
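One common reason to keep several formulas in a list is to fit one model per formula. The sketch below assumes the built-in mtcars data set and picks its columns mpg, wt, hp, and qsec purely for illustration.

```r
# A list of formulas; mpg, wt, hp and qsec are columns of the built-in mtcars data
formulae <- list(
  mpg ~ wt,
  mpg ~ wt + hp,
  mpg ~ wt + hp + qsec
)

# Fit one linear model per formula
fits <- lapply(formulae, function(f) lm(f, data = mtcars))

# Number of estimated coefficients in each model (intercept included)
n_coef <- sapply(fits, function(fit) length(coef(fit)))
n_coef
```

Each additional predictor adds one coefficient, so the three models should have 2, 3, and 4 coefficients.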

Formula Operators

“+” for joining

“-” for removing terms

“:” for interaction

“*” for crossing

“%in%” for nesting

“^” for limiting crossing to the specified degree

# Use multiple independent variables
y ~ x1 + x2
## y ~ x1 + x2
# Remove a term from the formula
y ~ x1 - x2
## y ~ x1 - x2
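To see what the operators above do, it can help to look at the terms a formula expands to. This is a small sketch using terms() from the stats package; `term_labels` is a hypothetical helper name, and a, b, and c are placeholder variables that never need to exist because the formula is not evaluated.

```r
# Helper (hypothetical name) that lists the terms a formula expands to
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(y ~ a + b)          # "a" "b"
term_labels(y ~ a:b)            # "a:b"
term_labels(y ~ a * b)          # "a" "b" "a:b"
term_labels(y ~ a * b - a:b)    # "a" "b"
term_labels(y ~ (a + b + c)^2)  # "a" "b" "c" "a:b" "a:c" "b:c"
```

Crossing with `*` is shorthand for the main effects plus their interaction, and `^2` keeps all terms up to second order.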

What if you want to actually perform an arithmetic operation? You have a couple of solutions:

  1. You can calculate and store all of the variables in advance
  2. You use the I() or “as-is” operator: y ~ x + I(x^2)
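The difference I() makes can be checked directly. In this sketch the data are made up to follow an exact quadratic, so the fitted coefficients should come out close to 1, 2, and 3.

```r
x <- 1:10
y <- 1 + 2 * x + 3 * x^2   # made-up data following an exact quadratic

# Without I(), ^ is a formula operator and x^2 collapses to x
attr(terms(y ~ x + x^2), "term.labels")     # "x"

# With I(), x^2 is treated as ordinary arithmetic
attr(terms(y ~ x + I(x^2)), "term.labels")  # "x" "I(x^2)"

# The quadratic term is now part of the model
fit <- lm(y ~ x + I(x^2))
coef(fit)
```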

How To Inspect Formulas in R

You can inspect a formula with general-purpose functions such as attributes(), typeof(), and class().

To examine and compare different formulae, you can use the terms() function:

m <- formula("y ~ x1 + x2")
terms(m)
## y ~ x1 + x2
## attr(,"variables")
## list(y, x1, x2)
## attr(,"factors")
##    x1 x2
## y   0  0
## x1  1  0
## x2  0  1
## attr(,"term.labels")
## [1] "x1" "x2"
## attr(,"order")
## [1] 1 1
## attr(,"intercept")
## [1] 1
## attr(,"response")
## [1] 1
## attr(,".Environment")
## <environment: R_GlobalEnv>
class(m)
## [1] "formula"
typeof(m)
## [1] "language"
attributes(m)
## $class
## [1] "formula"
## 
## $.Environment
## <environment: R_GlobalEnv>

If you want to know the names of the variables in the model, you can use the all.vars() function.

print(all.vars(m))
## [1] "y"  "x1" "x2"

To modify formulae without converting them to character, you can use the update() function:

update(y ~ x1 + x2, ~. + x3)
## y ~ x1 + x2 + x3
y ~ x1 + x2 + x3
## y ~ x1 + x2 + x3
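In the second argument of update(), a dot stands for "the corresponding part of the old formula". A few more variations, shown as a sketch with placeholder variable names:

```r
f <- y ~ x1 + x2

# Drop a term from the right-hand side
update(f, . ~ . - x2)      # y ~ x1

# Transform the response, keep the right-hand side
update(f, log(.) ~ .)      # log(y) ~ x1 + x2

# Add an interaction term
update(f, . ~ . + x1:x2)   # y ~ x1 + x2 + x1:x2
```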

Double-check whether your variable is a formula by passing it to the is.formula() function from the plyr package.

# Load `plyr`
library(plyr)

# Check `m`
is.formula(m)
## [1] TRUE
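If you would rather not load an extra package for this check, a base R alternative is to test the object's class with inherits(). A minimal sketch:

```r
m <- y ~ x1 + x2

# A formula object carries the class "formula"
inherits(m, "formula")               # TRUE

# A plain string that merely looks like a formula does not
inherits("y ~ x1 + x2", "formula")   # FALSE
```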

When To Use Formulas

1. Modeling Functions

2. Graphical Functions in R
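As a brief sketch of both uses, the example below passes a formula to a modeling function, lm(), and to a graphical function, boxplot(). The built-in mtcars data set and its columns are an arbitrary choice for illustration.

```r
# Modeling: regress fuel efficiency (mpg) on weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

# Graphics: one box of mpg values per number of cylinders (cyl)
bp <- boxplot(mpg ~ cyl, data = mtcars)

# boxplot() invisibly returns the statistics it drew, including the group names
bp$names
```

In both calls the formula only names the columns; the function looks them up in the data frame supplied through the data argument.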

R Formula Packages

1. Formula Package

2. formula.tools