A huge “thank you” to the reviewers and editors for helping us with this! You can find our article here (pdf here)!

We have some examples that didn’t make it to the formal paper here.

One thing worth appreciating about `R` is just how concise and powerful macros are. The problem is that, because macros do so much for you, explanations of them get bogged down, much like explaining a joke. Let’s try to be concise.

Below is an extension of an example taken from the "Programming with `dplyr`" note.

First let’s load the package and define symbols that hold the names of the columns we wish to work with later.

```
suppressPackageStartupMessages(library("dplyr"))
group_nm <- as.name("am")
num_nm <- as.name("hp")
den_nm <- as.name("cyl")
derived_nm <- as.name(paste0(num_nm, "_per_", den_nm))
mean_nm <- as.name(paste0("mean_", derived_nm))
count_nm <- as.name("count")
```

Now let’s use `rlang` to substitute those symbols into a non-trivial `dplyr` pipeline.

```
mtcars %>%
  group_by(!!group_nm) %>%
  mutate(!!derived_nm := !!num_nm / !!den_nm) %>%
  summarize(
    !!mean_nm := mean(!!derived_nm),
    !!count_nm := n()
  ) %>%
  ungroup() %>%
  arrange(!!group_nm)
```

```
## # A tibble: 2 x 3
##      am mean_hp_per_cyl count
##   <dbl>           <dbl> <int>
## 1     0            22.7    19
## 2     1            23.4    13
```

The above is very useful: we have just gotten programmatic control of all the symbol names in the pipeline. This is exactly what is needed to wrap such a pipeline in a function and make it parametric and re-usable.

The thing is, Thomas Lumley’s `base::bquote()` could achieve this in 2003, and Gregory R. Warnes’ `gtools::strmacro()` could further automate specifying the automation in 2005.

Let’s show that. First we use `gtools` to build a "`bquote()` wrapping factory."

```
library("gtools")
# build a method-wrapping macro
bq_wrap <- strmacro(
  FN,
  expr = {
    FN <- function(.data, ...) {
      env <- parent.frame()
      mc <- substitute(dplyr::FN(.data = .data, ...))
      mc <- do.call(bquote, list(mc, where = env), envir = env)
      eval(mc, envir = env)
    }
  }
)
```

Now we use it to wrap some `dplyr` methods (ignoring non-`...` options).

```
# wrap some dplyr methods
bq_wrap(mutate)
bq_wrap(summarize)
bq_wrap(group_by)
bq_wrap(arrange)
```

At this point we have re-adapted four `dplyr` methods to use `bquote()` quasiquotation. This is what we mean when we say `strmacro()` is a tool to build tools.

And here is the same pipeline again, entirely driven by `bquote()`.

```
mtcars %>%
  group_by(.(group_nm)) %>%
  mutate(.(derived_nm) := .(num_nm) / .(den_nm)) %>%
  summarize(
    .(mean_nm) := mean(.(derived_nm)),
    .(count_nm) := n()
  ) %>%
  ungroup() %>%
  arrange(.(group_nm))
```

```
## # A tibble: 2 x 3
##      am mean_hp_per_cyl count
##   <dbl>           <dbl> <int>
## 1     0            22.7    19
## 2     1            23.4    13
```


I guess the closest I can come to a fair and coherent view on “competition” in the `R` ecosystem is some variation of the following.

- I, of course, should not be treating things as a competition. We are all doing work and hoping for a bit of public mind share.
- We all want our own work to do well. So we are a little sad if other work supplants our work, and a little happy if our work is adopted. However, we must accept that if our work is adopted, we are supplanting other work, the very thing we do not enjoy when it happens to us.

So I’d definitely like to apologize for the times I have not thought clearly and have treated some aspects of the ecosystem as competition. It is when we are thinking hardest about ourselves that we are most likely to offend others.

That being said, there is some context that I feel matters.

- Please understand any new technique is always going to be asked to compare itself to both base-`R` and the `tidyverse`. These are natural questions. One has to walk a fine line between not mentioning these (and perhaps unfairly slighting them) and adding the comparison (and seeming pushy).
- Size, distribution, and transparency matter. A new package that is promoted by a large company and/or immediately included in popular packages or meta-packages controlled by the same authors can eliminate even the possibility of fair comparison to other work. Frankly, I think there is some responsibility to take additional care and concern in these cases. Winner-take-all popularity tracking systems have similar risks (encouraging new users to come to conclusions prior to looking at any alternatives).
- Precedence in no way entitles one to priority. Sometimes our work is a later alternative to earlier work by others (and we do try to give credit in these situations), and sometimes others’ work is a later alternative to ours. And frankly, sometimes base-`R` already does a good job and we just missed it (though we are not alone in that; one must also take care to respect that base-`R` itself is a collection of other people’s contributions).

For example: our own `wrapr` dot-arrow pipe comes long after the `magrittr` pipe. We try to keep the history clear, but frankly it takes some effort for work related to such a popular notation to be heard. I understand some are offended by our promotional effort, but we feel we have some valuable improvements to share (which can only be shown by comparison), and writing notes is the only platform we have.

As a contrary example: our own `let()` method comes before the `rlang` package, but has been formally criticized as being too similar to `rlang`. We’ve tried to write down some of the context, but really that should not be our task alone.

Of course there is a risk of a (hopefully breakable) negative cycle: what we do when frustrated, in turn frustrates others.

The `R` package `wrapr` supplies a "piping operator" that we feel is a real improvement for piped-style `R` coding.
The idea is: with `wrapr`’s "dot arrow" pipe "`%.>%`", the expression "`A %.>% B`" is treated very much like "`{. <- A; B}`". In particular this lets users think of "`A %.>% B(.)`" as a left-to-right way to write "`B(A)`" (i.e., under the convention of writing out the dot arguments, the pipe looks a bit like left-to-right function composition; call this explicit dot notation).
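This `{. <- A; B}` reading can be checked in base `R` alone; the following is only a minimal sketch of the claimed semantics, not `wrapr`’s actual implementation.

```
# base-R reading of "4 %.>% sqrt(.)": assign to dot, then evaluate
res <- { . <- 4; sqrt(.) }
res
## [1] 2
```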

This sort of notation becomes useful when we compose many steps. Some consider "`A %.>% B(.) %.>% C(.) %.>% D(.)`" to be easier to read and easier to maintain than "`D(C(B(A)))`".

In terms of the popular `magrittr` pipe: `wrapr` dot arrow "`A %.>% B`" behaves a lot like "`A %>% {B}`" ("`{}`" on the right being `magrittr`’s special notation to treat the right-hand side as an expression instead of as a function call). Like `magrittr`, `wrapr` dot arrow can be used to write "`sin(5)`" as "`5 %.>% sin`", though the preferred `wrapr` notation is the explicit dot notation: "`5 %.>% sin(.)`". Unlike `magrittr`, `wrapr` deliberately does not accept the "parenthesis only" notation "`5 %.>% sin()`" (our view is: if the user goes to all the trouble to tell us there are no arguments, `wrapr` dot arrow should take their word for it).

`wrapr` dot arrow strives to be regular (to have few different operating modes), giving the user a lot of options and a lot of expressive power. Please see here for a small study we conducted on this.

Let’s show a few examples that all share a secret feature we will reveal at the end of this article (note: any example prior to here is not part of the secret).

First we can pipe into package qualified functions (with or without the explicit dot notation).

```
library("wrapr")
5 %.>% base::sin
```

`## [1] -0.9589243`

In the function notation (without the dot), argument names are preserved.

```
library("wrapr")
d <- 5
d %.>% substitute
```

`## d`

`d %.>% base::substitute`

`## d`

Also, piping into functions held as `list` items is supported.

```
library("wrapr")
obj <- list(f = sin)
5 %.>% obj$f
```

`## [1] -0.9589243`

`5 %.>% obj[['f']]`

`## [1] -0.9589243`

And piping into parenthesized expressions is supported, so it is safe to add clarifying parentheses when one wishes.

```
library("wrapr")
5 %.>% (1 + .)
```

`## [1] 6`

In the dot notation, piping into nested functions is handled smoothly.

```
# From http://piccolboni.info/2015/09/pipe-operator-for-R.html
library("wrapr")
4 %.>% sqrt(sqrt(.))
```

`## [1] 1.414214`

`wrapr` dot arrow is compatible with `dplyr` "pronoun" notation (the "`.data`" and "`.env`" qualifiers below).

```
# Adapted from https://github.com/tidyverse/dplyr/issues/3286
suppressPackageStartupMessages(library("dplyr"))
library("wrapr")
cyl <- 4
mtcars %.>%
  filter(., .data$cyl == .env$cyl) %.>%
  head(.)
```

```
## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## 2 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## 3 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## 4 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## 5 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## 6 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
```

`wrapr` dot arrow can even be used *inside* `dplyr` expressions.

```
suppressPackageStartupMessages(library("dplyr"))
library("wrapr")
mtcars %.>%
  mutate(.,
         someMean = . %.>%
           select(., cyl, disp, carb) %.>%
           rowMeans(.)) %.>%
  head(.)
```

```
## mpg cyl disp hp drat wt qsec vs am gear carb someMean
## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 56.66667
## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 56.66667
## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 37.66667
## 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 88.33333
## 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 123.33333
## 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 77.33333
```

Some nested items can be subtle. In the example below the dot in "`rev(.)[1]`" is actually not a top-level argument, as the "`rev(.)`" expression is technically an argument to the "`[]`" operator. Unlike `magrittr`, `wrapr` does not change its substitution behavior based on the presence or absence of an argument named "`.`" at the top level of expressions, making it easy to reason about expression semantics.

```
library("wrapr")
c(10, 20, 30) %.>% rev(.)[1]
```

`## [1] 30`

`wrapr` dot arrow is compatible with `data.table`, even with the common addition of visibility-controlling "`[]`"-calls.

```
library("data.table")
library("wrapr")
d <- data.table(x = c(2, 1, 3))
d %.>%
  .[, y := x + 1] %.>%
  setorder(., x)[]
```

```
## x y
## 1: 1 2
## 2: 2 3
## 3: 3 4
```

(Note: if the `wrapr` package is attached before `data.table` there will be a warning regarding "`:=`". This is actually not a problem, as `data.table` uses the "`:=`" symbol in its own parsing and in its own package context, where it cannot be confused with `wrapr`’s definition.)

Another subtle form of expression nesting is operators such as `%in%`. In expressions such as "`filter(d, a %in% .)`", the dot is not an argument to `filter()` but an argument to `%in%`.

```
# adapted from: https://stackoverflow.com/questions/46728387/piping-with-dot-inside-dplyrfilter
suppressPackageStartupMessages(library("dplyr"))
library("wrapr")
d <- data.frame(a = c(1, 2, 3),
                b = c(4, 5, 6))
c(2,2) %.>% filter(d, a %in% .)
```

```
## a b
## 1 2 5
```

`c(2,2) %.>% dplyr::filter(d, a %in% .)`

```
## a b
## 1 2 5
```

These are all of our examples. We are now ready to share their common secret.

None of the above examples work when translated into `magrittr` pipelines (unless one adds braces and explicit dot-arguments).

When we were not thinking about it deeply, all of the above code likely seemed reasonable. In fact it *was* reasonable, it just doesn’t happen to follow `magrittr` conventions.

We call out just a couple of the examples to show the issues.

```
library("magrittr")
5 %>% base::sin
```

`## Error in .::base: unused argument (sin)`

```
library("magrittr")
d <- 5
d %>% substitute
```

`## value`

The above illustrates a problem we run into in promoting `wrapr` dot arrow: the misconception that the `magrittr` pipe leaves no room for improvement.

Our impression is that `magrittr` users eventually learn and internalize what variations work and do not work for `magrittr`, and then learn to limit their coding to stay in the safe `magrittr` fragment. In our opinion, once one learns the one limitation of `wrapr` dot arrow (the requirement of explicit dot arguments, due to the expression orientation of the pipe) one can use dot arrow as a much more powerful pipe that supports many more coding patterns (such as piping into function references, as we showed above).
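As noted earlier, the failing `magrittr` translations can be repaired by adding braces and explicit dot-arguments; a sketch (assuming `magrittr` is attached):

```
library("magrittr")
# braces make magrittr treat the right-hand side as an expression,
# with "." bound to the piped value
res <- 5 %>% { base::sin(.) }
res
## [1] -0.9589243
```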

For those that are interested we have tutorials and formal documentation.

`wrapr` `1.6.2` is now up on CRAN. We have some neat new features for `R` users to try (in addition to many earlier `wrapr` goodies).

The first is the `%in_block%` alternate notation for `let()`.

The `wrapr` `let()`-block allows easy replacement of names in name-capturing interfaces (such as `transform()`), as we show below.

```
library("wrapr")
column_mapping <- qc(
  AREA_COL = Sepal.Area,
  LENGTH_COL = Sepal.Length,
  WIDTH_COL = Sepal.Width
)

# let-block notation
let(
  alias = column_mapping,
  iris %.>%
    transform(.,
              AREA_COL = (pi/4)*LENGTH_COL*WIDTH_COL) %.>%
    subset(.,
           select = qc(Species, AREA_COL)) %.>%
    head(.)
)
```

```
## Species Sepal.Area
## 1 setosa 14.01936
## 2 setosa 11.54535
## 3 setosa 11.81239
## 4 setosa 11.19978
## 5 setosa 14.13717
## 6 setosa 16.54049
```

The `qc()` notation allowed us to specify a named character `vector` without quotes: `qc(a = b)` is equivalent to `c("a" = "b")`.
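A quick check of that equivalence (assuming `wrapr` is attached):

```
library("wrapr")
v <- qc(a = b)
identical(v, c(a = "b"))
## [1] TRUE
```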

With the `%in_block%` operator notation one writes the `let()`-block as an in-line operator supplying the mapping into a code block. The above example can be re-written as the following.

```
# %in_block% notation
column_mapping %in_block% {
  iris %.>%
    transform(.,
              AREA_COL = (pi/4)*LENGTH_COL*WIDTH_COL) %.>%
    subset(.,
           select = qc(Species, AREA_COL)) %.>%
    head(.)
}
```

```
## Species Sepal.Area
## 1 setosa 14.01936
## 2 setosa 11.54535
## 3 setosa 11.81239
## 4 setosa 11.19978
## 5 setosa 14.13717
## 6 setosa 16.54049
```

This notation can be handy for defining functions.

```
compute_area <- function(
  .data,
  area_col,
  length_col,
  width_col) c( # end of function argument definition
    AREA_COL = area_col,
    LENGTH_COL = length_col,
    WIDTH_COL = width_col
  ) %in_block% { # end of argument mapping block
    .data %.>%
      transform(.,
                AREA_COL = (pi/4)*LENGTH_COL*WIDTH_COL)
  } # end of function body block

iris %.>%
  compute_area(.,
               'Sepal.Area', 'Sepal.Length', 'Sepal.Width') %.>%
  compute_area(.,
               'Petal.Area', 'Petal.Length', 'Petal.Width') %.>%
  subset(.,
         select = c("Species", "Sepal.Area", "Petal.Area")) %.>%
  head(.)
```

```
## Species Sepal.Area Petal.Area
## 1 setosa 14.01936 0.2199115
## 2 setosa 11.54535 0.2199115
## 3 setosa 11.81239 0.2042035
## 4 setosa 11.19978 0.2356194
## 5 setosa 14.13717 0.2199115
## 6 setosa 16.54049 0.5340708
```

We can think of the above function definition notation as having two blocks: the alias-defining block (the portion before "`%in_block%`") and the templated function body (the portion after "`%in_block%`"). Notice how easy it is to use this notation to convert a non-standard (name/code-capturing) interface into a value-oriented interface. The point is: value-oriented interfaces are much more re-usable and easier to program over (use in for-loops, applies, and functions).
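To make the "easier to program over" point concrete, here is a base-`R` sketch; the helper `add_area` and its argument names are ours, purely illustrative, and not part of `wrapr`:

```
# a value-oriented helper: all column names are passed as plain strings
add_area <- function(d, area_col, length_col, width_col) {
  d[[area_col]] <- (pi/4) * d[[length_col]] * d[[width_col]]
  d
}

# because the controls are mere values, we can loop over specifications
specs <- list(
  c("Sepal.Area", "Sepal.Length", "Sepal.Width"),
  c("Petal.Area", "Petal.Length", "Petal.Width"))
d <- iris
for (s in specs) {
  d <- add_area(d, s[[1]], s[[2]], s[[3]])
}
head(d[, c("Species", "Sepal.Area", "Petal.Area")], 1)
##   Species Sepal.Area Petal.Area
## 1  setosa   14.01936  0.2199115
```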

The second new feature is the `orderv()` function, a value-oriented adapter for `base::order()`. `orderv()` uses a vector of column names to compute an ordering permutation for a `data.frame`. We can use it as we show below.

```
library("wrapr")
sort_columns <- qc(mpg, hp, gear)
ordering <- orderv(mtcars[ , sort_columns, drop = FALSE],
                   decreasing = TRUE,
                   method = "radix")
head(mtcars[ordering, , drop = FALSE])
```

```
## mpg cyl disp hp drat wt qsec vs am gear carb
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
```

Of course we also have all the steps wrapped in a convenient function: `sortv()`.

```
mtcars %.>%
  sortv(.,
        sort_columns,
        decreasing = TRUE,
        method = "radix") %.>%
  head(.)
```

```
## mpg cyl disp hp drat wt qsec vs am gear carb
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
```

For details on "`method = "radix"`" please see our earlier tip here.
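As an aside, the permutation `orderv()` computes can be reproduced in base `R` by splicing the selected columns into `base::order()` with `do.call()`; a sketch (using the same columns as above):

```
# compute the same decreasing ordering with base::order()
sort_columns <- c("mpg", "hp", "gear")
perm <- do.call(
  order,
  c(unname(mtcars[sort_columns]),
    list(decreasing = TRUE, method = "radix")))
rownames(mtcars)[perm][[1]]
## [1] "Toyota Corolla"
```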

A third new feature is `mk_formula()`. `mk_formula()` is used to build simple formulas for modeling tasks (which may have a large number of variables) without any string processing or parsing.

Our usual advice for building simple formulas has been to use the `paste()`-based methods exhibited in "R Tip: How to Pass a formula to lm". This remains good advice. However, `mk_formula()` is a more concise and more hygienic alternative. An example is given below.

```
# specifications of how to model,
# coming from somewhere else
outcome <- "mpg"
variables <- c("cyl", "disp", "hp", "carb")
# our modeling effort,
# fully parameterized!
f <- wrapr::mk_formula(outcome, variables)
print(f)
```

`## mpg ~ cyl + disp + hp + carb`

```
model <- lm(f, data = mtcars)
print(model)
```

```
##
## Call:
## lm(formula = f, data = mtcars)
##
## Coefficients:
## (Intercept) cyl disp hp carb
## 34.021595 -1.048523 -0.026906 0.009349 -0.926863
```

The above notation is good for programming over modeling tasks.

Edit: `mk_formula()` duplicates some functionality of `stats::reformulate()`. The current implementation of `stats::reformulate()` appears to use the `paste()` pattern (which I actually like). However, we get “cluck-clucked” when we use `paste()` to build up formulas, so our code is in terms of `stats::update.formula()` (which appears to use terms and not pasting, though that is not confirmed).

Our publisher, Manning, has a great slide deck describing the book (and a discount code!!!) here:

We also just got back our part-1 technical review for the new book. Here is a quote from the technical review we are particularly proud of:

The dot notation for base `R` and the `dplyr` package did make me stand up and think. Certain things suddenly made sense.

The reviewer is reacting to an improved section on how to organize calculations that condenses and combines the best ideas from the following articles:

- R Tip: Break up Function Nesting for Legibility
- Using the Bizarro Pipe to Debug magrittr Pipelines in R
- R Tip: Make Arguments Explicit in magrittr/dplyr Pipelines

*Practical Data Science with R* and *Practical Data Science with R, 2nd Edition* are what we have been working on. The second edition uses the `wrapr` package (which was not available when we wrote the first edition). And we will show how to move to the `data.table` package! We really think this is a book you are going to want to learn from, or even teach from. The great thing is you can start working with *Practical Data Science with R, 2nd Edition* right now through Manning’s Early Access Program (MEAP)! Heck, we even throw in a complete e-copy of the first edition at no extra cost!

`R` users who also use the `dplyr` package will be able to quickly understand the following code, which adds an estimated area column to a `data.frame`.
```
suppressPackageStartupMessages(library("dplyr"))
iris %>%
  mutate(
    .,
    Petal.Area = (pi/4)*Petal.Width*Petal.Length) %>%
  head(.)
```

```
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1 5.1 3.5 1.4 0.2 setosa 0.2199115
## 2 4.9 3.0 1.4 0.2 setosa 0.2199115
## 3 4.7 3.2 1.3 0.2 setosa 0.2042035
## 4 4.6 3.1 1.5 0.2 setosa 0.2356194
## 5 5.0 3.6 1.4 0.2 setosa 0.2199115
## 6 5.4 3.9 1.7 0.4 setosa 0.5340708
```

The notation we used above is the "explicit argument" variation we recommend for readability. What a lot of `dplyr` users do not seem to know is that base-`R` already has this functionality. The function is called `transform()`.

To demonstrate this, let’s first detach `dplyr`, to show that we are not using functions from it.

`detach("package:dplyr", unload = TRUE)`

Now let’s write the equivalent pipeline using exclusively base-`R`.

```
iris ->.
transform(
  .,
  Petal.Area = (pi/4)*Petal.Width*Petal.Length) ->.
head(.)
```

```
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1 5.1 3.5 1.4 0.2 setosa 0.2199115
## 2 4.9 3.0 1.4 0.2 setosa 0.2199115
## 3 4.7 3.2 1.3 0.2 setosa 0.2042035
## 4 4.6 3.1 1.5 0.2 setosa 0.2356194
## 5 5.0 3.6 1.4 0.2 setosa 0.2199115
## 6 5.4 3.9 1.7 0.4 setosa 0.5340708
```

The "`->.`" notation is the end-of-line variation of the Bizarro Pipe. The `transform()` function has been part of `R` since 1998; `dplyr::mutate()` was introduced in 2014.

```
git log --all -p --reverse --source -S 'transform <-'

commit 41c2f7338c45dbf9eac99c210206bc3657bca98a refs/remotes/origin/tags/R-0-62-4
Author: pd <pd@00db46b3-68df-0310-9c12-caf00c1e9a41>
Date:   Wed Feb 11 18:31:12 1998 +0000

    Added the frametools functions subset() and transform()

    git-svn-id: https://svn.r-project.org/R/trunk@709 00db46b3-68df-0310-9c12-caf00c1e9a41
```

If your `R` or `dplyr` work is taking what you consider to be too long (seconds instead of instant, minutes instead of seconds, hours instead of minutes, or a day instead of an hour), then try `data.table`. For some tasks `data.table` is routinely faster than alternatives at pretty much all scales (example timings here).

If your project is large (millions of rows, hundreds of columns) you really should rent an Amazon EC2 r4.8xlarge (244 GiB RAM) machine for an hour for about $2.13 (quick setup instructions here) and experience speed at scale.

This `R` tip: how to pass a `formula` to `lm()`.

Often when modeling in `R` one wants to build up a formula outside of the modeling call. This allows the set of columns being used to be passed around as a vector of strings and treated as data. Being able to treat controls (such as the set of variables to use) as manipulable values allows for very powerful automated modeling methods.

What we are talking about is the ability to take the outcome (or dependent variable) and modeling variables (or independent variables) from somewhere else, as data. The kind of code we are talking about is shown below.

```
# specifications of how to model,
# coming from somewhere else
outcome <- "mpg"
variables <- c("cyl", "disp", "hp", "carb")

# our modeling effort,
# fully parameterized!
f <- as.formula(
  paste(outcome,
        paste(variables, collapse = " + "),
        sep = " ~ "))
print(f)
# mpg ~ cyl + disp + hp + carb

model <- lm(f, data = mtcars)
print(model)
# Call:
# lm(formula = f, data = mtcars)
#
# Coefficients:
# (Intercept)          cyl         disp           hp         carb
#   34.021595    -1.048523    -0.026906     0.009349    -0.926863
```

This works, and the `paste()` pattern is so useful we suggest researching and memorizing it.

However, the “call” portion of the model is reported as “`formula = f`” (the name of the variable carrying the formula) instead of something more detailed. Frankly, this printing issue never bothered us. None of our tools or workflows currently use the model `call` item, and for a very large number of variables formatting the call contents in the model report becomes unwieldy. We also already have the formula in a variable, so if we need it we can save it or pass it along.

There is a much better place on many models to get model structure information than the model `call` item: the model `terms` item. This item carries a lot of information and formats up quite nicely:

```
format(terms(model))
# [1] "mpg ~ cyl + disp + hp + carb"
```

Notice we used accessor notation ("`terms(model)`") to get the information. List notation, such as `model$terms`, also works.

In addition, as is so often the case in `R`, there is already a known solution to the above problem. For common `R` issues one should suspect there is a good available `R` solution; it is just a matter of finding the right reference or teaching. For example: to control the `model$call` item, use the `bquote()` facility, as we show below.

```
outcome <- "mpg"
variables <- c("cyl", "disp", "hp", "carb")
f <- as.formula(
  paste(outcome,
        paste(variables, collapse = " + "),
        sep = " ~ "))
print(f)
# mpg ~ cyl + disp + hp + carb

# The new line of code
model <- eval(bquote( lm(.(f), data = mtcars) ))
print(model)
# Call:
# lm(formula = mpg ~ cyl + disp + hp + carb, data = mtcars)
#
# Coefficients:
# (Intercept)          cyl         disp           hp         carb
#   34.021595    -1.048523    -0.026906     0.009349    -0.926863
```

`base::bquote()` is a very sensible implementation of quasi-quotation, or the Lisp backquote facility. The idea is: everything inside the `bquote()` is “quoted” (held unevaluated as an `R`-language tree, not as mere strings!), with the exception of anything marked with the “`.()`” notation. Anything marked with `.()` is not quoted, but substituted in by value. This is why we see the contents of our formula, and not the name of the variable we used to denote it. `base::eval()` is finally used to execute the combined contents.
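A minimal base-`R` illustration of this quote/unquote behavior:

```
x <- 5
# everything inside bquote() stays quoted, except pieces marked
# with .( ), which are substituted in by value
bquote(f(.(x) + y))
## f(5 + y)
```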

`base::bquote()` has some deliberate limits (an unwillingness to substitute into left-hand sides of `=`-expressions, and some complexity of notation), which is why we promote `wrapr::let()` for name replacement tasks (`wrapr::let()` is for substituting a *fixed* number of symbols, and combines the `eval(bquote())` pattern into a single function).

In conclusion: the exact *saved* call-text in a model object may not be important, as a better-structured record of the model specification is found in the model `terms` item. However, you can also control the model call text by evaluating the model using the `eval()`/`bquote()`/`.()` pattern we demonstrated above.

This `R` tip is: put your values in columns.

Some `R` users use seemingly clever tricks to bring data to an analysis. Here is an (artificial) example.

```
chamber_sizes <- mtcars$disp/mtcars$cyl
form <- hp ~ chamber_sizes
model <- lm(form, data = mtcars)
print(model)
# Call:
# lm(formula = form, data = mtcars)
#
# Coefficients:
#   (Intercept)  chamber_sizes
#         2.937          4.104
```

Notice: one of the variables came from a vector in the environment, not from the primary `data.frame`. `chamber_sizes` was first looked for in the `data.frame`, then in the environment where the `formula` was defined (which happens to be the global environment), and (if that hadn’t worked) in the executing environment (which is again the global environment).
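The environment capture involved can be seen directly in base `R`; a small sketch (the function `f_maker` is ours, purely illustrative):

```
# a formula remembers the environment in which it was created
f_maker <- function() {
  z <- 10
  hp ~ z  # this formula's environment is f_maker's execution environment
}
f <- f_maker()
get("z", envir = environment(f))
## [1] 10
```

This captured environment is where `lm()` would look for any variable not found in its `data` argument.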

Our advice is: do not do that. Place all of your values in columns; make it unambiguous that all variables are names of columns in your `data.frame` of interest. This allows you to write simple code that works over explicit data. The style we recommend looks like the following.

```
mtcars$chamber_sizes <- mtcars$disp/mtcars$cyl
form <- hp ~ chamber_sizes
model <- lm(form, data = mtcars)
print(model)
# Call:
# lm(formula = form, data = mtcars)
#
# Coefficients:
#   (Intercept)  chamber_sizes
#         2.937          4.104
```

The only difference is we took the time to place the derived vector into the data frame we are working with (assigning to `mtcars$chamber_sizes` instead of the global environment in the first line). This is a very organized way to work, and as you see it does not take much effort.

Or use only existing values, as we show below.

```
form <- hp ~ I(disp/cyl)
model <- lm(form, data = mtcars)
print(model)
# Call:
# lm(formula = form, data = mtcars)
#
# Coefficients:
#   (Intercept)  I(disp/cyl)
#         2.937        4.104
```

This is something we teach: with some care you can reliably treat variables as strings, and this is in no way inferior to complex systems such as `stats::formula` or `rlang::quosure`. The fact that these objects carry around an environment in addition to the names is in fact a barrier to reliable code, not an unmitigated advantage.

I am not alone in this opinion.

If the formula was typed in by the user interactively, then the call came from the global environment, meaning that variables not found in the data frame, or all variables if the `data` argument was missing, will be looked up in the same way they would in ordinary evaluation. But if the formula object was precomputed somewhere else, then its environment is the environment of the function call that created it. That means that arguments to that call and local assignments in that call will define variables for use in the model, followed by the parent (that is, enclosing) environment of the call, which may be a package namespace. These rules are standard for `R`, at least once one knows that an environment attribute has been assigned to the formula. They are similar to the use of closures described in Section 5.4, page 126.

Where clear and trustworthy software is a priority, I would personally avoid such tricks. Ideally, all the variables in the model frame should come from an explicit, verifiable data source, typically a data frame object that is archived for future inspection (or equivalently, some other equally well-defined source of data, either inside or outside `R`, that is used explicitly to construct the data for the model).

*Software for Data Analysis* (Springer 2008), John M. Chambers, Chapter 6, section 9, page 221.

Chambers’ critique applies equally to `stats::formula` or `rlang::quosure`, and roughly he is calling over-use of them an anti-pattern.

This is why we say, from the user point of view, variables can be treated as mere names or strings. With some care you can ensure all your values are coming from a single `data.frame`. And if that is the case, variables are column names.

Going to extra effort to carry around bound variables (variable names plus an environment resolving the names to values) is silly and a big source of reference leaks. Roughly: if you don’t know the value of a variable, pass it as a name or string (as that is all an unbound variable or symbol is); if you do know the value, use that value (the variable is serving little purpose at that point). Being able to replace variables with values is the hallmark of referential transparency: the family of expressions that are well-behaved in the sense that replacing the expressions with their referred-to values does not change observable program behavior. There is code that breaks when you replace variables with values, but that should be considered a limitation of such code (not a merit).
