
# wrapr: for sweet R code

This article is on writing sweet `R` code using the `wrapr` package.

## The problem

Consider the following `R` puzzle. You are given: a `data.frame`, the name of a column in which you wish to find missing values (`NA`), and the name of a column in which to land the result. For instance:

```
d <- data.frame(x = c(1, NA))
print(d)

#    x
# 1  1
# 2 NA

cname <- 'x'
print(cname)

# [1] "x"

rname <- paste(cname, 'isNA', sep = '_')
print(rname)

# [1] "x_isNA"
```

How do you write generic code to populate the column `x_isNA` with which rows of `x` are missing?

### The “base R” solution

In “base `R`” (R without additional packages) this is easy.

When you know the column names while writing the code:

```
d2 <- d
d2$x_isNA <- is.na(d2$x)

print(d2)

#    x x_isNA
# 1  1  FALSE
# 2 NA   TRUE
```

And when you don’t know the column names while writing the code (but know they will arrive in variables later):

```
d2 <- d
d2[[rname]] <- is.na(d2[[cname]])
```

The “base R” solution really is quite elegant.
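
To make the parametric form reusable, it can be wrapped in a small function. The following is only a sketch: the helper name `add_is_na_column` and its default result name are assumptions made for illustration, not something from the code above.

```r
# Hypothetical helper: add a logical column marking which entries of
# the column named by `cname` are NA. Column names arrive as strings.
add_is_na_column <- function(d, cname,
                             rname = paste(cname, "isNA", sep = "_")) {
  stopifnot(is.data.frame(d), cname %in% names(d))
  d[[rname]] <- is.na(d[[cname]])
  d
}

d <- data.frame(x = c(1, NA))
d2 <- add_is_na_column(d, "x")
print(d2)
```

Because `[[` takes ordinary character values, no special tricks are needed to pass the names in as arguments.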

### The “all in” non-standard evaluation `dplyr::mutate` solution

As far as I can tell the “all in” non-standard evaluation `dplyr::mutate` solution is something like the following.

When you know the column names while writing the code:

```
library("dplyr")
d %>% mutate(x_isNA = is.na(x))
```

And when you don’t know the column names while writing the code (but know they will arrive in variables later):

```
d %>%
  mutate_(.dots =
            stats::setNames(list(lazyeval::interp(
              ~ is.na(VAR),
              VAR = as.name(cname)
            )),
            rname))
```

### The sweet `wrapr::let` + `dplyr::mutate` solution

We will only work the harder “when you don’t yet know the column name” (or parametric) version:

```
library("wrapr")
let(list(COL = cname, RES = rname),
    d %>% mutate(RES = is.na(COL))
)
```

I think that this is pretty sweet, and can really level up your `dplyr` game.
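
Since the point of the parametric version is reuse, here is a minimal sketch of the same `let` block wrapped as a function. The function name `add_is_na` is hypothetical, and the sketch assumes both `dplyr` and `wrapr` are installed.

```r
library("dplyr")
library("wrapr")

# Hypothetical wrapper: the column names arrive as strings, and let()
# substitutes them for the placeholder names COL and RES before the
# dplyr pipeline is evaluated.
add_is_na <- function(d, cname, rname) {
  let(c(COL = cname, RES = rname),
      d %>% mutate(RES = is.na(COL))
  )
}

d <- data.frame(x = c(1, NA))
res <- add_is_na(d, "x", "x_isNA")
print(res)
```

The body of the function is written as ordinary `dplyr` code; only the `let` line mentions that `COL` and `RES` are stand-ins.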

`wrapr::let` is available from `CRAN` and already has a number of satisfied users.

If function behavior depends on variable names, then convenient control of functions is eventually going to require convenient control of variable names; so needing to re-map variable names at some point is inevitable.
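
To make "re-mapping variable names" concrete, the sketch below does a single name substitution by hand in base `R`. It illustrates the general idea only and is not a description of how `wrapr::let` is implemented; `with()` stands in here for any function that reads column names non-standardly.

```r
d <- data.frame(x = c(1, NA))
cname <- "x"

# Build the call with(d, is.na(x)) by splicing the symbol named by
# `cname` into a template; .() marks the spot that bquote() fills in.
expr <- bquote(with(d, is.na(.(as.name(cname)))))
print(expr)

# Evaluate the rewritten call.
res <- eval(expr)
print(res)
```

Doing this by hand for every name quickly gets tedious, which is the convenience gap a name-mapping tool is meant to close.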

## 7 thoughts on “wrapr: for sweet R code”

1. Another cool solution would be what I am calling a “view frame.” That is: a reference-style object that looks to `R` like a `data.frame` (or any class that claims to extend it, such as `tbl`) but re-maps its column names onto another, referred-to `data.frame`.

I am not a regular `data.table` user, but this seems like something that package may already (or could easily) supply.

2. Any `R` function or package that relies heavily on non-standard evaluation can benefit from parametric notation (such as introduced by `wrapr::let`). It isn’t just coding around things, but creating new capabilities (that are ready to be wrapped as re-usable functions). The more the system relies on non-standard evaluation, the larger the potential benefit (which is how I have been picking examples).

For example:

```
library("wrapr")

angle <- 1:10
var <- 'angle'
fn <- 'sin'

let(c(X=var, F=fn),
    plot(X, F(X))
)
```

`wrapr::let` can also be used with `knitr` markdown, which looks like the following:

````
---
params:
  FN: sin
---
```{r}
library("wrapr")
let(
  alias=restrictToNameAssignments(params),
  expr={
    # blocks can be arbitrarily long
    x <- 0.1*(1:20)
    plot(FN(x))
  })
```
````

The connection is: parameterized `knitr` converts the `yaml` header into the data structure `params`, which is already in the correct format for `wrapr::let` (the `restrictToNameAssignments()` call is just demonstrating the additional capability of filtering out non-name assignments, and is not strictly necessary).
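
The same connection can be tried outside of `knitr`. In this sketch the `params` list is built by hand as a stand-in for what the `yaml` header would produce (an assumption made for illustration), and `wrapr` is assumed to be installed.

```r
library("wrapr")

# Stand-in for the `params` structure knitr would build from the header.
params <- list(FN = "sin")

x <- 0.1 * (1:20)

# let() replaces the placeholder FN with the function name held in params.
y <- let(alias = params, expr = FN(x))
print(head(y))
```

Changing the entry to `FN = "cos"` re-targets the block without editing the code inside `let`.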

I also discuss parametric markdown in a screencast.

A lot of the power of `R` is being able to script and program over data and standard evaluation functions; being able to conveniently script and program over non-standard evaluation adds even more power.

Nice! You can also use the standard evaluation version of mutate:

`mutate_(d, .dots = setNames(list(is.na(cname)), rname))`

1. That would be nice, but it does not work. I think what it calculates is whether the variable `cname` itself is a missing value (rather than calculating facts about the `data.frame` column):

```
library("dplyr")
d <- data.frame(x = c(1, NA))
cname <- 'x'
rname <- paste(cname, 'isNA', sep = '_')
mutate_(d, .dots = setNames(list(is.na(cname)),
                            rname))

#    x x_isNA
# 1  1  FALSE
# 2 NA  FALSE
```
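
The difference between the two calculations can be seen directly in base `R`. This is a small sketch; the names `on_name`, `on_column`, and `d_wrong` are made up for illustration.

```r
d <- data.frame(x = c(1, NA))
cname <- "x"

# is.na() applied to the string "x": a single FALSE.
on_name <- is.na(cname)
print(on_name)

# is.na() applied to the column that the string names: one value per row.
on_column <- is.na(d[[cname]])
print(on_column)

# Assigning the single FALSE recycles it down the whole column, which
# matches the all-FALSE result shown above.
d_wrong <- d
d_wrong[["x_isNA"]] <- on_name
print(d_wrong)
```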

It is interesting that `list()` delays execution (which was the latest improvement I learned about), but a few more tricks are needed to get the correct outcome (which is what was pointed out to me here).

The article is already using the standard evaluation path; it is just so buried in the adaptations that it is hard to see the underbar.

I have heard a few times (1, 2, 3) that big changes are coming to `lazyeval` and/or the standard interface paths in `dplyr`, but frankly that is just another reason not to waste time mastering the minutiae of the current `dplyr` standard interface.

Also if WordPress mangled out some important part of your solution, I do apologize (WordPress does not like code in comments very much).

4. Aaron says:

I like the idea of “let” bindings in R, but I will point out that there is a much easier way to apply functions across columns in dplyr: use “mutate_each”.

```
library(dplyr)
d %>%
  mutate_each(funs("isNA" = is.na(.)))
```

1. Aaron,

Thanks for the comment. And you are right, I should have mentioned `mutate_each` (it is a great tool).

`mutate_each` and `summarize_each` are indeed powerful. Though remember you avoided part of the problem when you typed in the name of the result column (you did not take it from the `rname` variable).

The main reason they work nicely is we can (in this case) parametrize over the primary non-standard evaluation path by the use of `funs()`, “`.`”, and `one_of()`. I.e., we were not forced to use `mutate_each_()` to parameterize.

It would be a bit of a challenge (involving either `mutate_each_()` or `funs_()`) to reproduce the following exactly without typing in column names (the presence of the extra column plus the non-conventional naming of the result are a bit hard to push into the `mutate_each` form).

```
library("dplyr")
library("wrapr")
d <- data.frame(x= c(1, NA), y= c(2, 3))
cname <- 'x'
rname <- 'xcalc'
let(c(RES=rname, CNAME=cname),
    d %>% mutate(RES= is.na(CNAME))
)

#    x y xcalc
# 1  1 2 FALSE
# 2 NA 3  TRUE
```

The `wrapr::let` solution is generic: small changes in the problem did not require significant changes in the code.

For example (as I am sure you know) writing the following is not enough:

```
d %>% mutate_each(funs(rname= is.na(.)),
                  one_of(cname))
```

And again the above is only pleasant to parameterize over as `dplyr::one_of()` uses standard evaluation (takes “variables in character vector”).

The `is.na()` calculation is only meant as a simple notional example. I am not trying to say I don’t know how to find `NA` values easily or use `complete.cases`. We give computing over many columns as an example, as it is a case where people are willing to accept that you can’t hard-code the column name. But there are many other examples (just not always as readily accepted) where you are supplying a service or function and you need to calculate over, and land, one or more columns whose exact names you do not know when you are writing the code.
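
As a notional instance of such a service function, here is a base `R` sketch that computes over and lands several columns whose names are only known at call time. The function name `add_is_na_columns` and the `suffix` argument are assumptions made for illustration.

```r
# Hypothetical service function: for each column named in `cnames`,
# land a new logical column (original name plus `suffix`) marking
# which of its entries are NA.
add_is_na_columns <- function(d, cnames, suffix = "_isNA") {
  stopifnot(is.data.frame(d), all(cnames %in% names(d)))
  for (cname in cnames) {
    d[[paste0(cname, suffix)]] <- is.na(d[[cname]])
  }
  d
}

d <- data.frame(x = c(1, NA), y = c(2, 3))
res <- add_is_na_columns(d, c("x", "y"))
print(res)
```

This is easy here only because `[[` is a standard evaluation interface; when the work has to go through a non-standard interface, the name mapping has to be supplied some other way.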

What I want to demonstrate is that `wrapr::let` is an easy way to program over non-standard interfaces. It is not so important what the source of the non-standard interface example is; what matters more is that they are all easy to re-wrap.

I know these long responses make me look a bit like a bully. And I do appreciate your input and apologize for writing at such length. I guess an additional point I would like to make (I know, more length!) is: non-standard evaluation interfaces are more of a burden than `R` users seem to appreciate (and `dplyr` is probably better engineered than most people appreciate to mitigate so many of the negative consequences).