We recently saw a great recurring R question: “how do you use one column to choose a different value for each row?” That is: how do you use a column as an index? Please read on for some idiomatic base R, data.table, and dplyr solutions.
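As a rough illustration of the question, here is a minimal base R sketch. The data frame `d`, its columns, and the `derived` result column are hypothetical names invented for this example, and this is only one possible approach, not necessarily the solution given in the full article. It uses the base R feature that a matrix can be indexed by a two-column matrix of (row, column) positions.

```r
# Hypothetical example data: `choice` names the column to read in each row.
d <- data.frame(
  x = c(1, 2, 3),
  y = c(10, 20, 30),
  choice = c("x", "y", "x"),
  stringsAsFactors = FALSE
)

# Build a two-column (row, column) index matrix and use it to
# pick exactly one cell per row from the value columns.
value_cols <- c("x", "y")
idx <- cbind(seq_len(nrow(d)), match(d$choice, value_cols))
d$derived <- as.matrix(d[, value_cols])[idx]

print(d)
```

The `match()` call turns each column name into a column position, so the lookup is vectorized rather than a row-by-row loop.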

## Dot-Pipe Paper Accepted by the R Journal!!!

We are thrilled to announce our (my and Nina Zumel’s) paper on the dot-pipe has been accepted by the R-Journal!

## Parameterizing with bquote

One thing that is sure to get lost in my *long* note on macros in `R` is just how concise and powerful macros are. The problem is macros are concise, but they do a lot for you. So you get bogged down when you explain the joke.

Let’s try to be concise.
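A minimal sketch of the idea, assuming nothing beyond base R: `bquote()` quotes an expression but substitutes the value of anything wrapped in `.()`. The variable names `col` and `threshold`, and the use of the built-in `iris` data, are choices made for this illustration only.

```r
# Hypothetical parameters, chosen only for illustration.
col <- as.name("Sepal.Length")
threshold <- 5

# bquote() quotes the expression, filling in the .() slots with values.
expr <- bquote(subset(iris, .(col) > .(threshold)))
print(expr)

# The parameterized expression is ordinary R code and can be evaluated.
res <- eval(expr)
```

Because the substitution happens before evaluation, the printed expression shows exactly the code that will run.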

## On “Competition” in the R Ecosystem

I’ve been thinking a bit on “competition” in the `R` ecosystem.

## Better R Code with wrapr Dot Arrow

Our `R` package `wrapr` supplies a "piping operator" that we feel is a real improvement for piped-style coding in `R`.

The idea is: with `wrapr`’s "dot arrow" pipe "`%.>%`" the expression "`A %.>% B`" is treated very much like "`{. <- A; B}`". In particular this lets users think of "`A %.>% B(.)`" as a left-to-right way to write "`B(A)`" (i.e., under the convention of writing out the dot arguments, the pipe looks a bit like left-to-right function composition; call this explicit dot notation).

This sort of notation becomes useful when we compose many steps. Some consider "`A %.>% B(.) %.>% C(.) %.>% D(.)`" to be easier to read and easier to maintain than "`D(C(B(A)))`".
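To make the "`{. <- A; B}`" reading concrete without depending on the package, here is a small base R sketch. The operator `%dot%` is a hypothetical stand-in written for this illustration; it is not `wrapr`'s implementation and covers only the simple explicit-dot case described above.

```r
# Hypothetical stand-in operator (NOT wrapr's implementation): bind the
# left-hand value to `.` and then evaluate the right-hand expression.
`%dot%` <- function(lhs, rhs_expr) {
  rhs <- substitute(rhs_expr)              # capture B unevaluated
  env <- new.env(parent = parent.frame())  # scratch environment for `.`
  assign(".", lhs, envir = env)            # . <- A
  eval(rhs, envir = env)                   # evaluate B with `.` bound
}

# Nested form versus left-to-right form of the same computation.
nested <- round(sqrt(sum(c(4, 5, 7))), 2)
piped <- c(4, 5, 7) %dot% sum(.) %dot% sqrt(.) %dot% round(., 2)

print(identical(nested, piped))
```

Custom `%...%` operators in `R` are left-associative, which is what lets the steps chain from left to right.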

## Announcing wrapr 1.6.2

`wrapr` `1.6.2` is now up on CRAN. We have some neat new features for `R` users to try (in addition to many earlier `wrapr` goodies).

## Practical Data Science with R^{2}

The secret is out: Nina Zumel and I are busy working on *Practical Data Science with R ^{2}*, the second edition of our best-selling book on learning data science using the R language.

Our publisher, Manning, has a great slide deck describing the book (and a discount code!!!) here:

We also just got back our part-1 technical review for the new book. Here is a quote from the technical review we are particularly proud of:

> The dot notation for base `R` and the `dplyr` package did make me stand up and think. Certain things suddenly made sense.

## A Quick Appreciation of the R transform Function

`R` users who also use the `dplyr` package will be able to quickly understand the following code that adds an estimated area column to a `data.frame`.

```
suppressPackageStartupMessages(library("dplyr"))

iris %>%
  mutate(
    .,
    Petal.Area = (pi/4)*Petal.Width*Petal.Length) %>%
  head(.)
```

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708
```

The notation we used above is the "explicit argument" variation we recommend for readability. What a lot of `dplyr` users do not seem to know is: base-`R` already has this functionality. The function is called `transform()`.

To demonstrate this, let’s first detach `dplyr` to show that we are not using functions from `dplyr`.

`detach("package:dplyr", unload = TRUE)`

Now let’s write the equivalent pipeline using exclusively base-`R`.

```
iris ->.
  transform(
    .,
    Petal.Area = (pi/4)*Petal.Width*Petal.Length) ->.
  head(.)
```

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708
```

The "`->.`" notation is the end-of-line variation of the Bizarro Pipe. The `transform()` function has been part of `R` since 1998. `dplyr::mutate()` was introduced in 2014.

```
git log --all -p --reverse --source -S 'transform <-'
commit 41c2f7338c45dbf9eac99c210206bc3657bca98a refs/remotes/origin/tags/R-0-62-4
Author: pd <pd@00db46b3-68df-0310-9c12-caf00c1e9a41>
Date: Wed Feb 11 18:31:12 1998 +0000
Added the frametools functions subset() and transform()
git-svn-id: https://svn.r-project.org/R/trunk@709 00db46b3-68df-0310-9c12-caf00c1e9a41
```

## R Tip: Give data.table a Try

If your `R` or `dplyr` work is taking what you consider to be too long (seconds instead of instant, or minutes instead of seconds, or hours instead of minutes, or a day instead of an hour) then try `data.table`.

For some tasks `data.table` is routinely faster than alternatives at pretty much all scales (example timings here).

If your project is large (millions of rows, hundreds of columns) you really should rent an Amazon EC2 r4.8xlarge (244 GiB RAM) machine for an hour for about $2.13 (quick setup instructions here) and experience speed at scale.

## R Tip: How to Pass a formula to lm

`R` tip: how to pass a `formula` to `lm()`.

Often when modeling in `R` one wants to build up a formula outside of the modeling call. This allows the set of columns being used to be passed around as a vector of strings, and treated as data. Being able to treat controls (such as the set of variables to use) as manipulable values allows for very powerful automated modeling methods.
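A minimal sketch of building the formula from strings, using only base `R`. The choice of outcome and input columns from the built-in `iris` data is an assumption made for this illustration, and the full tip may recommend a different construction.

```r
# Hypothetical choice of outcome and input columns (built-in iris data).
outcome <- "Petal.Width"
inputs <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

# Option 1: paste the pieces into a string, then convert to a formula.
f1 <- as.formula(paste(outcome, "~", paste(inputs, collapse = " + ")))

# Option 2: base R's reformulate() builds the same formula from the vectors.
f2 <- reformulate(inputs, response = outcome)

# The formula is now an ordinary value that can be passed to lm().
model <- lm(f1, data = iris)
print(f1)
print(coef(model))
```

Because `inputs` is just a character vector, the set of variables can be filtered, extended, or generated by other code before the model is fit.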