We are thrilled to announce that our paper (by Nina Zumel and me) on the dot-pipe has been accepted by the R Journal!

# Month: September 2018

## Dot-Pipe Paper Accepted by the R Journal!!!

Categories: Administrativia, Exciting Techniques, Programming

## Parameterizing with bquote

Categories: Opinion, Programming, Tutorials

## On “Competition” in the R Ecosystem

Categories: Opinion

## Better R Code with wrapr Dot Arrow

Categories: Opinion, Programming

## Announcing wrapr 1.6.2

Categories: Programming, Tutorials

## Practical Data Science with R^{2}

Categories: Opinion, Practical Data Science, Statistics

## A Quick Appreciation of the R transform Function

Categories: Programming, Tutorials

## R Tip: Give data.table a Try

Categories: data science, Opinion, Practical Data Science, Pragmatic Data Science, Tutorials

## R Tip: How to Pass a formula to lm

Categories: Programming, Tutorials

One thing that is sure to get lost in my *long* note on macros in `R` is just how concise and powerful macros are. The problem is macros are concise, but they do a lot for you. So you get bogged down when you explain the joke.

Let’s try to be concise.
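
As a small illustration of the kind of conciseness meant here, the following is a minimal sketch using base-`R`'s `bquote()`. The variable names `var`, `expr`, and `value` are chosen for this example only and are not taken from the original note. The `.()` escape splices a value into an otherwise quoted expression, so the column to use can be treated as data.

```r
# Hypothetical example: pick the column to summarize as a value.
var <- as.name("Petal.Width")

# bquote() quotes its argument, except for terms wrapped in .(),
# which are replaced by their values.
expr <- bquote(mean(.(var)))
print(expr)

# Evaluate the built expression using the columns of iris as variables.
value <- eval(expr, envir = iris)
print(value)
```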

I’ve been thinking a bit on “competition” in the `R` ecosystem.

Our `R` package `wrapr` supplies a "piping operator" that we feel is a real improvement for pipe-style coding in `R`.

The idea is: with `wrapr`'s "dot arrow" pipe "`%.>%`" the expression "`A %.>% B`" is treated very much like "`{. <- A; B}`". In particular this lets users think of "`A %.>% B(.)`" as a left-to-right way to write "`B(A)`" (i.e., under the convention of writing out the dot arguments, the pipe looks a bit like left-to-right function composition; call this explicit dot notation).

This sort of notation becomes useful when we compose many steps. Some consider "`A %.>% B(.) %.>% C(.) %.>% D(.)`" to be easier to read and easier to maintain than "`D(C(B(A)))`".

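The "`{. <- A; B}`" reading can be sketched in base-`R` without loading any package. The operator below is a toy stand-in written for this illustration: the name `%dot%` is hypothetical, and the code is not `wrapr`'s implementation, only an approximation of the semantics described above.

```r
# Toy dot-style pipe (hypothetical name %dot%, NOT wrapr's %.>%).
`%dot%` <- function(lhs, rhs) {
  rhs_expr <- substitute(rhs)              # capture B without evaluating it
  env <- new.env(parent = parent.frame())  # scratch environment that can see the caller's variables
  assign(".", lhs, envir = env)            # the ". <- A" step
  eval(rhs_expr, envir = env)              # evaluate B with the dot bound
}

# Nested call versus the left-to-right, explicit-dot form.
nested <- exp(log2(sqrt(16)))
piped  <- 16 %dot% sqrt(.) %dot% log2(.) %dot% exp(.)

print(all.equal(nested, piped))
```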
`wrapr` `1.6.2` is now up on CRAN. We have some neat new features for `R` users to try (in addition to many earlier `wrapr` goodies).

The secret is out: Nina Zumel and I are busy working on *Practical Data Science with R^{2}*, the second edition of our best-selling book on learning data science using the R language.

Our publisher, Manning, has a great slide deck describing the book (and a discount code!!!) here:

We also just got back our part-1 technical review for the new book. Here is a quote from the technical review we are particularly proud of:

The dot notation for base `R` and the `dplyr` package did make me stand up and think. Certain things suddenly made sense.

`R` users who also use the `dplyr` package will be able to quickly understand the following code that adds an estimated area column to a `data.frame`.

```
suppressPackageStartupMessages(library("dplyr"))

iris %>%
  mutate(
    .,
    Petal.Area = (pi/4)*Petal.Width*Petal.Length) %>%
  head(.)
```

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708
```

The notation we used above is the "explicit argument" variation we recommend for readability. What a lot of `dplyr` users do not seem to know is: base-`R` already has this functionality. The function is called `transform()`.

To demonstrate this, let’s first detach `dplyr` to show that we are not using functions from `dplyr`.

`detach("package:dplyr", unload = TRUE)`

Now let’s write the equivalent pipeline using exclusively base-`R`.

```
iris ->.
transform(
  .,
  Petal.Area = (pi/4)*Petal.Width*Petal.Length) ->.
head(.)
```

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708
```

The "`->.`" notation is the end-of-line variation of the Bizarro Pipe. The `transform()` function has been part of `R` since 1998. `dplyr::mutate()` was introduced in 2014.

```
git log --all -p --reverse --source -S 'transform <-'
commit 41c2f7338c45dbf9eac99c210206bc3657bca98a refs/remotes/origin/tags/R-0-62-4
Author: pd <pd@00db46b3-68df-0310-9c12-caf00c1e9a41>
Date: Wed Feb 11 18:31:12 1998 +0000
Added the frametools functions subset() and transform()
git-svn-id: https://svn.r-project.org/R/trunk@709 00db46b3-68df-0310-9c12-caf00c1e9a41
```

If your `R` or `dplyr` work is taking what you consider to be too long (seconds instead of instant, minutes instead of seconds, hours instead of minutes, or a day instead of an hour), then try `data.table`.

For some tasks `data.table` is routinely faster than alternatives at pretty much all scales (example timings here).

If your project is large (millions of rows, hundreds of columns) you really should rent an Amazon EC2 r4.8xlarge (244 GiB RAM) machine for an hour for about $2.13 (quick setup instructions here) and experience speed at scale.

`R` tip: how to pass a `formula` to `lm()`.

Often when modeling in `R` one wants to build up a formula outside of the modeling call. This allows the set of columns being used to be passed around as a vector of strings, and treated as data. Being able to treat controls (such as the set of variables to use) as manipulable values allows for very powerful automated modeling methods.
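
A minimal sketch of this idea, using only base-`R`, is shown below. The variable names `outcome`, `variables`, `f`, and `model`, and the choice of `iris` columns, are assumptions made for this example. `stats::reformulate()` is one standard way to build the formula from strings; it is not necessarily the approach the full tip recommends.

```r
# The modeling controls are ordinary values that can be passed around.
outcome   <- "Sepal.Length"
variables <- c("Sepal.Width", "Petal.Length", "Petal.Width")

# Build the formula outside of the modeling call.
f <- reformulate(variables, response = outcome)
print(f)

# Pass the pre-built formula to lm().
model <- lm(f, data = iris)
print(coef(model))
```

Because `variables` is just a character vector, it can be filtered, extended, or generated by other code before the formula is built.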