
## A Quick Appreciation of the R transform Function

`R` users who also use the `dplyr` package will be able to quickly understand the following code that adds an estimated area column to a `data.frame`.

```r
suppressPackageStartupMessages(library("dplyr"))

iris %>%
  mutate(
    .,
    Petal.Area = (pi/4)*Petal.Width*Petal.Length) %>%
  head(.)
```

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708
```

The notation used above is the "explicit argument" variation (the dot `.` is passed explicitly as the first argument), which we recommend for readability. What many `dplyr` users do not seem to know is that base-`R` already has this functionality: the function is called `transform()`.

To demonstrate this, let’s first detach `dplyr` to show that we are not using functions from `dplyr`.

```r
detach("package:dplyr", unload = TRUE)
```

Now let’s write the equivalent pipeline using exclusively base-`R`.

```r
iris ->.
  transform(
    .,
    Petal.Area = (pi/4)*Petal.Width*Petal.Length) ->.
  head(.)
```

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708
```

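The "`->.`" notation is the end-of-line variation of the Bizarro Pipe: each line's result is assigned to the variable `.`, which the next line then uses as its input. The `transform()` function has been part of `R` since 1998, as the `R` source history shows, while `dplyr::mutate()` was introduced in 2014.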

```
git log --all -p --reverse --source -S 'transform <-'

commit 41c2f7338c45dbf9eac99c210206bc3657bca98a refs/remotes/origin/tags/R-0-62-4
Author: pd <pd@00db46b3-68df-0310-9c12-caf00c1e9a41>
Date:   Wed Feb 11 18:31:12 1998 +0000

    Added the frametools functions subset() and transform()

    git-svn-id: https://svn.r-project.org/R/trunk@709 00db46b3-68df-0310-9c12-caf00c1e9a41
```
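For comparison, here is a minimal sketch of the same computation in base-`R` using the native pipe `|>` instead of the Bizarro Pipe. It assumes `R` 4.1.0 or later, where `|>` was introduced. The second pipeline illustrates one practical difference from `mutate()`: `transform()` evaluates all of its arguments against the input data frame, so a column created by one argument cannot be referenced by another argument of the same call. `Sepal.Area` and `Area.Ratio` are hypothetical columns chosen only for illustration.

```r
# Requires R >= 4.1.0 for the native pipe |>.
# Petal area estimated as pi/4 * width * length
# (the area of an ellipse with those axis lengths).
area_tbl <- iris |>
  transform(Petal.Area = (pi / 4) * Petal.Width * Petal.Length) |>
  head()
print(area_tbl)

# transform() evaluates every argument against the *input* data frame,
# so Area.Ratio cannot use Petal.Area inside the same call.
# Chaining a second transform() call works around this.
# (Sepal.Area and Area.Ratio are hypothetical, for illustration only.)
ratio_tbl <- iris |>
  transform(
    Petal.Area = (pi / 4) * Petal.Width * Petal.Length,
    Sepal.Area = (pi / 4) * Sepal.Width * Sepal.Length
  ) |>
  transform(Area.Ratio = Petal.Area / Sepal.Area) |>
  head()
print(ratio_tbl)
```

The first pipeline prints the same six rows as the examples above; the second shows that dependent columns simply need their own `transform()` step.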

## R Tip: Give data.table a Try

If your `R` or `dplyr` work is taking what you consider too long (seconds instead of an instant, minutes instead of seconds, hours instead of minutes, or a day instead of an hour), then try `data.table`.

For some tasks `data.table` is routinely faster than alternatives at pretty much all scales (example timings here).

If your project is large (millions of rows, hundreds of columns), you really should rent an Amazon EC2 r4.8xlarge (244 GiB RAM) machine for an hour for about $2.13 (quick setup instructions here) and experience speed at scale.
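To give a feel for the syntax, here is a minimal sketch of the petal-area computation from the `transform()` post written with `data.table`. It assumes the `data.table` package is installed (`install.packages("data.table")`). The `:=` operator adds the new column by reference, modifying the table in place rather than copying it, which is one reason `data.table` scales well to large data.

```r
# Assumes the data.table package is installed.
library(data.table)

# Make a data.table copy of iris (iris itself is left unchanged).
dt <- as.data.table(iris)

# := adds Petal.Area by reference: the table is updated in place.
dt[, Petal.Area := (pi / 4) * Petal.Width * Petal.Length]

# Grouped aggregation uses the same bracket syntax: mean area per species.
area_by_species <- dt[, .(mean_area = mean(Petal.Area)), by = Species]

print(head(dt))
print(area_by_species)
```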


## R Tip: How to Pass a formula to lm

`R` tip: how to pass a `formula` to `lm()`.

Often when modeling in `R` one wants to build up a formula outside of the modeling call. This allows the set of columns being used to be passed around as a vector of strings and treated as data. Being able to treat controls (such as the set of variables to use) as manipulable values allows for very powerful automated modeling methods.
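As an illustration of treating the variable set as data, here is one base-`R` way to do it, sketched under the assumption that a simple additive model is wanted; it is not necessarily the approach this tip goes on to recommend. The column names are held in character vectors, `stats::reformulate()` builds the `formula`, and the result is passed to `lm()`. The `iris` columns chosen are arbitrary.

```r
# Response and predictors held as plain strings, i.e. as data.
outcome <- "Sepal.Length"
variables <- c("Sepal.Width", "Petal.Length", "Petal.Width")

# reformulate() (package stats, attached by default) builds
# Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width
f <- reformulate(variables, response = outcome)
print(f)

# Fit the model using the constructed formula.
model <- lm(f, data = iris)
print(coef(model))
```

Because `variables` is an ordinary character vector, it can be filtered, extended, or generated programmatically before the formula is built.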