wrapr 1.4.1
is now available on CRAN. wrapr
is a really neat R
package for organizing, meta-programming, and debugging R code. This update generalizes the dot-pipe feature’s dot S3 features.
Please give it a try!
wrapr
is an R
package that supplies powerful tools for writing and debugging R
code.
Primary wrapr
services include:
let() (let block)
%.>% (dot arrow pipe)
build_frame() / draw_frame()
qc() (quoting concatenate)
:= (named map builder)
DebugFnW() (function debug wrappers)
λ() (anonymous function builder)

let()
let()
allows execution of arbitrary code with substituted variable names (note this is subtly different than binding values for names as with base::substitute()
or base::with()
).
The function is simple and powerful. It treats strings as variable names and re-writes expressions as if you had used the denoted variables. For example the following block of code is equivalent to having written "a + a".
library("wrapr")
a <- 7
let(
c(VAR = 'a'),
VAR + VAR
)
# [1] 14
This is useful in re-adapting non-standard evaluation interfaces (NSE interfaces) so one can script or program over them.
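As a rough illustration of the substitution idea, the same re-write can be imitated in base R with substitute(). This is only a sketch under our own assumptions, not how let() is implemented; the names expr, rewritten, and result are introduced just for this example.

```r
# Base-R sketch of name substitution; an illustration only,
# not wrapr's implementation of let().
a <- 7

# The expression to re-write, captured without evaluating it.
expr <- quote(VAR + VAR)

# Replace the symbol VAR with the symbol a.
rewritten <- do.call(substitute, list(expr, list(VAR = as.name("a"))))

# Show the re-written expression, then evaluate it.
print(rewritten)
result <- eval(rewritten)
print(result)
```

The point of the sketch is that the variable name arrives as a string ("a"), so it can come from a function argument or a configuration value.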
We are trying to make let()
self teaching and self documenting (to the extent that makes sense). For example, try the argument eval=FALSE
to prevent execution and see what would have been executed, or debugPrint=TRUE
to have the replaced code printed in addition to being executed:
let(
c(VAR = 'a'),
eval = FALSE,
{
VAR + VAR
}
)
# {
# a + a
# }
let(
c(VAR = 'a'),
debugPrint = TRUE,
{
VAR + VAR
}
)
# $VAR
# [1] "a"
#
# {
# a + a
# }
# [1] 14
Please see vignette('let', package='wrapr')
for more examples. Some formal documentation can be found here.
For working with dplyr
0.7.*
we strongly suggest wrapr::let()
(or even an alternate approach called seplyr
).
%.>%
(dot pipe or dot arrow)
%.>%
dot arrow pipe is a pipe with intended semantics:
"
a %.>% b
" is to be treated approximately as if the user had written "{ . <- a; b };
" with "%.>%
" being treated as left-associative.
Other R
pipes include magrittr
and pipeR
.
The following two expressions should be equivalent:
cos(exp(sin(4)))
# [1] 0.8919465
4 %.>% sin(.) %.>% exp(.) %.>% cos(.)
# [1] 0.8919465
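The intended semantics can be sketched in a few lines of base R. The helper dot_pipe_sketch() below is hypothetical (it is not part of wrapr) and differs from the real operator in at least one way: it binds "." in a temporary environment instead of the caller's environment, and it performs none of wrapr's checks.

```r
# Hypothetical helper: evaluate an unevaluated right-hand side with "."
# bound to the left-hand value. Unlike wrapr, "." is assigned in a
# temporary child environment rather than in the caller's environment.
dot_pipe_sketch <- function(a, b_expr, env = parent.frame()) {
  eval_env <- new.env(parent = env)
  assign(".", a, envir = eval_env)
  eval(b_expr, eval_env)
}

# Left-associative chaining: each stage sees the previous result as ".".
staged <- dot_pipe_sketch(
  dot_pipe_sketch(
    dot_pipe_sketch(4, quote(sin(.))),
    quote(exp(.))),
  quote(cos(.)))

# The directly nested form, for comparison.
nested <- cos(exp(sin(4)))
print(all.equal(staged, nested))

# A stage can be any expression in ".", not only a function call.
squares <- dot_pipe_sketch(1:4, quote(.^2))
print(squares)
```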
The notation is quite powerful as it treats pipe stages as expressions parameterized over the variable ".
". This means you do not need to introduce functions to express stages. The following is a valid dot-pipe:
1:4 %.>% .^2
# [1] 1 4 9 16
The notation is also very regular as we show below.
1:4 %.>% sin
# [1] 0.8414710 0.9092974 0.1411200 -0.7568025
1:4 %.>% sin(.)
# [1] 0.8414710 0.9092974 0.1411200 -0.7568025
1:4 %.>% base::sin
# [1] 0.8414710 0.9092974 0.1411200 -0.7568025
1:4 %.>% base::sin(.)
# [1] 0.8414710 0.9092974 0.1411200 -0.7568025
1:4 %.>% function(x) { x + 1 }
# [1] 2 3 4 5
1:4 %.>% (function(x) { x + 1 })
# [1] 2 3 4 5
1:4 %.>% { .^2 }
# [1] 1 4 9 16
1:4 %.>% ( .^2 )
# [1] 1 4 9 16
Regularity can be a big advantage in teaching and comprehension. Please see "In Praise of Syntactic Sugar" for more details. Some formal documentation can be found here.
5 %.>% 6
deliberately stops as 6
is a right-hand side that obviously does not use its incoming value. This check is only applied to values, not functions on the right-hand side.
Piping into sin()
is prohibited as it looks too much like the user declaring sin()
takes no arguments. One must pipe into either a function, function name, or a non-trivial expression (such as sin(.)
). A useful error message is returned to the user: wrapr::pipe does not allow direct piping into a no-argument function call expression (such as "sin()" please use sin(.))
.
5 %.>% return(.)
is prohibited as the obvious pipe implementation would not actually escape from user functions as users may intend.
De-references ($
, ::
, @
, and a few more) on the right-hand side are performed (example: 5 %.>% base::sin(.)
).
Outer parentheses on the right-hand side do not change pipe behavior (example: 5 %.>% (sin(.))
).
Anonymous functions on the right-hand side are applied (example: 5 %.>% function(x) {x+1}
returns 6, just as 5 %.>% (function(x) {x+1})(.)
does).
The contents of braces are left alone (example: 5 %.>% { function(x) {x+1} }
returns function(x) {x+1}
, not 6).
build_frame()
/draw_frame()
build_frame()
is a convenient way to type in a small example data.frame
in natural row order. This can be very legible and saves having to perform a transpose in one’s head. draw_frame()
is the complementary function that formats a given data.frame
(and is a great way to produce neatened examples).
x <- build_frame(
"measure" , "training", "validation" |
"minus binary cross entropy", 5 , -7 |
"accuracy" , 0.8 , 0.6 )
print(x)
# measure training validation
# 1 minus binary cross entropy 5.0 -7.0
# 2 accuracy 0.8 0.6
str(x)
# 'data.frame': 2 obs. of 3 variables:
# $ measure : chr "minus binary cross entropy" "accuracy"
# $ training : num 5 0.8
# $ validation: num -7 0.6
cat(draw_frame(x))
# build_frame(
# "measure" , "training", "validation" |
# "minus binary cross entropy", 5 , -7 |
# "accuracy" , 0.8 , 0.6 )
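For comparison, here is the same small table typed with base R's data.frame(), which asks for the values column by column. This is our own illustration (the name x_base is ours); it shows the "transpose in one's head" that the row-oriented notation avoids.

```r
# Base-R construction of the same example data, written column by column.
x_base <- data.frame(
  measure = c("minus binary cross entropy", "accuracy"),
  training = c(5, 0.8),
  validation = c(-7, 0.6),
  stringsAsFactors = FALSE
)

print(x_base)
str(x_base)
```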
qc()
(quoting concatenate)
qc()
is a quoting variation on R
’s concatenate operator c()
. It allows code such as the following:
qc(a = x, b = y)
# a b
# "x" "y"
qc(one, two, three)
# [1] "one" "two" "three"
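To give a feel for how quoting of this kind can work, here is a minimal base-R sketch. The function qc_sketch() is hypothetical and much simpler than the real qc() (for example, it only handles bare names and strings, with no nesting).

```r
# Hypothetical, simplified imitation of a quoting concatenate:
# capture the arguments unevaluated and convert each to a string.
qc_sketch <- function(...) {
  args <- as.list(substitute(list(...)))[-1]
  vapply(args, function(e) as.character(e), character(1))
}

# Neither x, y, one, two, nor three needs to exist as a variable.
named <- qc_sketch(a = x, b = y)
plain <- qc_sketch(one, two, three)
print(named)
print(plain)
```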
:=
(named map builder)
:=
is the "named map builder". It allows code such as the following:
'a' := 'x'
# a
# "x"
The important property of named map builder is it accepts values on the left-hand side allowing the following:
name <- 'variableNameFromElsewhere'
name := 'newBinding'
# variableNameFromElsewhere
# "newBinding"
A nice property is :=
commutes (in the sense of algebra or category theory) with R
’s concatenation function c()
. That is the following two statements are equivalent:
c('a', 'b') := c('x', 'y')
# a b
# "x" "y"
c('a' := 'x', 'b' := 'y')
# a b
# "x" "y"
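For readers who want the base-R equivalent, the same named vectors can be built with stats::setNames(). This is a comparison sketch of ours (the names m1 through m4 are introduced only here), not a description of how := is implemented.

```r
# Base-R equivalents of the named map builder examples.
m1 <- stats::setNames("x", "a")

# The name can come from a variable.
name <- "variableNameFromElsewhere"
m2 <- stats::setNames("newBinding", name)

# Building the map all at once, or piece by piece and then concatenating,
# gives the same named vector.
m3 <- stats::setNames(c("x", "y"), c("a", "b"))
m4 <- c(stats::setNames("x", "a"), stats::setNames("y", "b"))
print(identical(m3, m4))
```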
The named map builder is designed to synergize with seplyr
.
DebugFnW()
DebugFnW()
wraps a function for debugging. If the function throws an exception the execution context (function arguments, function name, and more) is captured and stored for the user. The function call can then be reconstituted, inspected and even re-run with a step-debugger. Please see our free debugging video series and vignette('DebugFnW', package='wrapr')
for examples.
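The general idea of capturing a failing call can be sketched in base R. The helper capture_on_error() below is hypothetical and far simpler than DebugFnW() (it saves only the arguments and the error, in memory); it is meant to show the pattern, not the package's implementation.

```r
# Hypothetical helper (not part of wrapr): wrap f so that, on error,
# the arguments of the failing call are saved for later inspection.
capture_on_error <- function(f) {
  saved <- new.env()
  wrapped <- function(...) {
    args <- list(...)
    tryCatch(
      do.call(f, args),
      error = function(e) {
        saved$args <- args
        saved$error <- e
        stop(e)
      })
  }
  list(fn = wrapped, saved = saved)
}

# An example function that fails for some inputs.
check_small <- function(x) {
  if (x > 2) stop("value too large")
  x
}

w <- capture_on_error(check_small)
outcome <- try(w$fn(5), silent = TRUE)

# The failing call can be inspected and re-run from the saved arguments.
print(w$saved$args)
rerun <- try(do.call(check_small, w$saved$args), silent = TRUE)
```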
λ()
(anonymous function builder)
λ()
is a concise abstract function creator or "lambda abstraction". It is a placeholder that allows the use of the λ-character for very concise function abstraction.
Example:
# Make sure lambda function builder is in our environment.
wrapr::defineLambda()
# square numbers 1 through 4
sapply(1:4, λ(x, x^2))
# [1] 1 4 9 16
Install with either:
install.packages("wrapr")
or
# install.packages("devtools")
devtools::install_github("WinVector/wrapr")
More details on wrapr
capabilities can be found in the following two technical articles:
Note: wrapr
is meant only for "tame names", that is: variables and column names that are also valid simple (without quotes) R
variable names.
Nina has, as usual, some great documentation here.
More and more I am finding when you are in the middle of doing something, having a ready made plotting tool is less distracting than working directly designing a ggplot2
graph on the fly. In addition to WVPlots other great “ready to go” plot packages include ggpubr and ggstatsplot. I definitely recommend checking out all three packages and the packages/tools they use.
rquery
talk went very well, thank you very much to the attendees for being an attentive and generous audience.
(John teaching rquery
at BARUG, photo credit: Timothy Liu)
I am now looking for invitations to give a streamlined version of this talk privately to groups using R
who want to work with SQL
(with databases such as PostgreSQL or big data systems such as Apache Spark). rquery
has a number of features that greatly improve team productivity in this environment (strong separation of concerns, strong error checking, high usability, specific debugging features, and high performance queries).
If your group is in the San Francisco Bay Area and using R
to work with a SQL
accessible data source, please reach out to me at jmount@win-vector.com, I would be honored to show your team how to speed up their project and lower development costs with rquery
. If you are a big data vendor and some of your clients use R
, I am especially interested in getting in touch: our system can help R
users start working with your installation.
Preparing Datasets – The Ugly Truth & Some Solutions is a great idea of Jim Porzak’s. Jim will speak on problems one is likely to encounter in trying to use real world data for predictive modeling and then I will speak on how the vtreat
package helps address these issues. vtreat
systematizes a number of routine domain independent data repairs and preparations, leaving you more time to work on important domain specific issues (plus it has citable documentation, helping make your methodology section smaller).
vtreat
is the best way to prepare messy real world data for predictive modeling.
rquery: a Query Generator for Working With SQL Data
is an introduction to the rquery
query generator system. rquery
is a new R
package that builds “pipe-able SQL” and includes a number of very powerful data operators and analyses. It includes a number of very neat features, including query pipeline diagrams.
We think rquery
(plus cdata
) is going to be the best (easiest to learn, most expressive, easiest to maintain, and most performant) way to use R
to manipulate data at scale (SQL databases and Spark).
R
tip: use slices.
R
has a very powerful array slicing ability that allows for some very slick data processing.
Suppose we have a data.frame
“d
“, and for every row where d$n_observations < 5
we wish to “NA
-out” some other columns (mark them as not yet reliably available). Using slicing techniques this can be done quite quickly as follows.
library("wrapr")
d[d$n_observations < 5, qc(mean_cost, mean_revenue, mean_duration)] <- NA
(For “qc()
” please see R Tip: Use qc() For Fast Legible Quoting.)
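Here is a small self-contained run of the same slice assignment in base R only. The example data is made up for illustration, and the column names are written as a plain character vector instead of using qc().

```r
# Made-up example data: per-group summaries plus the number of
# observations each summary is based on.
d <- data.frame(
  n_observations = c(10, 3, 7, 1),
  mean_cost = c(1.5, 2.0, 3.1, 9.9),
  mean_revenue = c(4.0, 2.5, 6.2, 0.3),
  mean_duration = c(12, 30, 15, 2)
)

# Rows are chosen by a logical condition, columns by name,
# and the whole slice is assigned at once.
cols <- c("mean_cost", "mean_revenue", "mean_duration")
d[d$n_observations < 5, cols] <- NA

print(d)
```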
The above notation is very convenient, compact, and powerful. We are adding this as an operator to our rquery
query generator as assign_slice()
(and a related method for directly dealing with NA
/NULL
).
R
package cdata
now has version 0.7.0
available from CRAN
.
cdata
is a data manipulation package that subsumes many higher order data manipulation operations including pivot/un-pivot, spread/gather, or cast/melt. The record to record transforms are specified by drawing a table that expresses the record structure (called the “control table” and also the link between the key concepts of row-records and block-records).
What can be quickly specified and achieved using these concepts and notations is amazing and quite teachable. These transforms can be run in-memory or in remote database or big-data systems (such as Spark).
The concepts are taught in Nina Zumel’s excellent tutorial.
And in John Mount’s quick screencast/lecture.
The 0.7.0
update adds local versions of the operators in addition to the Spark and database implementations. These methods should now be a bit safer for in-memory complex/annotated types such as dates and times.
R
has a lot of under-appreciated super powerful functions. I list a few of our favorites below.
Atlas, carrying the sky. Royal Palace (Paleis op de Dam), Amsterdam.
stats::approx(): approximate a curve/function.
base::cumsum(): cumulative ordered sum.
stats::ecdf(): estimate the cumulative distribution function.
base::findInterval(): assign values to bins.
base::match(): bulk computation of first match. Can lookup and sort data and even find non-duplicate data.
base::Reduce(): nifty functional method to combine multiple function evaluations.
base::tapply(): grouped summary function.
base::unlist(): build arrays of atomic values from more complicated nested structures.
base::Vectorize(): convert scalar functions into functions ready to operate on arrays.

We would love to hear about some of your favorites.
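A one-line taste of each function in the list above, on tiny made-up inputs (all variable names here are ours, chosen only for the example):

```r
# approx(): linear interpolation between known points.
y_mid <- stats::approx(x = c(0, 1), y = c(0, 10), xout = 0.5)$y

# cumsum(): running totals.
running <- cumsum(1:4)

# ecdf(): returns a function giving the fraction of data <= its argument.
cdf <- stats::ecdf(c(1, 2, 3, 4))
frac <- cdf(2)

# findInterval(): which bin (defined by break points) each value falls in.
bins <- findInterval(c(1.5, 3.2, 0.2), c(0, 1, 2, 3))

# match(): position of the first match of each value (NA if none).
pos <- match(c("b", "z"), c("a", "b", "c"))

# Reduce(): fold a binary function over a sequence, keeping partial results.
partial <- Reduce(`+`, 1:4, accumulate = TRUE)

# tapply(): summarize values by group.
sums <- tapply(c(1, 2, 3, 4), c("a", "b", "a", "b"), sum)

# unlist(): flatten a nested list into an atomic vector.
flat <- unlist(list(1, list(2, 3)))

# Vectorize(): lift a scalar-only function to work element-wise.
larger <- function(x, n) if (x > n) x else n
res <- Vectorize(larger)(1:4, 2)

print(list(y_mid, running, frac, bins, pos, partial, sums, flat, res))
```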
match_order()
to Align Data
wrapr::match_order()
to align data.
Suppose we have data in two data frames, and both of these data frames have common row-identifying columns called “idx
“.
library("wrapr")
d1 <- build_frame(
  "idx", "x" |
  3    , "a" |
  1    , "b" |
  2    , "c" )
d2 <- build_frame(
  "idx", "y" |
  2    , "D" |
  1    , "E" |
  3    , "F" )
print(d1)
#>   idx x
#> 1   3 a
#> 2   1 b
#> 3   2 c
print(d2)
#>   idx y
#> 1   2 D
#> 2   1 E
#> 3   3 F
(Please see R Tip: Think in Terms of Values for build_frame()
and other value capturing tools.)
Often we wish to work with such data aligned so each row in d2
has the same idx
value as the same row (by row order) as d1
. This is an important data wrangling task, so there are many ways to achieve it in R, such as base::merge()
, dplyr::left_join()
, or by sorting both tables into the same order and then using base::cbind()
.
However if you wish to preserve the order of the first table (which may not be sorted), you need one more trick.
You can add a row-id column, sort by the joining id, combine and then re-sort by the row-id column.
Or you can match the orders in one step using wrapr::match_order()
.
p <- match_order(d2$idx, d1$idx)
print(d2[p, , drop=FALSE])
#>   idx y
#> 3   3 F
#> 2   1 E
#> 1   2 D
match_order
is merely wrapping all of the sort and re-sort tricks we mentioned above, however the theory is based on the absolute magic of associative array indexing.
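When the ids are unique and appear in both tables, a similar permutation can be obtained in base R with match(). This is our own sketch of the indexing idea under that assumption; it is not meant as a description of how match_order() is implemented, and the names p_base and d2_aligned are ours.

```r
# The same example data, built with base R.
d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(idx = c(2, 1, 3), y = c("D", "E", "F"),
                 stringsAsFactors = FALSE)

# For each id in d1 (in d1's order), find its row position in d2.
# Assumes ids are unique and every d1 id is present in d2.
p_base <- match(d1$idx, d2$idx)

# Indexing d2 by those positions puts its rows in d1's order.
d2_aligned <- d2[p_base, , drop = FALSE]
print(d2_aligned)
```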
Please see R Tip: Use drop = FALSE
with data.frame
s, for why one should get in the habit of writing drop = FALSE
.
R
both using the magrittr
package and using the wrapr
package.
magrittr
pipelines
The magrittr
pipe glyph “%>%
” is the most popular piping symbol in R
.
magrittr
documentation describes %>%
as follows.
Basic piping:
x %>% f
is equivalent to f(x)
x %>% f(y)
is equivalent to f(x, y)
x %>% f %>% g %>% h
is equivalent to h(g(f(x)))
The argument placeholder
x %>% f(y, .)
is equivalent to f(y, x)
x %>% f(y, z = .)
is equivalent to f(y, z = x)
Re-using the placeholder for attributes
It is straight-forward to use the placeholder several times in a right-hand side expression. However, when the placeholder only appears in a nested expressions magrittr will still apply the first-argument rule. The reason is that in most cases this results more clean code.
x %>% f(y = nrow(.), z = ncol(.))
is equivalent to f(x, y = nrow(x), z = ncol(x))
The behavior can be overruled by enclosing the right-hand side in braces:
x %>% {f(y = nrow(.), z = ncol(.))}
is equivalent to f(y = nrow(x), z = ncol(x))
That is a bit of simplification, but is the taught mental model.
Grolemund, Wickham, R for Data Science, O’Reilly Media, 2017; “Pipes” describes the magrittr
pipe as follows.
foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mouse) %>%
  bop(on = head)
[…]
The pipe works by performing a “lexical transformation”: behind the scenes, magrittr reassembles the code in the pipe to a form that works by overwriting an intermediate object. When you run a pipe like the one above, magrittr does something like this:
my_pipe <- function(.) {
  . <- hop(., through = forest)
  . <- scoop(., up = field_mice)
  bop(., on = head)
}
my_pipe(foo_foo)
Roughly they are saying x %>% f(ARGS)
can be considered shorthand for { . <- x; f(., ARGS) }
where the evaluation in question happens in a temporary environment.
To safely and confidently use piping one must eventually know what all of the commonly used related notations mean. For example it is important to know what each of the following evaluate to:
5 %>% sin
: the notation demonstrated in the magrittr
excerpt.
5 %>% sin()
: possibly the notation one would abstract from the R for Data Science excerpt.
5 %>% sin(.)
: the notation we recommend (especially for the part time R
user).
Also, there are questions of how one pipes into general expressions (instead of names, functions, or partially specified function evaluation signatures).
These may seem like details, but they are the steps required to move from copying code from examples and hoping it works (a state of learned helplessness, especially when simple variations fail) to having an effective (even if approximate) mental model for the operators one has decided to work with and plan over.
wrapr
pipelines
wrapr
supplies its own piping glyph: “dot pipe” %.>%
. wrapr
’s goal is to supply an operator that is regular and safe, with a %.>% b
being approximately syntactic sugar for { . <- a; b }
(with visible side-effects, i.e. we can actually see the “.
” assignment happen).
library("wrapr")
# calculate sin(5)
5 %.>% sin(.)
## [1] -0.9589243
# 5 left in dot, a visible side-effect
print(.)
## [1] 5
# clear dot, so no later failing example
# falsely appears to work
rm(list = ".")
We think wrapr
piping is a very comprehensible (non-magic) expression oriented pipe with a few rules and additional admonitions:
5 %.>% sin(.)
and not 5 %.>% sin()
or 5 %.>% sin
. It is good to make it obvious to the reader that “.
” is a free-name in the right-hand side expression, allowing the easy application of the convention of treating the right-hand side expression as an implicit function of “.
”.
5 %.>% sin
and function application as in 5 %.>% function(x) { sin(x) }
.
R
’s visibility controls).
wrapr
convenience transforms and safety checking. This is compatible with the subtle R
convention that brace-blocks {}
are considered more opaque and not as eagerly looked into as parenthesized expressions (one such example can be found here).
wrapr
is grammar in the sense some statements are deliberately not part of the accepted notation. Some of the “errors” in the next set of examples are in fact wrapr
refusing certain pipelines.
wrapr
by using R
S3
methodology to specify their own rules for various classes (such as building pipable ggplot2
code). Technical details can be found here.
Let’s consider the following attempts at writing piped variations of sin(5)
in both magrittr
and wrapr
notations.
exprs = c(
"5 PIPE_GLYPH sin",
"5 PIPE_GLYPH sin()",
"5 PIPE_GLYPH sin(.)",
"5 PIPE_GLYPH base::sin",
"5 PIPE_GLYPH base::sin()",
"5 PIPE_GLYPH base::sin(.)",
"5 PIPE_GLYPH ( sin )",
"5 PIPE_GLYPH ( sin() )",
"5 PIPE_GLYPH ( sin(.) )",
"5 PIPE_GLYPH { sin }",
"5 PIPE_GLYPH { sin() }",
"5 PIPE_GLYPH { sin(.) }",
"5 PIPE_GLYPH function(x) { sin(x) }",
"5 PIPE_GLYPH ( function(x) { sin(x) } )",
"5 PIPE_GLYPH { function(x) { sin(x) } }",
"f <- function(x) { sin(x) }; 5 PIPE_GLYPH f"
)
The point is in a room full of students in a lab setting if you show them “5 %>% sin
” some of them are going to try variations or have variations from their work that are important to them. This possibly includes: package-qualifying the function name, wrapping expressions in parentheses, altering arguments, building functions, and retrieving functions from data structures. The pipeline (for convenience) tries to lower the distinctions between expressions, functions, and function names. However the pipeline notation does not completely eliminate the differences.
A non-expert magrittr
/dplyr
user might expect all the pipe examples we are about to discuss to evaluate to sin(5)
= -0.9589243. As R
is routinely used by self-described non-programmers (such as scientists, analysts, and statisticians) the non-expert or part time R
user is a very important class of R
users (and in fact distinct from beginning R
users). So how a system meets or misses simplified expectations is quite important in R
.
To run our examples we will use a fairly involved function work_examples()
that takes the vector of examples and returns an annotated data.frame
of evaluation results. For completeness this code is given here, but can be safely skipped when reading this article.
Now we can work our examples, and return the comparison in tabular format.
work_examples(exprs, sin(5)) %.>%
knitr::kable(., format = "html", escape = FALSE) %.>%
column_spec(., 1:4, width = "1.75in") %.>%
kable_styling(., "striped", full_width = FALSE)
magrittr expr | magrittr res | wrapr expr | wrapr res |
---|---|---|---|
5 %>% sin | -0.959 | 5 %.>% sin | -0.959 |
5 %>% sin() | -0.959 | 5 %.>% sin() | wrapr::pipe_step.default does not allow direct piping into a no-argument function call expression (such as “sin()”, please use sin(.)). |
5 %>% sin(.) | -0.959 | 5 %.>% sin(.) | -0.959 |
5 %>% base::sin | unused argument (sin) | 5 %.>% base::sin | -0.959 |
5 %>% base::sin() | -0.959 | 5 %.>% base::sin() | wrapr::pipe_step.default does not allow direct piping into a no-argument function call expression (such as “base::sin()”, please use base::sin(.)). |
5 %>% base::sin(.) | -0.959 | 5 %.>% base::sin(.) | -0.959 |
5 %>% ( sin ) | -0.959 | 5 %.>% ( sin ) | -0.959 |
5 %>% ( sin() ) | 0 arguments passed to ‘sin’ which requires 1 | 5 %.>% ( sin() ) | wrapr::pipe_step.default does not allow direct piping into a no-argument function call expression (such as “sin()”, please use sin(.)). |
5 %>% ( sin(.) ) | object ‘.’ not found | 5 %.>% ( sin(.) ) | -0.959 |
5 %>% { sin } | .Primitive(“sin”) | 5 %.>% { sin } | .Primitive(“sin”) |
5 %>% { sin() } | 0 arguments passed to ‘sin’ which requires 1 | 5 %.>% { sin() } | 0 arguments passed to ‘sin’ which requires 1 |
5 %>% { sin(.) } | -0.959 | 5 %.>% { sin(.) } | -0.959 |
5 %>% function(x) { sin(x) } | Anonymous functions myst be parenthesized | 5 %.>% function(x) { sin(x) } | -0.959 |
5 %>% ( function(x) { sin(x) } ) | -0.959 | 5 %.>% ( function(x) { sin(x) } ) | -0.959 |
5 %>% { function(x) { sin(x) } } | function (x) { sin(x) } | 5 %.>% { function(x) { sin(x) } } | function (x) { sin(x) } |
f <- function(x) { sin(x) }; 5 %>% f | -0.959 | f <- function(x) { sin(x) }; 5 %.>% f | -0.959 |
As we can now see, some statements were not roughly equivalent to sin(5)
.
One related case to consider is the following (which we run by hand, as it seems to default knitr
or kableExtra
html
styling, note: the “‘\[’” and other formatting errors are artifacts of HTML
quoting/rendering, and not part of the expressions):
c("lst <- list(h = sin); 5 PIPE_GLYPH lst$h",
"lst <- list(h = sin); 5 PIPE_GLYPH lst$h()",
"lst <- list(h = sin); 5 PIPE_GLYPH lst$h(.)",
"lst <- list(h = sin); 5 PIPE_GLYPH lst[['h']]",
"lst <- list(h = sin); 5 PIPE_GLYPH lst[['h']]()",
"lst <- list(h = sin); 5 PIPE_GLYPH lst[['h']](.)") %.>%
work_examples(., sin(5)) %.>%
knitr::kable(., format = "html", escape = FALSE)
magrittr expr | magrittr res | wrapr expr | wrapr res |
---|---|---|---|
lst <- list(h = sin); 5 %>% lst$h | 3 arguments passed to ‘$’ which requires 2 | lst <- list(h = sin); 5 %.>% lst$h | -0.959 |
lst <- list(h = sin); 5 %>% lst$h() | -0.959 | lst <- list(h = sin); 5 %.>% lst$h() | wrapr::pipe_step.default does not allow direct piping into a no-argument function call expression (such as “lst$h()”, please use lst$h(.)). |
lst <- list(h = sin); 5 %>% lst$h(.) | -0.959 | lst <- list(h = sin); 5 %.>% lst$h(.) | -0.959 |
lst <- list(h = sin); 5 %>% lst[[‘h’]] | incorrect number of subscripts | lst <- list(h = sin); 5 %.>% lst[[‘h’]] | -0.959 |
lst <- list(h = sin); 5 %>% lst[[‘h’]]() | -0.959 | lst <- list(h = sin); 5 %.>% lst[[‘h’]]() | wrapr::pipe_step.default does not allow direct piping into a no-argument function call expression (such as “lst[[”h“]]()”, please use lst[[“h”]](.)). |
lst <- list(h = sin); 5 %>% lst[[‘h’]](.) | -0.959 | lst <- list(h = sin); 5 %.>% lst[[‘h’]](.) | -0.959 |
magrittr
Results
The magrittr
exceptions include the following.
::
is a function, as so many things are in R
. So base::sin
is not really the package qualified name for sin()
, it is actually shorthand for `::`("base", "sin")
which is a function evaluation that performs the look-up. So 5 %>% base::sin
expands to an analogue of . <- 5; `::`(., "base", "sin")
, leading to the observed error message.
()
is magrittr
’s “evaluate before piping into” notation, so 5 %>% ( sin() )
and 5 %>% ( sin(.) )
both throw an error as evaluation is attempted before any alteration of arguments is attempted.
{}
is magrittr
’s “treat the contents as raw statements” notation (which is not in fact magrittr
’s default behavior). Thus magrittr
’s function evaluation signature alteration transforms are not applied to 5 %>% { sin }
or 5 %>% { sin() }
.
Again, the above are not magrittr
bugs, they are just how magrittr
’s behavior differs from a very regular or naive internalization of magrittr
rules. Notice neither of “()
” nor “{}
” are neutral notations in magrittr
(the first adds an extra evaluation, and second switches to an expression mode with fewer substitutions). Also note the above is an argument for preferring “sin(.)
” to “sin()
”, or “sin
”; as “sin(.)
” had the most regular magrittr
behavior (not changing with the introduction of “()
”, “{}
”, or “base::
”).
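The claim that base::sin is a function evaluation and not a plain name can be checked directly in base R. The following is a small sketch of ours (variable names are introduced only for the example):

```r
# Capture base::sin without evaluating it and inspect its structure.
qualified <- quote(base::sin)
print(class(qualified))              # the class of the parsed expression
print(as.character(qualified[[1]]))  # the function being called

# Calling the `::` function by hand performs the same look-up.
same_function <- identical(base::sin, `::`("base", "sin"))
print(same_function)

# The looked-up function can then be applied as usual.
value <- `::`("base", "sin")(5)
print(value)
```

This is why a pipe that inserts its left-hand value as a first argument of the right-hand call ends up adding an argument to the look-up itself.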
Regularity is especially important for part time users, as you want reasonable variations of what is taught to work so that experimentation is positive and not an exercise in learned helplessness. It is convenient when your tools happen to work the way you might remember.
wrapr
Results
The wrapr
error messages and non-numeric returns are driven by the following:
5 %.>% sin()
is not an allowed wrapr
notation. The wrapr
philosophy is not to alter evaluation signatures. The error message is signalling that the statement is not valid wrapr
grammar (not well formed in terms of wrapr
rules). Notice the error message suggests the alternate notation sin(.)
. Similar rules apply for base::sin()
. The intent is that outer parentheses are non-semantic: they do not change wrapr
pipe behavior.
5 %.>% { sin }
returns just the sin
function. This is because {}
triggers wrapr
’s “leave the contents alone” behavior.
The user only encounters two exceptions in the above variations. The first is “don’t write sin()
”, which comes with a clear error message and help (“try sin(.)
”). The second is “outer {}
treats its contents as raw statements, turning off transforms and checking”.
wrapr
is hoping to stay close to the principle of least surprise.
The hope is that wrapr
piping is easy, powerful, useful, and not too different than a %.>% b
being treated as almost syntactic sugar for { . <- a; b }
.
An obvious down-side of wrapr
piping is the excess dots both in the operator and in the evaluation arguments. We strongly feel the extra dots in the evaluation arguments are actually a good trade in losing some conciseness in exchange for useful explicitness. We do not consider the extra dot in the pipe operator to be a problem (especially if you bind the operator to a keyboard shortcut). If the extra dot in the pipe operator is such a deal-breaker, consider that it could be gotten rid of by copying the pipe operator to your notation of choice (such as executing `%>%` <- wrapr::`%.>%`
or `%.%` <- wrapr::`%.>%`
at the top of your work). However such re-mappings are needlessly confusing and it is best to use the operator glyph that wrapr
directly supplies.
We can also try a few simpler expressions, that do not have an explicit function marker such as sin(.)
.
c("5 PIPE_GLYPH 1 + .",
"5 PIPE_GLYPH (1 + .)",
"5 PIPE_GLYPH {1 + .}") %.>%
work_examples(., 6) %.>%
knitr::kable(., format = "html", escape = FALSE) %.>%
column_spec(., 1:4, width = "1.75in") %.>%
kable_styling(., "striped", full_width = FALSE)
magrittr expr | magrittr res | wrapr expr | wrapr res |
---|---|---|---|
5 %>% 1 + . | attempt to apply non-function | 5 %.>% 1 + . | wrapr::pipe_step.default does not allow direct piping into simple values such as class:numeric, type:double. |
5 %>% (1 + .) | non-numeric argument to binary operator | 5 %.>% (1 + .) | 6 |
5 %>% {1 + .} | 6 | 5 %.>% {1 + .} | 6 |
Some of what caused exceptions above is “5 %ANYTHING% 1 + .
” is parsed (due to R
’s operator precedence rules) as “(5 %ANYTHING% 1) + .
”. So without extra grouping notations (“()” or “{}”) this is not a well-formed pipeline. With wrapr
it is safe to add in parentheses, with magrittr
one must use {}
(though this can not be used with 5 %>% {sin}
).
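The parse can be inspected directly in base R, without loading any pipe package. In this sketch of ours, %ANYTHING% is never defined or called; we only look at how R groups the expression.

```r
# Capture the expression without evaluating it.
e <- quote(5 %ANYTHING% 1 + .)

# The outermost operation, and its left and right operands.
top_op <- as.character(e[[1]])
left <- deparse(e[[2]])
right <- deparse(e[[3]])

print(top_op)
print(left)
print(right)
```

The outermost call is the addition, with the whole pipe expression as its left operand, which matches the grouping described above.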
For some operations that are unlikely to work close to reasonable user intent wrapr
includes checks to warn-off the user. The following shows a few more examples of this “defense of grammar.”
5 %.>% 7
## Error in pipe_step.default(pipe_left_arg, pipe_right_arg, pipe_environment, : wrapr::pipe_step.default does not allow direct piping into simple values such as class:numeric, type:double.
# magrittr's error message for the above is something of the form:
# "Error in function_list[[k]](value) : attempt to apply non-function"
5 %.>% .
## Error in pipe_step.default(pipe_left_arg, pipe_right_arg, pipe_environment = pipe_environment, : wrapr::pipe_step.default does not allow direct piping into simple values such as class:numeric, type:double.
# note: the above error message is improved to:
# "wrapr::pipe does not allow direct piping into '.'"
# in wrapr 1.4.1
5 %.>% return(.)
## Error in pipe_step.default(pipe_left_arg, pipe_right_arg, pipe_environment, : wrapr::pipe_step.default does not allow direct piping into certain reserved words or control structures (such as "return").
Throwing errors in these situations is based on the principle that non-signalling errors (often leading to result corruption) are much worse than signalling errors. The “return
” example is an interesting case in point.
Let’s first take a look at the effect with magrittr
. Suppose we were writing a simple function that, for a positive integer, returns the smallest non-trivial (greater than 1
and less than the value in question) positive integer divisor of the value in question (returning NA
if there is none such). Such a function might work like the following.
f_base <- function(x) {
u <- min(ceiling(sqrt(x)), x-1L)
i <- 2L
while(i<=u) {
if((x %% i)==0) {
return(i)
}
i <- i + 1L
}
NA_integer_
}
f_base(37)
## [1] NA
f_base(35)
## [1] 5
Now suppose we try to get fancy and use “i %>% return
” instead of “return(i)
”. This produces a function that thinks all integers are prime. The reason is: magrittr
can call the return()
function, but in this situation return()
can’t manage the control path of the original function.
f_magrittr <- function(x) {
u <- min(ceiling(sqrt(x)), x-1L)
i <- 2L
while(i<=u) {
if((x %% i)==0) {
i %>% return
}
i <- i + 1L
}
NA_integer_
}
f_magrittr(37)
## [1] NA
f_magrittr(35)
## [1] NA
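The underlying issue can be seen without any pipe package. In the sketch below, call_return() is a hypothetical stand-in for a pipe stage that ends up calling return() from inside another function; the other names are also ours.

```r
# Hypothetical stand-in for a pipe stage: return() here only exits
# call_return() itself, not whichever function called it.
call_return <- function(v) {
  return(v)
}

# Calling return() indirectly does not stop the outer function.
g_indirect <- function() {
  call_return("early")
  "late"
}

# Calling return() directly does.
g_direct <- function() {
  return("early")
  "late"
}

print(g_indirect())
print(g_direct())
```

The indirect version silently runs to the end of the function, which is the same kind of non-signalling error as f_magrittr(35) returning NA.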
Now suppose we tried the same thing with wrapr
pipe and write i %.>% return(.)
.
f_wrapr <- function(x) {
u <- min(ceiling(sqrt(x)), x-1L)
i <- 2L
while(i<=u) {
if((x %% i)==0) {
i %.>% return(.)
}
i <- i + 1L
}
NA_integer_
}
f_wrapr(37)
## [1] NA
f_wrapr(35)
## Error in pipe_step.default(pipe_left_arg, pipe_right_arg, pipe_environment, : wrapr::pipe_step.default does not allow direct piping into certain reserved words or control structures (such as "return").
wrapr
also can not handle return()
control flow correctly, however it (helpfully) throws an exception to indicate the problem.
R
usually has more than one good way to perform tasks. In this case we talked about two methods of building pipelines in R
: magrittr
and wrapr
. There are more methods (some of which are listed here). Our preferred pipe is the wrapr
dot-pipe, and in the style of academic priority we try to credit alternatives and share fair comparisons (as we have done here). Priority is important to respect (as in: magrittr
is powerful, popular, came well before, and greatly influences wrapr
dot-pipe), but it is not monopoly rights (for example: the public CRAN release/announcement of let()
, our popular and still preferred substitution methodology and originally part of replyr
, predates the public CRAN release/announcement of rlang
/tidyeval
code re-writing methods). In client work we use whatever style is most compatible with the client’s work and needs, for example we feel it does not make sense to take a legacy dplyr
project and attempt to switch the pipe notation late in the game (and one does not want to needlessly mix notations).
It has its imitators, but it remains the best “I have R, now what do I do with it?” book (as it works the user through non-trivial projects, analyses, presentations, predictive analytic, data science, and machine learning applications).
All of the code and data used in the book is publicly available here (including zipped-up examples from the book, and up to date re-runs of those examples). We recently got a new translation of the book (Hangul/Korean) to go with the earlier Simplified Chinese. We also have a number of current teaching and training projects extending and using our material (yey!).