For many R users the
magrittr pipe is a popular way to arrange computation and famously part of the
tidyverse itself is a rapidly evolving centrally controlled package collection. The
tidyverse authors publicly appear to be interested in re-basing the
tidyverse in terms of their new
tidyeval package. So it is natural to wonder: what is the future of
magrittr (a pre-
tidyeval package) in the
tidyverse? Continue reading What is magrittr’s future in the tidyverse?
There has been some talk of adding native pipe notation to R (for example here, here, and here). And even a
rlang pipe here.
I think a critical aspect of such an extension would be to treat such a notation as syntactic sugar and not insist such a pipe match magrittr semantics, or worse yet give a platform for authors to insert their own preferred ad-hoc semantics. Continue reading In praise of syntactic sugar
In our latest R and Big Data article we discuss replyr.
replyr stands for REmote PLYing of big data for R.
Why should R users try
replyr? Because it lets you take a number of common working patterns and apply them to remote data (such as databases or
replyr allows users to work with
Spark or database data similar to how they work with local
data.frames. Some key capability gaps remedied by
- Summarizing data:
- Combining tables:
- Binding tables by row:
- Using the split/apply/combine pattern (
- Pivot/anti-pivot (
- Handle tracking.
- A join controller.
You may have already learned to decompose your local data processing into steps including the above, so retaining such capabilities makes working with
sparklyr much easier. Some of the above capabilities will likely come to the
tidyverse, but the above implementations are build purely on top of
dplyr and are the ones already being vetted and debugged at production scale (I think these will be ironed out and reliable sooner).
Continue reading Working With R and Big Data: Use Replyr
In our latest installment of “
R and big data” let’s again discuss the task of left joining many tables from a data warehouse using
R and a system called "a join controller" (last discussed here).
One of the great advantages to specifying complicated sequences of operations in data (rather than in code) is: it is often easier to transform and extend data. Explicit rich data beats vague convention and complicated code.
Continue reading Join Dependency Sorting
While going over some of the discussion related to my last post I came up with a really neat way to use
Please read on to see the situation and example. Continue reading Using wrapr::let() with tidyeval
dplyr issue 2916.
The following appears to work.
COL <- "homeworld"
## # A tibble: 1 x 14
## # Groups: COL 
## name height mass hair_color skin_color eye_color birth_year
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl>
## 1 Luke Skywalker 172 77 blond fair blue 19
## # ... with 7 more variables: gender <chr>, homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>, COL <chr>
Though notice it reports the grouping is by "
COL", not by "
homeworld". Also the data set now has
14 columns, not the original
13 from the
starwars data set.
Continue reading Please Consider Using wrapr::let() for Replacement Tasks
development version CRAN version of our
R helper function
wrapr::let() has switched from string-based substitution to abstract syntax tree based substitution (AST based substitution, or language based substitution).
I am looking for some feedback from
wrapr::let() users already doing substantial work with
wrapr::let(). If you are already using
wrapr::let() please test if the current development version of
wrapr works with your code. If you run into problems: I apologize, and please file a
Continue reading wrapr Implementation Update
In this article we will discuss composing standard-evaluation interfaces (SE: parametric, referentially transparent, or “looks only at values”) and composing non-standard-evaluation interfaces (NSE) in
R the package
rlang is a tool for building domain specific languages intended to allow easier composition of NSE interfaces.
To use it you must know some of its structure and notation. Here are some details paraphrased from the major
rlang client, the package dplyr:
vignette('programming', package = 'dplyr')).
:=" is needed to make left-hand-side re-mapping possible (adding yet another "more than one assignment type operator running around" notation issue).
!!" substitution requires parenthesis to safely bind (so the notation is actually "
(!! )", not "
- Left-hand-sides of expressions are names or strings, while right-hand-sides are
Continue reading Non-Standard Evaluation and Function Composition in R
Here is an absolutely horrible way to confuse yourself and get an inflated reported
R-squared on a simple linear regression model in
We have written about this before, but we found a new twist on the problem (interactions with categorical variable encoding) which we would like to call out here. Continue reading An easy way to accidentally inflate reported R-squared in linear regression models
This note describes a useful
replyr tool we call a "join controller" (and is part of our "R and Big Data" series, please see here for the introduction, and here for one our big data courses).
Continue reading Use a Join Controller to Document Your Work