- dplyr users who had such a need, and wanted such extensions.
- dplyr users who did not have such a need ("we always know the column names").
- dplyr users who found the then-current, fairly complex "underscore" and lazyeval system sufficient for the task.
Needing name substitution is a problem an advanced full-time R user can solve on their own. However, a part-time R user would greatly benefit from a simple, reliable, readable, documented, and comprehensible packaged solution.
To illustrate this we will work an example.
I have been writing a lot (too much) about tidyeval lately. The reason is: major changes were recently announced. If you are going to use dplyr well and correctly going forward you may need to understand some of the new issues (if you don’t use dplyr you can safely skip all of this). I am trying to work out (publicly) how to best incorporate the new methods into:
- real world analyses,
- reusable packages,
- and teaching materials.
I think some of the apparent discomfort on my part comes from my feeling that dplyr never really gave standard evaluation (SE) a fair chance. In my opinion: dplyr is based strongly on non-standard evaluation (NSE, originally through lazyeval and now through tidyeval) more by taste and choice than by actual analyst benefit or need. dplyr isn’t my package, so it isn’t my choice to make; but I can still have an informed opinion, which I will discuss below.
The tidyverse itself is a rapidly evolving, centrally controlled package collection. The tidyverse authors publicly appear to be interested in re-basing the tidyverse in terms of their new tidyeval package. So it is natural to wonder: what is the future of magrittr (a pre-tidyeval package) in the tidyverse?
I think a critical aspect of such an extension would be to treat the notation as syntactic sugar: it should not be required to match magrittr semantics or, worse yet, become a platform for authors to insert their own preferred ad-hoc semantics.
One of the great advantages to specifying complicated sequences of operations in data (rather than in code) is: it is often easier to transform and extend data. Explicit rich data beats vague convention and complicated code.
dplyr issue 2916.
The following appears to work.
suppressPackageStartupMessages(library("dplyr"))

COL <- "homeworld"
starwars %>%
  group_by(.data[[COL]]) %>%
  head(n=1)
## # A tibble: 1 x 14
## # Groups: COL
##   name           height  mass hair_color skin_color eye_color birth_year
##   <chr>           <int> <dbl> <chr>      <chr>      <chr>          <dbl>
## 1 Luke Skywalker    172    77 blond      fair       blue              19
## # ... with 7 more variables: gender <chr>, homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>, COL <chr>
Though notice it reports the grouping is by "COL", not by "homeworld". Also the data set now has 14 columns, not the original 13 from the starwars data set.
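One possible way to avoid the stray "COL" column is to convert the string to a symbol and unquote it, so group_by() sees the column itself. This is only a sketch, assuming dplyr 0.7.0 or later with rlang installed; the name res is ours for illustration and does not come from the issue.

```r
suppressPackageStartupMessages(library("dplyr"))

COL <- "homeworld"

# Turn the string into a symbol, then unquote it with !! so that
# group_by() receives the bare column name rather than an expression
# it has to invent a new column name for.
res <- starwars %>%
  group_by(!!rlang::sym(COL)) %>%
  head(n = 1)

group_vars(res)              # names of the grouping columns
ncol(res) == ncol(starwars)  # TRUE when no extra column was added
```

Checking group_vars() and the column count directly, rather than reading the printed tibble, makes it harder to miss the kind of silent renaming shown above.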