In R the [[ ]] is the operator that (when supplied a simple scalar argument) pulls a single element out of lists (and the [ ] operator pulls out sub-lists).
For vectors [[ ]] and [ ] appear to be synonyms (modulo the issue of names). However, for a vector [[ ]] checks that the indexing argument is a scalar, so if you intend to retrieve one element this is a good way of getting an extra check and documenting intent. Also, when writing reusable code you may not always be sure whether your code is going to be applied to a vector or a list in the future.
It is safer to get into the habit of always using [[ ]] when you intend to retrieve a single element.
Example with lists:
    list("a", "b")[1]
    #> [[1]]
    #> [1] "a"
    list("a", "b")[[1]]
    #> [1] "a"
Example with vectors:
    c("a", "b")[1]
    #> [1] "a"
    c("a", "b")[[1]]
    #> [1] "a"
The idea is: in situations where both [ ] and [[ ]] apply we rarely see [[ ]] being the worse choice.
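The scalar check and the names issue mentioned above can be seen in a minimal sketch. The vector v and its element names are made up for illustration; only base R is assumed.

```r
# A named character vector used for illustration (hypothetical data).
v <- c(first = "a", second = "b")

# [ ] accepts a multi-element index and quietly returns a sub-vector.
both <- v[c(1, 2)]

# [[ ]] insists on a single element of a vector: a multi-element index
# is an error, which tryCatch() turns into a marker value here.
outcome <- tryCatch(
  v[[c(1, 2)]],
  error = function(e) "error"
)

# The "modulo names" difference: [ ] keeps the element name, [[ ]] drops it.
kept <- names(v[1])
dropped <- names(v[[1]])
```

Here both holds two elements, outcome records that [[ ]] refused the two-element index, and dropped is NULL while kept still carries the name.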
Note on this article series.
This R tips series consists of short, simple notes on R best practices and additional packaged tools. The intent is to show both how to perform common tasks and how to avoid common pitfalls. I hope to share about 20 of these, about every other day, to learn from the community which issues resonate and also to introduce some of the features from some of our packages. It is an opinionated series and will sometimes touch on coding style, and also try to showcase appropriate Win-Vector LLC R tools.
There are substantial differences between ad-hoc analyses (be they: machine learning research, data science contests, or other demonstrations) and production worthy systems. Roughly: ad-hoc analyses have to be correct only at the moment they are run (and often once they are correct, that is the last time they are run; obviously the idea of reproducible research is an attempt to raise this standard). Production systems have to be durable: they have to remain correct as models, data, packages, users, and environments change over time.
Demonstration systems need merely glow in bright light among friends; production systems must be correct, even alone in the dark.
“Character is what you are in the dark.”
I have found: to deliver production worthy data science and predictive analytic systems, one has to develop per-team and per-project field tested recommendations and best practices. This is necessary even when, or especially when, these procedures differ from official doctrine.
When trying to count rows using dplyr controlled data-structures (remote tbls such as dbplyr structures) one is sailing between Scylla and Charybdis. The task being to avoid dplyr corner-cases and irregularities (a few of which I attempt to document in this "
tidyverse itself is a rapidly evolving centrally controlled package collection. The tidyverse authors publicly appear to be interested in re-basing the tidyverse in terms of their new tidyeval package. So it is natural to wonder: what is the future of magrittr (a pre-tidyeval package) in the tidyverse?
Continue reading What is magrittr’s future in the tidyverse?
dplyr issue 2916.
The following appears to work.
    suppressPackageStartupMessages(library("dplyr"))
    COL <- "homeworld"
    starwars %>%
      group_by(.data[[COL]]) %>%
      head(n=1)
    ## # A tibble: 1 x 14
    ## # Groups:   COL [1]
    ##   name           height  mass hair_color skin_color eye_color birth_year
    ##   <chr>           <int> <dbl> <chr>      <chr>      <chr>          <dbl>
    ## 1 Luke Skywalker    172    77 blond      fair       blue              19
    ## # ... with 7 more variables: gender <chr>, homeworld <chr>, species <chr>,
    ## #   films <list>, vehicles <list>, starships <list>, COL <chr>
Though notice it reports the grouping is by "COL", not by "homeworld". Also the data set now has 14 columns, not the original 13 from the starwars data set.
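For comparison, here is a minimal sketch of one commonly used alternative: converting the string to a symbol with rlang::sym() and unquoting it with !!. This is not necessarily the fix adopted in the dplyr issue, and behavior may differ across dplyr versions; it assumes dplyr and rlang are installed. The name res is introduced here for illustration.

```r
suppressPackageStartupMessages(library("dplyr"))

COL <- "homeworld"

# Turn the column name held in COL into a symbol and unquote it, so that
# group_by() sees the column itself rather than a new derived column.
res <- starwars %>%
  group_by(!!rlang::sym(COL)) %>%
  head(n = 1)

# Inspect the grouping variable and confirm no extra column was added.
grouping <- group_vars(res)
extra_columns <- ncol(res) - ncol(starwars)
```

With this form the grouping variable is reported under the column's own name, and the column count matches that of starwars.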
In this article we will discuss composing standard-evaluation interfaces (SE: parametric, referentially transparent, or “looks only at values”) and composing non-standard-evaluation interfaces (NSE) in
To use it you must know some of its structure and notation. Here are some details paraphrased from the major rlang client, the package dplyr (see vignette('programming', package = 'dplyr')):
- ":=" is needed to make left-hand-side re-mapping possible (adding yet another "more than one assignment type operator running around" notation issue).
- "!!" substitution requires parentheses to safely bind (so the notation is actually "(!! )", not "!!").
- Left-hand-sides of expressions are names or strings, while right-hand-sides are
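The notation points listed above can be illustrated with a small sketch, assuming dplyr and rlang are installed. The data frame d and the names new_col and src_col are hypothetical, chosen only for this example.

```r
suppressPackageStartupMessages(library("dplyr"))

# Hypothetical example data.
d <- data.frame(height = c(172, 167))

# The left-hand side is supplied as a string, the right-hand side as a symbol.
new_col <- "height_m"
src_col <- rlang::sym("height")

# ":=" allows the left-hand side to be substituted with !!, and the
# right-hand-side substitution is wrapped in parentheses: (!!src_col).
res <- d %>%
  mutate(!!new_col := (!!src_col) / 100)
```

The result keeps the original height column and adds a height_m column whose name came from the string in new_col.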
Here is an absolutely horrible way to confuse yourself and get an inflated reported R-squared on a simple linear regression model in
We have written about this before, but we found a new twist on the problem (interactions with categorical variable encoding) which we would like to call out here. Continue reading An easy way to accidentally inflate reported R-squared in linear regression models
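As a hedged illustration, here is one well-known way a reported R-squared can be inflated when a categorical variable is involved: dropping the intercept from the formula. This may or may not be the exact mechanism the linked article has in mind. The data are simulated, and the names d, g, and y are made up for this sketch; only base R is assumed.

```r
set.seed(2017)
n <- 100

# y has a large mean and is unrelated to the two-level factor g.
d <- data.frame(
  g = factor(rep(c("a", "b"), length.out = n)),
  y = 10 + rnorm(n)
)

# Same model space, two encodings: with and without an intercept term.
m_int <- lm(y ~ g, data = d)
m_noint <- lm(y ~ 0 + g, data = d)

# The fitted values agree, so the predictions are equally good (or bad).
same_fit <- isTRUE(all.equal(unname(fitted(m_int)), unname(fitted(m_noint))))

# summary.lm() measures the no-intercept model against a baseline of zero
# rather than mean(y), so its reported R-squared is much larger.
r2_int <- summary(m_int)$r.squared
r2_noint <- summary(m_noint)$r.squared
```

Both models make the same predictions, yet r2_noint comes out close to 1 while r2_int stays near 0, because the two summaries use different reference baselines.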