Posted on Categories Programming, StatisticsTags , , , , 4 Comments on What is magrittr’s future in the tidyverse?

What is magrittr’s future in the tidyverse?

For many R users the magrittr pipe is a popular way to arrange computation and famously part of the tidyverse.

NewImage

The tidyverse itself is a rapidly evolving centrally controlled package collection. The tidyverse authors publicly appear to be interested in re-basing the tidyverse in terms of their new rlang/tidyeval package. So it is natural to wonder: what is the future of magrittr (a pre-rlang/tidyeval package) in the tidyverse? Continue reading What is magrittr’s future in the tidyverse?

Posted on Categories Opinion, Programming, Statistics, TutorialsTags , , , 8 Comments on In praise of syntactic sugar

In praise of syntactic sugar

There has been some talk of adding native pipe notation to R (for example here, here, and here). And even a tidyeval/rlang pipe here.

I think a critical aspect of such an extension would be to treat such a notation as syntactic sugar and not insist such a pipe match magrittr semantics, or worse yet give a platform for authors to insert their own preferred ad-hoc semantics. Continue reading In praise of syntactic sugar

Posted on Categories data science, Practical Data Science, Pragmatic Data Science, Programming, Statistics, TutorialsTags , , , , , 1 Comment on Join Dependency Sorting

Join Dependency Sorting

In our latest installment of “R and big data” let’s again discuss the task of left joining many tables from a data warehouse using R and a system called "a join controller" (last discussed here).

One of the great advantages to specifying complicated sequences of operations in data (rather than in code) is: it is often easier to transform and extend data. Explicit rich data beats vague convention and complicated code.

Continue reading Join Dependency Sorting

Posted on Categories Opinion, Programming, StatisticsTags , , , , , 2 Comments on Using wrapr::let() with tidyeval

Using wrapr::let() with tidyeval

While going over some of the discussion related to my last post I came up with a really neat way to use wrapr::let() and rlang/tidyeval together.

Please read on to see the situation and example. Continue reading Using wrapr::let() with tidyeval

Posted on Categories Opinion, Programming, StatisticsTags , , , 5 Comments on Please Consider Using wrapr::let() for Replacement Tasks

Please Consider Using wrapr::let() for Replacement Tasks

From dplyr issue 2916.

The following appears to work.

suppressPackageStartupMessages(library("dplyr"))

COL <- "homeworld"
starwars %>%
  group_by(.data[[COL]]) %>%
  head(n=1)
## # A tibble: 1 x 14
## # Groups:   COL [1]
##             name height  mass hair_color skin_color eye_color birth_year
##            <chr>  <int> <dbl>      <chr>      <chr>     <chr>      <dbl>
## 1 Luke Skywalker    172    77      blond       fair      blue         19
## # ... with 7 more variables: gender <chr>, homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>, COL <chr>

Though notice it reports the grouping is by "COL", not by "homeworld". Also the data set now has 14 columns, not the original 13 from the starwars data set.

Continue reading Please Consider Using wrapr::let() for Replacement Tasks

Posted on Categories Coding, Programming, Statistics, TutorialsTags , , ,

wrapr Implementation Update

Introduction

The development version CRAN version of our R helper function wrapr::let() has switched from string-based substitution to abstract syntax tree based substitution (AST based substitution, or language based substitution).

Wraprs

I am looking for some feedback from wrapr::let() users already doing substantial work with wrapr::let(). If you are already using wrapr::let() please test if the current development version of wrapr works with your code. If you run into problems: I apologize, and please file a GitHub issue.

Continue reading wrapr Implementation Update

Posted on Categories Coding, data science, Opinion, Programming, Statistics, TutorialsTags , , , , , , , , , , 10 Comments on Non-Standard Evaluation and Function Composition in R

Non-Standard Evaluation and Function Composition in R

In this article we will discuss composing standard-evaluation interfaces (SE: parametric, referentially transparent, or “looks only at values”) and composing non-standard-evaluation interfaces (NSE) in R.

In R the package tidyeval/rlang is a tool for building domain specific languages intended to allow easier composition of NSE interfaces.

To use it you must know some of its structure and notation. Here are some details paraphrased from the major tidyeval/rlang client, the package dplyr: vignette('programming', package = 'dplyr')).

  • ":=" is needed to make left-hand-side re-mapping possible (adding yet another "more than one assignment type operator running around" notation issue).
  • "!!" substitution requires parenthesis to safely bind (so the notation is actually "(!! )", not "!!").
  • Left-hand-sides of expressions are names or strings, while right-hand-sides are quosures/expressions.

Continue reading Non-Standard Evaluation and Function Composition in R

Posted on Categories Applications, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Programming, Statistics, TutorialsTags , , ,

Managing intermediate results when using R/sparklyr

In our latest “R and big data” article we show how to manage intermediate results in non-trivial Apache Spark workflows using R, sparklyr, dplyr, and replyr.


NewImage
Continue reading Managing intermediate results when using R/sparklyr

Posted on Categories Coding, Opinion, Programming, StatisticsTags , , 2 Comments on More on safe substitution in R

More on safe substitution in R

Let’s worry a bit about substitution in R. Substitution is very powerful, which means it can be both used and mis-used. However, that does not mean every use is unsafe or a mistake.

Continue reading More on safe substitution in R

Posted on Categories Opinion, Programming, StatisticsTags , , , , ,

There is usually more than one way in R

Python has a fairly famous design principle (from “PEP 20 — The Zen of Python”):

There should be one– and preferably only one –obvious way to do it.

Frankly in R (especially once you add many packages) there is usually more than one way. As an example we will talk about the common R functions: str(), head(), and the tibble package‘s glimpse(). Continue reading There is usually more than one way in R