Posted on Categories Coding, Opinion, Statistics, TutorialsTags , , , , Leave a comment on Why to use the replyr R package

Why to use the replyr R package

Recently I noticed that the R package sparklyr had the following odd behavior:

suppressPackageStartupMessages(library("dplyr"))
library("sparklyr")
packageVersion("dplyr")
#> [1] '0.7.2.9000'
packageVersion("sparklyr")
#> [1] '0.6.2'
packageVersion("dbplyr")
#> [1] '1.1.0.9000'

sc <- spark_connect(master = 'local')
#> * Using Spark: 2.1.0
d <- dplyr::copy_to(sc, data.frame(x = 1:2))

dim(d)
#> [1] NA
ncol(d)
#> [1] NA
nrow(d)
#> [1] NA

This means user code or user analyses that depend on one of dim(), ncol() or nrow() possibly breaks. nrow() used to return something other than NA, so older work may not be reproducible.

In fact: where I actually noticed this was deep in debugging a client project (not in a trivial example, such as above).


Tron
Tron: fights for the users.

In my opinion: this choice is going to be a great source of surprises, unexpected behavior, and bugs going forward for both sparklyr and dbplyr users. Continue reading Why to use the replyr R package

Posted on Categories Coding, data science, Opinion, Programming, Statistics, TutorialsTags , , , , 13 Comments on Tutorial: Using seplyr to Program Over dplyr

Tutorial: Using seplyr to Program Over dplyr

seplyr is an R package that makes it easy to program over dplyr 0.7.*.

To illustrate this we will work an example.

Continue reading Tutorial: Using seplyr to Program Over dplyr

Posted on Categories Coding, Programming, Statistics, TutorialsTags , , ,

wrapr Implementation Update

Introduction

The development version CRAN version of our R helper function wrapr::let() has switched from string-based substitution to abstract syntax tree based substitution (AST based substitution, or language based substitution).

Wraprs

I am looking for some feedback from wrapr::let() users already doing substantial work with wrapr::let(). If you are already using wrapr::let() please test if the current development version of wrapr works with your code. If you run into problems: I apologize, and please file a GitHub issue.

Continue reading wrapr Implementation Update

Posted on Categories Coding, data science, Opinion, Programming, Statistics, TutorialsTags , , , , , , , , , , 10 Comments on Non-Standard Evaluation and Function Composition in R

Non-Standard Evaluation and Function Composition in R

In this article we will discuss composing standard-evaluation interfaces (SE: parametric, referentially transparent, or “looks only at values”) and composing non-standard-evaluation interfaces (NSE) in R.

In R the package tidyeval/rlang is a tool for building domain specific languages intended to allow easier composition of NSE interfaces.

To use it you must know some of its structure and notation. Here are some details paraphrased from the major tidyeval/rlang client, the package dplyr: vignette('programming', package = 'dplyr')).

  • ":=" is needed to make left-hand-side re-mapping possible (adding yet another "more than one assignment type operator running around" notation issue).
  • "!!" substitution requires parenthesis to safely bind (so the notation is actually "(!! )", not "!!").
  • Left-hand-sides of expressions are names or strings, while right-hand-sides are quosures/expressions.

Continue reading Non-Standard Evaluation and Function Composition in R

Posted on Categories Coding, Opinion, Programming, StatisticsTags , , 2 Comments on More on safe substitution in R

More on safe substitution in R

Let’s worry a bit about substitution in R. Substitution is very powerful, which means it can be both used and mis-used. However, that does not mean every use is unsafe or a mistake.

Continue reading More on safe substitution in R

Posted on Categories Coding, Opinion, Programming, StatisticsTags , , , 7 Comments on In defense of wrapr::let()

In defense of wrapr::let()

Saw this the other day:

Wraprvstidyeval

In defense of wrapr::let() (originally part of replyr, and still re-exported by that package) I would say:

  • let() was deliberately designed for a single real-world use case: working with data when you don’t know the column names when you are writing the code (i.e., the column names will come later in a variable). We can re-phrase that as: there is deliberately less to learn as let() is adapted to a need (instead of one having to adapt to let()).
  • The R community already has months of experience confirming let() working reliably in production while interacting with a number of different packages.
  • let() will continue to be a very specific, consistent, reliable, and relevant tool even after dpyr 0.6.* is released, and the community gains experience with rlang/tidyeval in production.

If rlang/tidyeval is your thing, by all means please use and teach it. But please continue to consider also using wrapr::let(). If you are trying to get something done quickly, or trying to share work with others: a “deeper theory” may not be the best choice.

An example follows. Continue reading In defense of wrapr::let()

Posted on Categories Coding, Opinion, Programming, StatisticsTags , , , , 6 Comments on Why to use wrapr::let()

Why to use wrapr::let()

I have written about referential transparency before. In this article I would like to discuss “leaky abstractions” and why wrapr::let() supplies a useful (but leaky) abstraction for R programmers.

Wraprs Continue reading Why to use wrapr::let()

Posted on Categories Coding, OpinionTags , , , , 1 Comment on Another R [Non-]Standard Evaluation Idea

Another R [Non-]Standard Evaluation Idea

Jonathan Carroll had a an interesting R language idea: to use @-notation to request value substitution in a non-standard evaluation environment (inspired by msyql User-Defined Variables).

He even picked the right image:

PandorasBox Continue reading Another R [Non-]Standard Evaluation Idea

Posted on Categories Coding, Opinion, StatisticsTags , , , , , , , , , 7 Comments on wrapr: for sweet R code

wrapr: for sweet R code

This article is on writing sweet R code using the wrapr package.


Wrapr
Continue reading wrapr: for sweet R code

Posted on Categories Coding, Opinion, Programming, Statistics, TutorialsTags , , , 8 Comments on Comparative examples using replyr::let

Comparative examples using replyr::let

Consider the problem of “parametric programming” in R. That is: simply writing correct code before knowing some details, such as the names of the columns your procedure will have to be applied to in the future. Our latest version of replyr::let makes such programming easier.


NewImage
Archie’s Mechanics #2 (1954) copyright Archie Publications

(edit: great news! CRAN just accepted our replyr 0.2.0 fix release!)

Please read on for examples comparing standard notations and replyr::let. Continue reading Comparative examples using replyr::let