Categories: Coding, data science, Opinion, Programming, Statistics, Tutorials. 10 comments.

Non-Standard Evaluation and Function Composition in R

In this article we discuss composing standard-evaluation (SE) interfaces and composing non-standard-evaluation (NSE) interfaces in R.

In R, the tidyeval/rlang package is a tool for building domain-specific languages, intended to make NSE interfaces easier to compose.

To use it you must know some of its structure and notation. Here are some details paraphrased from the major tidyeval/rlang client, the dplyr package (see vignette('programming', package = 'dplyr')):

  • ":=" is needed to make left-hand-side re-mapping possible (adding yet another operator to R's existing "more than one assignment-type operator running around" notation issue).
  • "!!" substitution requires parentheses to bind safely (so the notation is actually "(!! )", not "!!").
  • Left-hand-sides of expressions are names or strings, while right-hand-sides are quosures/expressions.
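To make the three points above concrete, here is a minimal sketch, assuming dplyr (0.7 or later) and rlang are installed. The helper `double_column()` and the column name `Sepal.Length.x2` are hypothetical, chosen only for illustration. The right-hand side is a symbol built from a string and unquoted with parenthesized `!!`; the left-hand side is a plain string unquoted with `!!` and bound with `:=`.

```r
# Assumes dplyr (>= 0.7) and rlang are installed.
library(dplyr)
library(rlang)

# Hypothetical helper: store 2 * (column named by string `src`)
# in a new column named by string `dst`.
double_column <- function(d, src, dst) {
  src_sym <- sym(src)                   # right-hand side: string -> symbol
  d %>%
    mutate(!!dst := (!!src_sym) * 2)    # left-hand side: string, via !! and :=
}

res <- double_column(head(iris), "Sepal.Length", "Sepal.Length.x2")
res$Sepal.Length.x2   # 2 * head(iris)$Sepal.Length
```

Note that `mutate(dst = ...)` would instead create a column literally named "dst", which is why the `:=` form is needed for re-mapping the left-hand side.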

Continue reading Non-Standard Evaluation and Function Composition in R

Categories: Opinion, Statistics, Tutorials. 2 comments.

dplyr in Context


Beginning R users often come away with the false impression that the popular packages dplyr and tidyr are both all of R and sui generis inventions (in the sense that they might be unprecedented, and that there might be no other reasonable way to get the same effects in R). These packages and their conventions are high-value, but they are the results of evolution and implement a style of programming that has been available in R for some time. They evolved in a context, and did not burst on the scene fully armored with spear in hand.
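As one small illustration of that context, base R has long shipped functions that use the same column-name-without-quotes (non-standard evaluation) style. The sketch below uses only base R and the built-in `mtcars` data; the dplyr verbs named in the comments are rough analogues, not exact equivalents.

```r
# Base R functions that predate dplyr and also use non-standard
# evaluation: column names are referenced without quotes.
four_cyl <- subset(mtcars, cyl == 4, select = c(mpg, wt))  # roughly filter() + select()
four_cyl <- transform(four_cyl, mpg_per_wt = mpg / wt)     # roughly mutate()
by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)  # roughly group_by() + summarize()
by_cyl
```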

Continue reading dplyr in Context

Categories: Opinion, Statistics, Tutorials.

Debugging Pipelines in R with Bizarro Pipe and Eager Assignment

This is a note on debugging magrittr pipelines in R using Bizarro Pipe and eager assignment.
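A minimal sketch of the idea, assuming "Bizarro pipe" refers to the `->.;` idiom: each step eagerly assigns its result to the variable `.` and ends the statement, so the next line can refer to `.`. Because every step is an ordinary top-level statement, you can stop after any line and inspect `.` directly, which is the debugging advantage over a single `%>%` expression. The pipeline steps themselves are only an example.

```r
# Bizarro pipe: end each step with "->.;", which assigns the result
# to the variable "." and ends the statement.
mtcars ->.;
  subset(., cyl == 4) ->.;
  transform(., mpg_per_wt = mpg / wt) ->.;
  head(., 3)
# While debugging, run the lines one at a time and print "." (or
# str(.)) after any step to see the intermediate result.
```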

Continue reading Debugging Pipelines in R with Bizarro Pipe and Eager Assignment

Categories: data science, Practical Data Science, Pragmatic Data Science, Programming, Statistics, Tutorials. 16 comments.

The Zero Bug

I am going to write about an insidious statistical, data analysis, and presentation fallacy I call “the zero bug” and the habits you need to cultivate to avoid it.

Here is the zero bug in a nutshell: common data aggregation tools often cannot "count to zero" from examples, and this causes problems. Please read on for what this means, the consequences, and how to avoid the problem. Continue reading The Zero Bug
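One small base-R instance of the problem, assuming the full set of possible categories is known in advance (the categories "a", "b", "c" here are hypothetical). Counting from examples alone cannot report a category that never appeared; declaring the levels up front lets the tally include the zero.

```r
obs <- c("a", "a", "c")        # category "b" is possible but never observed

# Counting from the examples alone: "b" silently disappears.
counts_chr <- table(obs)

# Declaring the known levels first: "b" is reported with a count of 0.
counts_fac <- table(factor(obs, levels = c("a", "b", "c")))

counts_chr
counts_fac
```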

Categories: Exciting Techniques, Statistics, Tutorials. 3 comments.

Using the Bizarro Pipe to Debug magrittr Pipelines in R

I have just finished and released a free new R video lecture demonstrating how to use the “Bizarro pipe” to debug magrittr pipelines. I think R dplyr users will really enjoy it.

Please read on for the link to the video lecture. Continue reading Using the Bizarro Pipe to Debug magrittr Pipelines in R

Categories: Administrativia, Statistics.

Upcoming Win-Vector LLC public speaking engagements

I am happy to announce a couple of exciting upcoming Win-Vector LLC public speaking engagements.

Hope to see you there!

Categories: Coding, Opinion, Programming, Statistics, Tutorials. 8 comments.

Comparative examples using replyr::let

Consider the problem of “parametric programming” in R. That is: writing correct code before knowing some details, such as the names of the columns your procedure will be applied to in the future. Our latest version of replyr::let makes such programming easier.
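To show the idea behind parametric programming, here is a minimal base-R sketch of name substitution: write the code against a placeholder name, then rewrite that name to a real column name before evaluating. The function `let_sketch()` and the placeholder `COL` are hypothetical; this is not replyr's implementation, which handles more cases carefully.

```r
# Hypothetical minimal stand-in for the idea behind replyr::let:
# rewrite placeholder names in an expression, then evaluate it.
let_sketch <- function(alias, expr) {
  e <- substitute(expr)                     # capture the expression unevaluated
  syms <- lapply(alias, as.name)            # "Sepal.Length" -> symbol Sepal.Length
  e2 <- do.call(substitute, list(e, syms))  # swap placeholders for real names
  eval(e2, envir = parent.frame())          # run it in the caller's environment
}

# Code written against the placeholder COL, bound to a real column later.
m <- let_sketch(list(COL = "Sepal.Length"), mean(iris$COL))

# The column name can also arrive as an ordinary string argument.
col_mean <- function(col_name) {
  let_sketch(list(COL = col_name), mean(iris$COL))
}
pw <- col_mean("Petal.Width")
```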

Archie’s Mechanics #2 (1954) copyright Archie Publications

(edit: great news! CRAN just accepted our replyr 0.2.0 fix release!)

Please read on for examples comparing standard notations and replyr::let. Continue reading Comparative examples using replyr::let

Categories: Opinion, Statistics. 3 comments.

Organize your data manipulation in terms of “grouped ordered apply”

Consider the following common problem: compute per-group ranks for a data set (say the infamous iris example data set). Suppose we want the rank of each iris Sepal.Length value on a per-Species basis. Frankly this is an “ugh” problem for many analysts: it involves grouping, ordering, and window functions all at the same time. It is also rarely the analyst’s end goal, but rather a sub-step needed to transform data on the way to the prediction, modeling, analysis, or presentation they actually wish to get back to.
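The per-Species rank problem can be sketched in base R as split, order within each group, apply, and recombine. The helper `grouped_ordered_apply()` and the new column name `rank` are hypothetical illustrations of the pattern, not the article's code. Because `order()` is stable, ties keep their original row order, matching `rank(..., ties.method = "first")`.

```r
# Hypothetical helper: split by group, order within each group,
# apply f to each ordered piece, then recombine.
grouped_ordered_apply <- function(d, group_col, order_col, f) {
  pieces <- split(d, d[[group_col]])                   # split
  pieces <- lapply(pieces, function(p) {
    p <- p[order(p[[order_col]]), , drop = FALSE]      # order (stable)
    f(p)                                               # apply
  })
  res <- do.call(rbind, pieces)                        # combine
  rownames(res) <- NULL
  res
}

# Per-Species rank of Sepal.Length: after ordering, rank = position in group.
ranked <- grouped_ordered_apply(iris, "Species", "Sepal.Length",
                                function(p) { p$rank <- seq_len(nrow(p)); p })

# The same ranks computed in place with base R's ave().
iris_ranked <- transform(iris,
  rank = ave(Sepal.Length, Species,
             FUN = function(x) rank(x, ties.method = "first")))
```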

Iris germanica (purple bearded iris), Wakehurst Place, UK. Photo by Diliff, own work, CC BY-SA 3.0.

In our previous article in this series we discussed the general ideas of “row-ID independent data manipulation” and “Split-Apply-Combine”. Here, continuing with our example, we will specialize to a data analysis pattern I call: “Grouped-Ordered-Apply”. Continue reading Organize your data manipulation in terms of “grouped ordered apply”

Categories: Opinion, Programming, Rants. 12 comments.

magrittr’s Doppelgänger

R picked up a nifty way to organize sequential calculations in May of 2014: magrittr by Stefan Milton Bache and Hadley Wickham. magrittr is now quite popular and also has become the backbone of current dplyr practice.

If you read my last article on assignment carefully you may have noticed I wrote some code that was equivalent to a magrittr pipeline without using the “%>%” operator. This note will expand (tongue in cheek) that notation into an alternative to magrittr that you should never use.


Superman #169 (May 1964, copyright DC)

What follows is a joke (though everything does work as I state it does, nothing is faked). Continue reading magrittr’s Doppelgänger

Categories: Opinion, Programming, Rants, Statistics. 29 comments.

The Case For Using -> In R

R has a number of assignment operators (at least “<-”, “=”, and “->”; plus “<<-” and “->>”, which have different semantics).

R style guides routinely insist that “<-” is the only preferred form. In this note we are going to try to make the case for “->” when using magrittr pipelines. [edit: After reading this article, please be sure to read Konrad Rudolph’s masterful argument for using only “=” for assignment. He also demonstrates a function to land values from pipelines (though that is not his preference). All joking aside, the value-landing part of the proposal does not violate current style guidelines.]
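A short sketch of the operators listed above. The pipeline line uses R's native pipe `|>`, which assumes R 4.1 or later; a magrittr `%>%` pipeline reads the same way left to right, with “->” landing the final value at the end of the line. The variable names are arbitrary.

```r
# Ordinary assignment: all three bind a value in the current environment.
a <- 5
b = 5
5 -> d

# "->" lands the result of a left-to-right pipeline at the end of the line
# (native pipe |> assumes R >= 4.1).
c(1, 4, 9) |> sqrt() |> sum() -> total   # total is 6

# "<<-" (and its mirror "->>") assign in an enclosing environment instead.
counter <- 0
bump <- function() counter <<- counter + 1
bump(); bump()
counter   # 2
```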

Don Quijote and Sancho Panza, by Honoré Daumier

Continue reading The Case For Using -> In R