Posted on Categories Coding, data science, Opinion, Programming, Statistics, TutorialsTags , , , , , , , , , , 10 Comments on Non-Standard Evaluation and Function Composition in R

Non-Standard Evaluation and Function Composition in R

In this article we will discuss composing standard-evaluation interfaces (SE: parametric, referentially transparent, or “looks only at values”) and composing non-standard-evaluation interfaces (NSE) in R.

In R the package tidyeval/rlang is a tool for building domain specific languages intended to allow easier composition of NSE interfaces.

To use it you must know some of its structure and notation. Here are some details paraphrased from the major tidyeval/rlang client, the package dplyr: vignette('programming', package = 'dplyr')).

  • ":=" is needed to make left-hand-side re-mapping possible (adding yet another "more than one assignment type operator running around" notation issue).
  • "!!" substitution requires parenthesis to safely bind (so the notation is actually "(!! )", not "!!").
  • Left-hand-sides of expressions are names or strings, while right-hand-sides are quosures/expressions.

Continue reading Non-Standard Evaluation and Function Composition in R

Posted on Categories Opinion, Rants, Statistics, TutorialsTags , , 1 Comment on An easy way to accidentally inflate reported R-squared in linear regression models

An easy way to accidentally inflate reported R-squared in linear regression models

Here is an absolutely horrible way to confuse yourself and get an inflated reported R-squared on a simple linear regression model in R.

We have written about this before, but we found a new twist on the problem (interactions with categorical variable encoding) which we would like to call out here. Continue reading An easy way to accidentally inflate reported R-squared in linear regression models

Posted on Categories Administrativia, Opinion, StatisticsTags , Leave a comment on Campaign Response Testing no longer published on Udemy

Campaign Response Testing no longer published on Udemy

Our free video course Campaign Response Testing is no longer published on Udemy. It remains available for free on YouTube with all source code available from GitHub. I’ll try to correct bad links as I find them.

Please read on for the reasons. Continue reading Campaign Response Testing no longer published on Udemy

Posted on Categories Coding, Opinion, Programming, StatisticsTags , , 2 Comments on More on safe substitution in R

More on safe substitution in R

Let’s worry a bit about substitution in R. Substitution is very powerful, which means it can be both used and mis-used. However, that does not mean every use is unsafe or a mistake.

Continue reading More on safe substitution in R

Posted on Categories Opinion, Programming, StatisticsTags , , , , , Leave a comment on There is usually more than one way in R

There is usually more than one way in R

Python has a fairly famous design principle (from “PEP 20 — The Zen of Python”):

There should be one– and preferably only one –obvious way to do it.

Frankly in R (especially once you add many packages) there is usually more than one way. As an example we will talk about the common R functions: str(), head(), and the tibble package‘s glimpse(). Continue reading There is usually more than one way in R

Posted on Categories data science, Opinion, StatisticsTags , , , Leave a comment on R summary() got better!

R summary() got better!

Here is a really nice feature found in the current 3.4.0 version of R: summary() has become a lot more reasonable.

summary(15555)

#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   15555   15555   15555   15555   15555   15555 

Please read on for some background. Continue reading R summary() got better!

Posted on Categories Coding, Opinion, Programming, StatisticsTags , , , 7 Comments on In defense of wrapr::let()

In defense of wrapr::let()

Saw this the other day:

Wraprvstidyeval

In defense of wrapr::let() (originally part of replyr, and still re-exported by that package) I would say:

  • let() was deliberately designed for a single real-world use case: working with data when you don’t know the column names when you are writing the code (i.e., the column names will come later in a variable). We can re-phrase that as: there is deliberately less to learn as let() is adapted to a need (instead of one having to adapt to let()).
  • The R community already has months of experience confirming let() working reliably in production while interacting with a number of different packages.
  • let() will continue to be a very specific, consistent, reliable, and relevant tool even after dpyr 0.6.* is released, and the community gains experience with rlang/tidyeval in production.

If rlang/tidyeval is your thing, by all means please use and teach it. But please continue to consider also using wrapr::let(). If you are trying to get something done quickly, or trying to share work with others: a “deeper theory” may not be the best choice.

An example follows. Continue reading In defense of wrapr::let()

Posted on Categories Opinion, Statistics, TutorialsTags , , , 2 Comments on dplyr in Context

dplyr in Context

Introduction

Beginning R users often come to the false impression that the popular packages dplyr and tidyr are both all of R and sui generis inventions (in that they might be unprecedented and there might no other reasonable way to get the same effects in R). These packages and their conventions are high-value, but they are results of evolution and implement a style of programming that has been available in R for some time. They evolved in a context, and did not burst on the scene fully armored with spear in hand.

Continue reading dplyr in Context

Posted on Categories Coding, Opinion, Programming, StatisticsTags , , , , 6 Comments on Why to use wrapr::let()

Why to use wrapr::let()

I have written about referential transparency before. In this article I would like to discuss “leaky abstractions” and why wrapr::let() supplies a useful (but leaky) abstraction for R programmers.

Wraprs Continue reading Why to use wrapr::let()

Posted on Categories data science, Expository Writing, Opinion, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Programming, Statistics, TutorialsTags , , , , , ,

Teaching pivot / un-pivot

Authors: John Mount and Nina Zumel

Introduction

In teaching thinking in terms of coordinatized data we find the hardest operations to teach are joins and pivot.

One thing we commented on is that moving data values into columns, or into a “thin” or entity/attribute/value form (often called “un-pivoting”, “stacking”, “melting” or “gathering“) is easy to explain, as the operation is a function that takes a single row and builds groups of new rows in an obvious manner. We commented that the inverse operation of moving data into rows, or the “widening” operation (often called “pivoting”, “unstacking”, “casting”, or “spreading”) is harder to explain as it takes a specific group of columns and maps them back to a single row. However, if we take extra care and factor the pivot operation into its essential operations we find pivoting can be usefully conceptualized as a simple single row to single row mapping followed by a grouped aggregation.

Please read on for our thoughts on teaching pivoting data. Continue reading Teaching pivot / un-pivot