Posted on Categories Opinion, Programming, TutorialsTags , ,

Standard Evaluation Versus Non-Standard Evaluation in R

There is a lot of unnecessary worry over “Non Standard Evaluation” (NSE) in R versus “Standard Evaluation” (SE, or standard “variables names refer to values” evaluation). This very author is guilty of over-discussing the issue. But let’s give this yet another try.

The entire difference between NSE and regular evaluation can be summed up in the following simple table (which should be clear after we work some examples).

Tbl

Standard Evaluation

In standard (or value oriented evaluation) code you type in is taken to be variable names, functions, names, operators, and even numeric literal values. String values or literals need extra marks, such as quotes.

Let’s look at this very slowly. First we set up a couple of variables.

d <- data.frame(x = 1, y = 2)

x <- "y"

Now if we want to extract the column named by x (which happens to be "y") we write code such as the following.

d[, x, drop=FALSE]
#   y
# 1 2

R treats by default the symbol x as a variable name and uses the value associated with it (in this case "y") to select a data column by name.

If we want to name the "x" directly (or treat "x" as a literal value) we add a small bit of extra notation: quote marks. This looks like the following.

d[, "x", drop=FALSE]
#   x
# 1 1

Notice this time we extracted the column named "x".

To summarize: in standard evaluation user written (or source code) names are treated as references to values, and to form string literals one adds quotes.

Non Standard Evaluation

In NSE (non-standard evaluation) the roles are reversed. Names are treated as literal string values, unless extra marks are added.

Non-standard techniques have been built into R quite some time. For example:

bquote(x)
# x

bquote(.(x))
# [1] "y"

Notice how in the first example bquote() captured the name “x“, which would allow code to use this name as a value. In the second example x was substituted for its associated value: the character string "y".

dplyr instead of using the existing R/bquote() methodology (re-)invented its own notation “!!“. For example x written alone is the string "x".

suppressPackageStartupMessages(library("dplyr"))

select(d, x)
#   x
# 1 1

Notice this extracted the column named "x", and not the column named "y" ("y" being the value associated with the variable x).

In contrast, if we want to use the value associate with x in NSE we add the extra de-quoting symbols !!, as we show here.

select(d, !!x)
#   y
# 1 2

Notice this extracted the column named "y" ("y" being the value associated with the variable x).

Though in some cases in the rlang/dplyr argot, one must sometimes first prepare a variable or value for de-quoting using one of sym(), ensym(), vars(), quo(), enquo(), quo_name(), quos(), enquos(), expr(), or enexpr(). Which to use depends on some combination of what you are trying to quote, where the value comes from (user code, function argument), what type of execution situation you are in (in function body, or not), execution rules (dplyr/tidyselect select rules versus non-select rules), and what version of rlang you are using.

There is nothing magic about the “!!” notation. rlang could just as easily chosen many others (one rumor is that a {{}} notation is coming, to push rlang closer to glue‘s {} string-interpolation notation). One possible notation would have been the .() notation already used in R, bquote(), and data.table (though data.table uses .() for something closer to list construction/quoting instead of marking for evaluation).

For instance a .()-enabled version of dplyr can be easily simulated as follows.

select <- wrapr::bquote_function(dplyr::select)

select(d, x)
#   x
# 1 1

select(d, .(x))
#   y
# 1 2

Note, the above (while it works) is just for demonstration. dplyr is designed to use !! not .().

To summarize: in non-standard evaluation user written (or source code) names are treated as string literals, and to reference variable values one adds an escaping notation.

Conclusion

NSE defaults to quoting names to make them string literals, whereas SE defaults to expecting unquoted variable names and requiring quotes around string literals.

With the above examples under your belt, you should be able to see the simple point of the initial table: NSE and SE are just complimentary notations. Neither notation is more powerful than the other, and R has had good tools to use both notations well before rlang was started.

2 thoughts on “Standard Evaluation Versus Non-Standard Evaluation in R”

  1. Roman Luštrik recently tweeted “just use freaking standard eval, folks”.

    Just in case this article is too long to make it clear: we agree.

    There are roughly two different paths to this conclusion.

    • Spend time studying NSE in R, and decide it is not worth the trouble.
    • Decide studying NSE in R is not worth the trouble, and save the time that would have been used studying NSE in R.
  2. I think, NSE has some merits when it comes to “simple” scripting, e.g. creating some statistical analyses in RMarkdown. On the other hand, it certainly has reached a point, where people are starting it all over also in serious code, which inevitably will lead to nasty bugs here and there. So, imo, the message should be: only use NSE if it improves readability of the code/script and definitely don’t use it in any type of serious/larger project.

Leave a Reply to Roman Pahl Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.