Categories: Opinion, Programming, Tutorials

Parameterizing with bquote

One thing that is sure to get lost in my long note on macros in R is just how concise and powerful macros are. The problem is that macros are concise while doing a lot for you, so any explanation gets bogged down, much like explaining a joke.

Let’s try to be concise.

Below is an extension of an example taken from the Programming with dplyr note.

First, let’s load the package and define the symbols holding the names of the columns we will work with later.

suppressPackageStartupMessages(library("dplyr"))

group_nm <- as.name("am")
num_nm <- as.name("hp")
den_nm <- as.name("cyl")
derived_nm <- as.name(paste0(num_nm, "_per_", den_nm))
mean_nm <- as.name(paste0("mean_", derived_nm))
count_nm <- as.name("count")
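A quick check (base R only, not part of the original walkthrough) confirms that these objects are symbols of class “name” rather than character strings, and that paste0() has composed the derived names from them:

```r
# Same definitions as above; base R only
group_nm <- as.name("am")
num_nm <- as.name("hp")
den_nm <- as.name("cyl")
# paste0() converts symbols to their character names before pasting
derived_nm <- as.name(paste0(num_nm, "_per_", den_nm))
mean_nm <- as.name(paste0("mean_", derived_nm))

class(derived_nm)          # "name": a symbol, not a string
as.character(derived_nm)   # "hp_per_cyl"
as.character(mean_nm)      # "mean_hp_per_cyl"
```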

Now let’s use rlang’s !! operator to substitute those symbols into a non-trivial dplyr pipeline.

mtcars %>%
  group_by(!!group_nm) %>%
  mutate(!!derived_nm := !!num_nm/!!den_nm) %>%
  summarize(
    !!mean_nm := mean(!!derived_nm),
    !!count_nm := n()
  ) %>%
  ungroup() %>%
  arrange(!!group_nm)
## # A tibble: 2 x 3
##      am mean_hp_per_cyl count
##   <dbl>           <dbl> <int>
## 1     0            22.7    19
## 2     1            23.4    13

The above is very useful: we now have programmatic control over all the symbol names in the pipeline. This is exactly what is needed to wrap such a pipeline in a function and make it parametrically reusable.
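As a minimal sketch of that kind of re-use, here is a hypothetical function, mean_ratio_by_group() (our own name, not from dplyr or rlang), that takes the three column names as strings. To stay runnable without extra packages it swaps the dplyr verbs for base tapply(), and it uses base R’s bquote(), the tool discussed next, to splice the symbols into template expressions.

```r
# Hypothetical helper (our own name, not from dplyr or rlang): per level of
# group_col, the mean of num_col / den_col and the row count. Column names
# are passed as strings; base R only.
mean_ratio_by_group <- function(d, group_col, num_col, den_col) {
  group_nm <- as.name(group_col)
  num_nm <- as.name(num_col)
  den_nm <- as.name(den_col)
  # .() splices each symbol into the template expressions
  mean_expr <- bquote(tapply(.(num_nm) / .(den_nm), .(group_nm), mean))
  count_expr <- bquote(tapply(.(num_nm), .(group_nm), length))
  # eval() with a data frame as envir makes its columns visible as variables
  means <- eval(mean_expr, envir = d)
  counts <- eval(count_expr, envir = d)
  data.frame(
    group = names(means),
    mean_ratio = as.numeric(means),
    count = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

mean_ratio_by_group(mtcars, "am", "hp", "cyl")
```

With the same column choices as above it should reproduce the pipeline’s numbers: means of about 22.7 and 23.4, with counts 19 and 13, for am = 0 and am = 1.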

The thing is, Thomas Lumley’s base::bquote() could already achieve this in 2003, and Gregory R. Warnes’ gtools::strmacro() could further automate specifying such automation in 2005.
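For readers who have not met bquote(): it walks an expression and replaces each .() term with the value of its contents, evaluated in the environment given by its where argument (by default the calling frame). The sketch below uses only base R and the built-in mtcars data; the alternate environment alt_env is purely illustrative. The where argument is what the wrappers below rely on to resolve .() terms in the caller’s environment.

```r
num_nm <- as.name("hp")
den_nm <- as.name("cyl")

# .() terms are looked up in the calling frame by default
e1 <- bquote(.(num_nm) / .(den_nm))
print(e1)   # hp/cyl

# the same template, with .() terms resolved in a different environment
alt_env <- new.env()
assign("num_nm", as.name("disp"), envir = alt_env)
assign("den_nm", as.name("wt"), envir = alt_env)
e2 <- bquote(.(num_nm) / .(den_nm), where = alt_env)
print(e2)   # disp/wt

# the results are ordinary calls that can be evaluated against data
head(eval(e1, envir = mtcars))
```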

Let’s show that. First we use gtools to build a “bquote() wrapping factory.”

library("gtools")

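# build a method wrapping macro; strmacro() substitutes the supplied
# method name for every occurrence of FN in the expression below
bq_wrap <- strmacro(
  FN,
  expr = {
    FN <- function(.data, ...) {
      env <- parent.frame()
      # capture the call with the caller's (unevaluated) arguments
      mc <- substitute(dplyr::FN(.data = .data, ...))
      # expand .() terms, evaluating them in the caller's environment
      mc <- do.call(bquote, list(mc, where = env), envir = env)
      # run the expanded dplyr call where the user called the wrapper
      eval(mc, envir = env)
    }
  }
)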
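As we understand it, strmacro() works at the level of program text: the template is deparsed, the placeholder (here FN) is replaced textually, and the resulting code is parsed and evaluated in the caller’s environment, which is why a macro can define functions there directly. The following is a deliberately simplified, hypothetical imitation of that idea; str_macro_sketch() and sqrt_twice() are our own names, and this is not gtools’ actual implementation.

```r
# Hypothetical, simplified string macro (NOT gtools' implementation):
# deparse a template, textually replace a placeholder, then parse and
# evaluate the result in the caller's environment.
str_macro_sketch <- function(placeholder, template) {
  # capture the template unevaluated and turn it into program text
  template_text <- paste(deparse(substitute(template)), collapse = "\n")
  function(replacement) {
    code <- gsub(placeholder, replacement, template_text, fixed = TRUE)
    # evaluate where the macro was invoked, so assignments land there
    eval(parse(text = code), envir = parent.frame())
  }
}

make_twice <- str_macro_sketch("FN", FN_twice <- function(x) FN(FN(x)))
make_twice("sqrt")   # defines sqrt_twice() in the calling environment
sqrt_twice(16)       # 2
```

Because the substitution is purely textual, every occurrence of the placeholder is rewritten, including the one inside FN_twice, so placeholders should be chosen so they cannot collide with other identifiers.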

Now we use bq_wrap() to wrap some dplyr methods (ignoring options other than .data and ...).

# wrap some dplyr methods
bq_wrap(mutate)
bq_wrap(summarize)
bq_wrap(group_by)
bq_wrap(arrange)

At this point we have re-adapted four dplyr methods to use bquote() quasiquotation. This is what we mean when we say strmacro() is a tool for building tools.
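The same “tool to build tools” idea can also be sketched with an ordinary closure instead of a string macro. Below, bq_adapt() is a hypothetical helper of our own (not part of gtools or dplyr) that performs the same capture, bquote(), eval() steps as bq_wrap, but returns the adapted function rather than assigning it. To keep the example free of dplyr it adapts base::transform() as a stand-in for mutate(); unlike the := form, transform() argument names cannot be spliced, so the result column name is fixed here.

```r
# Hypothetical helper (our own name, not part of gtools or dplyr): returns a
# version of `fn` whose arguments have their .() terms expanded by bquote(),
# in the caller's environment, before `fn` is called.
bq_adapt <- function(fn) {
  force(fn)
  function(.data, ...) {
    env <- parent.frame()
    # capture the unevaluated call; .data and ... are filled in by substitute()
    mc <- substitute(PLACEHOLDER(.data, ...))
    # put the wrapped function object itself in the call's function position
    mc[[1]] <- fn
    # expand .() terms in the caller's environment
    mc <- do.call(bquote, list(mc, where = env))
    eval(mc, envir = env)
  }
}

# base::transform() stands in for dplyr::mutate()
transform_bq <- bq_adapt(transform)

num_nm <- as.name("hp")
den_nm <- as.name("cyl")
res <- transform_bq(mtcars, ratio = .(num_nm) / .(den_nm))
head(res$ratio)
```

The design difference is exactly what the macro hides: the closure’s result must be assigned explicitly (transform_bq <- ...), whereas the strmacro() version defines each wrapper in the caller’s environment as a side effect.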

And here is the same pipeline again, entirely driven by bquote().

mtcars %>%
  group_by(.(group_nm)) %>%
  mutate(.(derived_nm) := .(num_nm)/.(den_nm)) %>%
  summarize(
    .(mean_nm) := mean(.(derived_nm)),
    .(count_nm) := n()
  ) %>%
  ungroup() %>%
  arrange(.(group_nm))
## # A tibble: 2 x 3
##      am mean_hp_per_cyl count
##   <dbl>           <dbl> <int>
## 1     0            22.7    19
## 2     1            23.4    13
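To see what each wrapper hands to dplyr, we can run just its bquote() step by hand. Building the call does not evaluate it, so only base R is needed; the quoted call below is an assumed example of what the mutate() wrapper captures with substitute(), written with mtcars in place of the piped-in data. Note that .() is expanded on the left-hand side of := as well, which is how the wrapped mutate() and summarize() get programmable result names.

```r
# Symbols as defined at the top of the post
num_nm <- as.name("hp")
den_nm <- as.name("cyl")
derived_nm <- as.name(paste0(num_nm, "_per_", den_nm))

# A call shaped like the one the mutate() wrapper captures (assumed example)
mc <- quote(dplyr::mutate(.data = mtcars, .(derived_nm) := .(num_nm) / .(den_nm)))

# The same expansion step the wrapper performs before calling eval()
expanded <- do.call(bquote, list(mc, where = environment()))
print(expanded)
```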

4 thoughts on “Parameterizing with bquote”

  1. And, of course, how to perform the same steps using wrapr::let().

    suppressPackageStartupMessages(library("dplyr"))
    
    # define our parameters
    group_nm <- as.name("am")
    num_nm <- as.name("hp")
    den_nm <- as.name("cyl")
    derived_nm <- as.name(paste0(num_nm, "_per_", den_nm))
    mean_nm <- as.name(paste0("mean_", derived_nm))
    count_nm <- as.name("count")
    
    # make a method adaptor factory
    library("wrapr")
    bq_wrap <- function(method_name, env) c(
      METHOD_NAME = method_name
    ) %in_block% {
      assign(
        method_name,
        function(.data, ...) {
          env = parent.frame()
          mc <- substitute(dplyr::METHOD_NAME(.data = .data, ...))
          mc <- do.call(bquote, list(mc, where = env), envir = env)
          eval(mc, envir = env)
        },
        envir = env)
    }
    
    # wrap some dplyr methods
    env <- environment()
    bq_wrap("mutate", env)
    bq_wrap("group_by", env)
    bq_wrap("arrange", env)
    bq_wrap("summarize", env)
    
    # use the parameterized pipeline
    mtcars %>%
      group_by(.(group_nm)) %>%
      mutate(.(derived_nm) := .(num_nm)/.(den_nm)) %>%
      summarize(
        .(mean_nm) := mean(.(derived_nm)),
        .(count_nm) := n()
      ) %>%
      ungroup() %>%
      arrange(.(group_nm))
    # # A tibble: 2 x 3
    #        am mean_hp_per_cyl count
    #     <dbl>           <dbl> <int>
    #   1     0            22.7    19
    #   2     1            23.4    13
    

    Much of the difference is that let() has to work hard to achieve a deliberate side effect (assigning new functions into the user’s environment), whereas strmacro()’s macro style gets such access for free.

  2. And if you are not manipulating the left-hand sides of assignments, you can even use eval():

    
    library("dplyr")
    
    d <- data.frame(x = 1:2, y = 3:4, z = 5:6)
    
    NUMERATOR_COLUMN <- as.name("x")
    DENOMINATOR_COLUMN <- as.name("y")
    
    d %>% 
      mutate(ratio = eval(NUMERATOR_COLUMN)/eval(DENOMINATOR_COLUMN))
    
    #   x y z     ratio
    # 1 1 3 5 0.3333333
    # 2 2 4 6 0.5000000
    
