To make teaching `R` quasi-quotation easier it would be nice if `R` string-interpolation and quasi-quotation both used the same notation. They are related concepts, so some commonality of notation would actually be clarifying and help teach the concepts. We will define both of the above terms, and demonstrate the relation between the two concepts.

# Category: Programming

## R Tip: Use Inline Operators For Legibility

`R` Tip: use inline operators for legibility.

A `Python` feature I miss when working in `R` is the convenience of `Python`’s inline `+` operator. In `Python`, `+` does the right thing for some built-in data types:

- It concatenates lists: `[1,2] + [3]` is `[1, 2, 3]`.
- It concatenates strings: `'a' + 'b'` is `'ab'`.

And, of course, it adds numbers: `1 + 2` is `3`.

The inline notation is very convenient and legible. In this note we will show how to use a related notation in `R`.
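As a rough illustration of what an inline notation can look like in `R`, the sketch below defines two user-defined infix operators in base `R`. The names `%cat%` and `%app%` are hypothetical, made up for this example; they are not taken from any package and need not match the notation the full note uses.

```r
# Hypothetical infix operators, defined here only for illustration.
# R treats any function named %...% as a binary inline operator.

# String concatenation without a separator.
`%cat%` <- function(a, b) paste0(a, b)

# Vector (or list) concatenation.
`%app%` <- function(a, b) c(a, b)

s <- 'a' %cat% 'b'      # the string "ab"
v <- c(1, 2) %app% 3    # the vector 1, 2, 3

print(s)
print(v)
```

The design point is only that `x %op% y` reads left to right, much as `Python`'s `+` does, where the equivalent prefix call `paste0(x, y)` does not.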

## R Tip: Use seqi() For Indexes

`R` Tip: use `seqi()` for indexing.

`R`’s “`1:0` trap” is a mal-feature that confuses newcomers and is a reliable source of bugs. This note will show how to use `seqi()` to write more reliable code and document intent.
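To make the trap concrete, here is a small base-`R` sketch. It does not call `seqi()` itself; instead it uses `seq_len()` and a hypothetical helper, `increasing_seq()`, written only for this example to show the kind of behavior an increasing-only sequence function provides. It should not be read as the `wrapr` implementation.

```r
# The "1:0 trap": looping over 1:n when n is zero.
n <- 0
trap_visits <- 0
for (i in 1:n) {            # 1:0 is c(1, 0), so the body runs twice
  trap_visits <- trap_visits + 1
}

# Base R guard: seq_len(0) is empty, so the body is skipped.
safe_visits <- 0
for (i in seq_len(n)) {
  safe_visits <- safe_visits + 1
}

# Hypothetical helper (an assumption for illustration): an increasing-only
# sequence from a to b that is empty when b < a.
increasing_seq <- function(a, b) {
  if (b < a) integer(0) else seq.int(a, b)
}

print(trap_visits)
print(safe_visits)
print(length(increasing_seq(1, 0)))
```

The loop over `1:n` silently counts down when `n` is zero, which is exactly the case that tends to show up only on empty inputs.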

## What does it mean to write “vectorized” code in R?

One often hears that `R` cannot be fast (false), or more correctly that for fast code in `R` you may have to consider “vectorizing.”

A lot of knowledgeable `R` users are not comfortable with the term “vectorize”, and not really familiar with the method.

“Vectorize” is just a slightly high-handed way of saying:

`R` naturally stores data in columns (or in column major order), so if you are not coding to that pattern you are fighting the language.

In this article we will make the above clear by working through a non-trivial example of writing vectorized code.
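Before the non-trivial example, a toy contrast may help fix the idea. The sketch below is ours, not the article's example: it computes the same quantity (the sum of squares of the positive entries of a vector) first element by element, then with whole-vector operations.

```r
# Sum of squares of the positive entries, written two ways.
x <- c(3, -1, 4, -1, 5, -9, 2, 6)

# Scalar style: visit one element at a time.
loop_total <- 0
for (i in seq_along(x)) {
  if (x[i] > 0) {
    loop_total <- loop_total + x[i]^2
  }
}

# Vectorized style: operate on the whole column at once.
vec_total <- sum(x[x > 0]^2)

print(loop_total)
print(vec_total)
```

Both forms give the same answer; the vectorized form hands the per-element work to `R`'s built-in vector operations instead of running it in an interpreted loop.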


## Quoting Concatenate

In our last note we used `wrapr::qe()` to help quote expressions. In this note we will discuss quoting and code-capturing interfaces (interfaces that capture user source code) a bit more.

## Reusable Pipelines in R

Pipelines in `R` are popular, the most popular one being `magrittr` as used by `dplyr`.

This note will discuss the advanced re-usable piping systems: `rquery`/`rqdatatable` operator trees and `wrapr` function object pipelines. In each case we have a set of objects designed to extract extra power from the `wrapr` dot-arrow pipe `%.>%`.

## Sharing Modeling Pipelines in R

## Very Non-Standard Calling in R

Our group has done a *lot* of work with non-standard calling conventions in `R`.

Our tools work hard to *eliminate* non-standard calling (as is the purpose of `wrapr::let()`), or at least make it cleaner and more controllable (as is done in the `wrapr` dot pipe). And even so, we *still* get surprised by some of the side-effects and mal-consequences of the over-use of non-standard calling conventions in `R`.

Please read on for a recent example.

## Quoting in R

Many `R` users appear to be big fans of "code capturing" or "non-standard evaluation" (NSE) interfaces. In this note we will discuss quoting and non-quoting interfaces in `R`.
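As a small illustration of the distinction, the sketch below contrasts a value-oriented interface with a code-capturing one, using only base `R`. Both function names, `select_by_name()` and `select_by_code()`, are hypothetical and exist only for this example.

```r
# A standard (value-oriented) interface: the caller passes a string.
select_by_name <- function(d, name) d[[name]]

# A quoting (code-capturing) interface: the function captures the
# caller's source code with substitute() instead of evaluating it.
select_by_code <- function(d, expr) {
  name <- deparse(substitute(expr))
  d[[name]]
}

d <- data.frame(x = 1:3, y = c(10, 20, 30))

a <- select_by_name(d, "y")
b <- select_by_code(d, y)      # y is never evaluated, only captured

# The value-oriented form composes with variables; the quoting form does not.
col <- "y"
a2 <- select_by_name(d, col)   # uses the value of col, i.e. "y"
b2 <- select_by_code(d, col)   # looks for a column literally named "col"

print(identical(a, b))
print(is.null(b2))
```

The quoting form saves the user a pair of quotation marks, at the price of being harder to program over: passing a variable that holds the column name no longer does what one might expect.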

## More on sigr

If you’ve read our previous R Tip on using `sigr` with linear models, you might have noticed that the `lm()` summary object does in fact carry the R-squared and F statistics, both in the printed form:

    model_lm <- lm(formula = Petal.Length ~ Sepal.Length, data = iris)
    (smod_lm <- summary(model_lm))
    ##
    ## Call:
    ## lm(formula = Petal.Length ~ Sepal.Length, data = iris)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -2.47747 -0.59072 -0.00668  0.60484  2.49512
    ##
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  -7.10144    0.50666  -14.02   <2e-16 ***
    ## Sepal.Length  1.85843    0.08586   21.65   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.8678 on 148 degrees of freedom
    ## Multiple R-squared:   0.76,  Adjusted R-squared:  0.7583
    ## F-statistic: 468.6 on 1 and 148 DF,  p-value: < 2.2e-16

and also in the `summary()` object:

    c(R2 = smod_lm$r.squared, F = smod_lm$fstatistic[1])
    ##          R2     F.value
    ##   0.7599546 468.5501535

Note, though, that while the summary *reports* the model’s significance, it does not carry it as a specific `summary()` object item. `sigr::wrapFTest()` is a convenient way to extract the model’s R-squared and F statistic *and* simultaneously calculate the model significance, as is required by many scientific publications.
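For readers who want to see where that significance comes from, here is a minimal base-`R` sketch that recomputes it from the F statistic stored in the summary object. It is shown only to make the calculation concrete; it is not claimed to be how `sigr` computes the value internally.

```r
# Fit the same model as above, using the built-in iris data.
model_lm <- lm(Petal.Length ~ Sepal.Length, data = iris)
smod_lm <- summary(model_lm)

# fstatistic is a named vector: value, numdf, dendf.
f <- smod_lm$fstatistic

# The upper tail of the F distribution gives the model's significance.
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

print(p_value)
```

The result is far below the `2.2e-16` floor that the printed summary reports, which is one reason it can be useful to have the number itself available.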

`sigr` is even more helpful for logistic regression, via `glm()`, which reports neither the model’s chi-squared statistic nor its significance.

    iris$isVersicolor <- iris$Species == "versicolor"
    model_glm <- glm(
      isVersicolor ~ Sepal.Length + Sepal.Width,
      data = iris,
      family = binomial)
    (smod_glm <- summary(model_glm))
    ##
    ## Call:
    ## glm(formula = isVersicolor ~ Sepal.Length + Sepal.Width, family = binomial,
    ##     data = iris)
    ##
    ## Deviance Residuals:
    ##     Min       1Q   Median       3Q      Max
    ## -1.9769  -0.8176  -0.4298   0.8855   2.0855
    ##
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)
    ## (Intercept)    8.0928     2.3893   3.387 0.000707 ***
    ## Sepal.Length   0.1294     0.2470   0.524 0.600247
    ## Sepal.Width   -3.2128     0.6385  -5.032 4.85e-07 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## (Dispersion parameter for binomial family taken to be 1)
    ##
    ##     Null deviance: 190.95  on 149  degrees of freedom
    ## Residual deviance: 151.65  on 147  degrees of freedom
    ## AIC: 157.65
    ##
    ## Number of Fisher Scoring iterations: 5

To get the significance of a logistic regression model, call `sigr::wrapChiSqTest()`:

    library(sigr)
    (chi2Test <- wrapChiSqTest(model_glm))
    ## [1] "Chi-Square Test summary: pseudo-R2=0.21 (X2(2,N=150)=39, p<1e-05)."

Notice that the fit summary also reports a pseudo-R-squared. You can extract the values directly off the `sigr` object, as well:

    str(chi2Test)
    ## List of 10
    ##  $ test          : chr "Chi-Square test"
    ##  $ df.null       : int 149
    ##  $ df.residual   : int 147
    ##  $ null.deviance : num 191
    ##  $ deviance      : num 152
    ##  $ pseudoR2      : num 0.206
    ##  $ pValue        : num 2.92e-09
    ##  $ sig           : num 2.92e-09
    ##  $ delta_deviance: num 39.3
    ##  $ delta_df      : int 2
    ##  - attr(*, "class")= chr [1:2] "sigr_chisqtest" "sigr_statistic"

And of course you can render the `sigr` object into one of several formats (LaTeX, HTML, markdown, and ASCII) for direct inclusion in a report or publication.

    render(chi2Test, format = "html")

**Chi-Square Test** summary: *pseudo-R^{2}*=0.21 (*χ^{2}*(2,*N*=150)=39, *p*<1e-05).

By the way, if you are interested, we give the explicit formula for calculating the significance of a logistic regression model in *Practical Data Science with R*.
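A minimal sketch of one such calculation follows, assuming the significance in question is the usual deviance-based chi-squared test (the book's exact presentation is not reproduced here). It uses only base `R` and the fields of the `glm` object, and the quantities it computes correspond to the `delta_deviance`, `delta_df`, `pValue`, and `pseudoR2` entries shown in the `str()` output above.

```r
# Refit the logistic regression from above, using the built-in iris data.
iris$isVersicolor <- iris$Species == "versicolor"
model_glm <- glm(isVersicolor ~ Sepal.Length + Sepal.Width,
                 data = iris, family = binomial)

# Improvement in deviance over the null model, and the degrees of freedom used.
delta_deviance <- model_glm$null.deviance - model_glm$deviance
delta_df <- model_glm$df.null - model_glm$df.residual

# The upper tail of the chi-squared distribution gives the significance.
p_value <- pchisq(delta_deviance, df = delta_df, lower.tail = FALSE)

# Pseudo-R-squared: the fraction of the null deviance explained by the model.
pseudo_r2 <- 1 - model_glm$deviance / model_glm$null.deviance

print(c(delta_deviance = delta_deviance, delta_df = delta_df,
        p_value = p_value, pseudo_r2 = pseudo_r2))
```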