Piping systems in R are popular, the most popular one being magrittr.
This note will discuss the advanced re-usable piping systems: rqdatatable operator trees and wrapr function object pipelines. In each case we have a set of objects designed to extract extra power from the wrapr dot-arrow pipe.
Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts.
wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system. We will demonstrate this with the vtreat data preparation system.
This note is a comment on some of the timings shared in the dplyr-0.8.0 pre-release announcement.
The original published timings were as follows:
With performance metrics: measurements are marketing. So let’s dig into the above a bit.
Our group has done a lot of work with non-standard calling conventions in R. Our tools work hard to eliminate non-standard calling (as is the purpose of wrapr::let()), or at least make it cleaner and more controllable (as is done in the wrapr dot pipe). And even so, we still get surprised by some of the side-effects and mal-consequences of the over-use of non-standard calling conventions in R. Please read on for a recent example.
This note is just a quick follow-up to our last note on correcting the bias in estimated standard deviations for binomial experiments.
This note is about attempting to remove the bias brought in by using sample standard deviation estimates to estimate an unknown true standard deviation of a population. We establish there is a bias, concentrate on why it is not important to remove it for reasonably sized samples, and (despite that) give a very complete bias management solution.
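To make the bias concrete, here is a minimal base R sketch for the binomial (coin-flip) case. It is an illustration under stated assumptions, not the method from the note itself: the function name `expected_sample_sd` is hypothetical, and the sketch simply enumerates every possible count of successes to get the exact expected value of the usual sample standard deviation.

```r
# Exact expected value of the sample standard deviation (n - 1 denominator)
# for n independent 0/1 draws with success probability p.
# `expected_sample_sd` is a hypothetical helper name used only for this sketch.
expected_sample_sd <- function(n, p) {
  k <- 0:n
  # A 0/1 sample with k ones has sample variance k * (n - k) / (n * (n - 1)).
  sd_given_k <- sqrt(k * (n - k) / (n * (n - 1)))
  # Weight each possible outcome by its binomial probability.
  sum(dbinom(k, size = n, prob = p) * sd_given_k)
}

p <- 0.5
true_sd <- sqrt(p * (1 - p))               # 0.5 for a fair coin

est_small <- expected_sample_sd(5, p)      # roughly 0.482, below the true 0.5
est_large <- expected_sample_sd(100, p)    # much closer to the true value

ratio_small <- est_small / true_sd
ratio_large <- est_large / true_sd
print(c(n5 = ratio_small, n100 = ratio_large))
```

The ratio of expected estimate to true value is below 1 for the small sample and moves toward 1 as the sample grows, which is the sense in which the bias matters little for reasonably sized samples.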
R is designed to make working with statistical models fast, succinct, and reliable.
For instance building a model is a one-liner:
model <- lm(Petal.Length ~ Sepal.Length, data = iris)
And producing a detailed diagnostic summary of the model is also a one-liner:
summary(model)
# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = iris)
#
# Residuals:
#      Min       1Q   Median       3Q      Max
# -2.47747 -0.59072 -0.00668  0.60484  2.49512
#
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)  -7.10144    0.50666  -14.02   <2e-16 ***
# Sepal.Length  1.85843    0.08586   21.65   <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.8678 on 148 degrees of freedom
# Multiple R-squared: 0.76, Adjusted R-squared: 0.7583
# F-statistic: 468.6 on 1 and 148 DF, p-value: < 2.2e-16
However, useful as the above is: it isn’t exactly presentation ready. To formally report the R-squared of our model we would have to cut and paste this information from the summary. That is a needlessly laborious and possibly error-prone step.
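For comparison, the same numbers can be pulled out of the summary object by hand in base R. The following is only a sketch of that manual route (the variable names are illustrative); it shows how many separate pieces have to be collected and formatted to report a single F test.

```r
# Fit the same model as above.
model <- lm(Petal.Length ~ Sepal.Length, data = iris)

# The summary is a list; the reported quantities are stored as components.
s <- summary(model)
r2 <- s$r.squared                      # Multiple R-squared
f_value <- unname(s$fstatistic["value"])
df_num <- unname(s$fstatistic["numdf"])
df_den <- unname(s$fstatistic["dendf"])

# The p-value is not stored directly; recompute it from the F distribution.
p_value <- pf(f_value, df_num, df_den, lower.tail = FALSE)

# Assemble a report string by hand.
report <- sprintf("R2=%.2f, F(%d,%d)=%.1f",
                  r2, as.integer(df_num), as.integer(df_den), f_value)
print(report)
```

Each step here is simple, but the formatting and rounding choices are left to the analyst and have to be repeated for every model reported.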
With the sigr package this can be made much easier:
library("sigr")
Rsquared <- wrapFTest(model)
print(Rsquared)
# "F Test summary: (R2=0.76, F(1,148)=468.6, p<1e-05)."
And this formal summary can be directly rendered into many formats (LaTeX, HTML, Markdown, and ASCII).
F Test summary: (R2=0.76, F(1,148)=468.6, p<1e-05).
sigr can help make your publication workflow much easier and more repeatable/reliable.
library("wrapr")
NA %?% 0
# 0
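The operator above replaces a missing left-hand value with the right-hand one. A rough base R sketch of that idea follows; `coalesce_sketch` is a hypothetical name, and this is an assumption-laden illustration of the behavior, not wrapr's implementation, which may handle more cases than shown here.

```r
# Replace missing entries of x with the corresponding entries of y.
# `coalesce_sketch` is a hypothetical helper for illustration only.
coalesce_sketch <- function(x, y) {
  # Treat a NULL left-hand side as entirely missing.
  if (is.null(x)) {
    return(y)
  }
  missing_idx <- is.na(x)
  # Recycle y to the length of x, then fill only the missing positions.
  x[missing_idx] <- rep_len(y, length(x))[missing_idx]
  x
}

single <- coalesce_sketch(NA, 0)
filled <- coalesce_sketch(c(1, NA, 3), 0)
print(single)
print(filled)
```

Only the missing positions are touched; present values on the left are returned unchanged.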
A more substantial application is the following.