`R`

users have been enjoying the benefits of `SQL`

query generators for quite some time, most notably using the `dbplyr`

package. I would like to talk about some features of our own `rquery`

query generator, concentrating on derived result re-use.

# Category: Exciting Techniques

## cdata Control Table Keys

In our `cdata`

`R`

package and training materials we emphasize the record-oriented thinking and how to design a transform control table. We now have an additional exciting new feature: control table keys.

The user can now control which columns of a `cdata`

control table are the keys, including now using composite keys (that is keys that are spread across more than one column). This is easiest to demonstrate with an example.

## Function Objects and Pipelines in R

Composing functions and sequencing operations are core programming concepts.

Some notable realizations of sequencing or pipelining operations include:

- Unix’s
`|`

-pipe - CMS Pipelines.
`F#`

‘s forward pipe operator`|>`

.- Haskel’s Data.Function
`&`

operator. - The
`R`

`magrittr`

forward pipe. - Scikit-learn‘s
`sklearn.pipeline.Pipeline`

.

The idea is: many important calculations can be considered as a sequence of transforms applied to a data set. Each step may be a function taking many arguments. It is often the case that only one of each function’s arguments is primary, and the rest are parameters. For data science applications this is particularly common, so having convenient pipeline notation can be a plus. An example of a non-trivial data processing pipeline can be found here.

In this note we will discuss the advanced `R`

pipeline operator "dot arrow pipe" and an `S4`

class (`wrapr::UnaryFn`

) that makes working with pipeline notation much more powerful and much easier.

## Fully General Record Transforms with cdata

One of the design goals of the `cdata`

`R`

package is that very powerful and arbitrary record transforms should be convenient and take only one or two steps. In fact it is the goal to take just about any record shape to any other in two steps: first convert to row-records, then re-block the data into arbitrary record shapes (please see here and here for the concepts).

But as with all general ideas, it is much easier to see what we mean by the above with a concrete example.

## Introducing RcppDynProg

`RcppDynProg`

is a new `Rcpp`

based `R`

package that implements simple, but powerful, table-based dynamic programming. This package can be used to optimally solve the minimum cost partition into intervals problem (described below) and is useful in building piecewise estimates of functions (shown in this note).

## vtreat Variable Importance

`vtreat`

‘s purpose is to produce pure numeric `R`

`data.frame`

s that are ready for supervised predictive modeling (predicting a value from other values). By ready we mean: a purely numeric data frame with no missing values and a reasonable number of columns (missing-values re-encoded with indicators, and high-degree categorical re-encode by effects codes or impact codes).

In this note we will discuss a small aspect of the `vtreat`

package: variable screening.

## Reusable Pipelines in R

Pipelines in `R`

are popular, the most popular one being `magrittr`

as used by `dplyr`

.

This note will discuss the advanced re-usable piping systems: `rquery`

/`rqdatatable`

operator trees and `wrapr`

function object pipelines. In each case we have a set of objects designed to extract extra power from the `wrapr`

dot-arrow pipe `%.>%`

.

## Sharing Modeling Pipelines in R

## Quick Significance Calculations for A/B Tests in R

## Introduction

Let’s take a quick look at a very important and common experimental problem: checking if the difference in success rates of two Binomial experiments is statistically significant. This can arise in A/B testing situations such as online advertising, sales, and manufacturing.

We already share a free video course on a Bayesian treatment of planning and evaluating A/B tests (including a free Shiny application). Let’s now take a look at the should be simple task of simply building a summary statistic that includes a classic frequentist significance.

Continue reading Quick Significance Calculations for A/B Tests in R

## Dot-Pipe Paper Accepted by the R Journal!!!

We are thrilled to announce our (my and Nina Zumel’s) paper on the dot-pipe has been accepted by the R-Journal!

Continue reading Dot-Pipe Paper Accepted by the R Journal!!!