# Category: Exciting Techniques

## Operator Notation for Data Transforms

Categories: Coding, Exciting Techniques, Tutorials

## “If You Were an R Function, What Function Would You Be?”

Categories: Exciting Techniques, Tutorials

## Query Generation in R

Categories: data science, Exciting Techniques, Tutorials

## cdata Control Table Keys

Categories: Exciting Techniques, Opinion, Tutorials

## Function Objects and Pipelines in R

Categories: data science, Exciting Techniques, Tutorials

## Fully General Record Transforms with cdata

Categories: data science, Exciting Techniques, Statistics, Tutorials

## Introducing RcppDynProg

Categories: Exciting Techniques, math programming, Tutorials

## vtreat Variable Importance

Categories: data science, Exciting Techniques, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Tutorials

## Reusable Pipelines in R

Categories: Coding, Exciting Techniques, Programming, Tutorials

## Sharing Modeling Pipelines in R

Categories: data science, Exciting Techniques, Programming, Tutorials

As of `cdata` version `1.0.8`, `cdata` implements an operator notation for data transforms.

The idea is simple, yet powerful.

We’ve been getting some good uptake on our piping in `R` article announcement.

The article is necessarily a bit technical. But one of its key points comes from the observation that piping into names is a special opportunity to give general objects the following personality quiz: “If you were an `R` function, what function would you be?”


In our `cdata` `R` package and training materials we emphasize record-oriented thinking and how to design a transform control table. We now have an exciting new feature: control table keys.

The user can now control which columns of a `cdata` control table are the keys, including composite keys (that is, keys that are spread across more than one column). This is easiest to demonstrate with an example.

Composing functions and sequencing operations are core programming concepts.

Some notable realizations of sequencing or pipelining operations include:

- Unix’s `|` pipe.
- CMS Pipelines.
- `F#`’s forward pipe operator `|>`.
- Haskell’s Data.Function `&` operator.
- The `R` `magrittr` forward pipe.
- Scikit-learn’s `sklearn.pipeline.Pipeline`.

The idea is: many important calculations can be considered as a sequence of transforms applied to a data set. Each step may be a function taking many arguments. It is often the case that only one of each function’s arguments is primary, and the rest are parameters. For data science applications this is particularly common, so having convenient pipeline notation can be a plus. An example of a non-trivial data processing pipeline can be found here.
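To make the "one primary argument, the rest are parameters" point concrete, here is a minimal sketch in base `R`. The helper names `scale_column` and `keep_rows_above` are hypothetical, made up for this illustration, and the example assumes `R` 4.1 or newer so that the native `|>` pipe is available. It is not taken from any of the packages discussed here.

```r
# Each step takes the data as its primary (first) argument;
# the remaining arguments are parameters.
# Both helpers are hypothetical, written only for this illustration.
scale_column <- function(d, column, factor) {
  d[[column]] <- d[[column]] * factor
  d
}

keep_rows_above <- function(d, column, threshold) {
  d[d[[column]] > threshold, , drop = FALSE]
}

d <- data.frame(x = c(1, 2, 3, 4))

# Nested-call form: reads inside-out.
nested <- keep_rows_above(scale_column(d, "x", 10), "x", 15)

# Pipeline form: same steps, read left to right.
piped <- d |>
  scale_column("x", 10) |>
  keep_rows_above("x", 15)

print(identical(nested, piped))
```

Both forms compute the same result; the pipeline form simply lists the steps in the order they are applied.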

In this note we will discuss the advanced `R` pipeline operator "dot arrow pipe" and an `S4` class (`wrapr::UnaryFn`) that makes working with pipeline notation much more powerful and much easier.

One of the design goals of the `cdata` `R` package is that very powerful and arbitrary record transforms should be convenient and take only one or two steps. In fact it is the goal to take just about any record shape to any other in two steps: first convert to row-records, then re-block the data into arbitrary record shapes (please see here and here for the concepts).

But as with all general ideas, it is much easier to see what we mean by the above with a concrete example.

`RcppDynProg` is a new `Rcpp` based `R` package that implements simple, but powerful, table-based dynamic programming. This package can be used to optimally solve the minimum cost partition into intervals problem (described below) and is useful in building piecewise estimates of functions (shown in this note).
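As a rough illustration of the minimum cost partition into intervals problem, the following base `R` sketch fills in a dynamic programming table directly. It does not use `RcppDynProg` or its API; the function name `min_cost_partition`, the example data, and the choice of interval cost (squared error around the interval mean plus an assumed penalty of 1 per interval) are all assumptions made for this example.

```r
# costs[i, j] (for i <= j) is the cost of covering positions i..j
# with a single interval. Returns the minimum total cost and the
# start positions of the intervals in one optimal partition of 1..n.
# Hypothetical helper for illustration; not the RcppDynProg API.
min_cost_partition <- function(costs) {
  n <- nrow(costs)
  best <- numeric(n + 1)  # best[k + 1]: min cost to partition 1..k (best[1] = 0)
  start <- integer(n)     # start[k]: start of the last interval in that partition
  for (k in seq_len(n)) {
    # The last interval is i..k for some i in 1..k.
    candidates <- best[1:k] + costs[1:k, k]
    i <- which.min(candidates)
    best[k + 1] <- candidates[i]
    start[k] <- i
  }
  # Walk the table backwards to recover the interval starts.
  starts <- integer(0)
  k <- n
  while (k >= 1) {
    starts <- c(start[k], starts)
    k <- start[k] - 1
  }
  list(cost = best[n + 1], starts = starts)
}

# Example data: two flat segments.
x <- c(1, 1, 1, 10, 10, 10)
n <- length(x)
costs <- matrix(Inf, n, n)
for (i in seq_len(n)) {
  for (j in i:n) {
    seg <- x[i:j]
    # Assumed cost: squared error around the mean, plus 1 per interval.
    costs[i, j] <- sum((seg - mean(seg))^2) + 1
  }
}

res <- min_cost_partition(costs)
print(res)
```

With this cost, the sketch splits the data into the intervals 1..3 and 4..6. The per-interval penalty is what keeps the solution from using one interval per point.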

`vtreat`’s purpose is to produce pure numeric `R` `data.frame`s that are ready for supervised predictive modeling (predicting a value from other values). By ready we mean: a purely numeric data frame with no missing values and a reasonable number of columns (missing values re-encoded with indicators, and high-degree categorical variables re-encoded by effects codes or impact codes).
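The "missing values re-encoded with indicators" idea can be sketched in a few lines of base `R`. This is not `vtreat`'s implementation or API: the helper `add_missing_indicators`, the `_isBAD` column suffix, and the choice to fill missing entries with the column mean are assumptions made for this illustration.

```r
# For each numeric column with missing values: add a 0/1 indicator
# column and fill the missing entries with the column mean.
# Hypothetical helper for illustration; not the vtreat API.
add_missing_indicators <- function(d) {
  for (col in names(d)) {
    if (is.numeric(d[[col]]) && anyNA(d[[col]])) {
      is_na <- is.na(d[[col]])
      d[[paste0(col, "_isBAD")]] <- as.numeric(is_na)
      d[[col]][is_na] <- mean(d[[col]], na.rm = TRUE)
    }
  }
  d
}

d <- data.frame(x = c(1, NA, 3), y = c(10, 20, 30))
treated <- add_missing_indicators(d)
print(treated)
```

The indicator column lets a downstream model distinguish "this value was missing" from "this value happened to equal the fill value".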

In this note we will discuss a small aspect of the `vtreat` package: variable screening.

Pipelines in `R` are popular, the most popular one being `magrittr` as used by `dplyr`.

This note will discuss the advanced re-usable piping systems: `rquery`/`rqdatatable` operator trees and `wrapr` function object pipelines. In each case we have a set of objects designed to extract extra power from the `wrapr` dot-arrow pipe `%.>%`.