I am not sure if it is a good or bad idea. But let’s play with it a bit, and perhaps readers can submit their experience and opinions in the comments section.
Composing functions and sequencing operations are core programming concepts.
Some notable realizations of sequencing or pipelining operations include:
- CMS Pipelines.
- F#’s forward pipe operator.
- Haskell’s Data.Function.
The idea is: many important calculations can be considered as a sequence of transforms applied to a data set. Each step may be a function taking many arguments. It is often the case that only one of each function’s arguments is primary, and the rest are parameters. For data science applications this is particularly common, so having convenient pipeline notation can be a plus. An example of a non-trivial data processing pipeline can be found here.
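As a minimal sketch of the idea (not taken from the linked example), the same calculation can be written as nested calls or as a pipeline. This assumes R 4.1 or newer for the native `|>` pipe; the vector `values` is made up for illustration. In each stage the piped value is the primary argument, and anything else (here `digits`) is a parameter.

```r
# Assumes R >= 4.1 (native pipe |>). `values` is illustrative data.
values <- c(1, 4, 9, 16)

# Nested-call form: must be read from the inside out.
nested <- round(mean(sqrt(values)), digits = 1)

# Pipeline form: reads left to right, one transform per stage.
# The piped value becomes the first (primary) argument of each call;
# `digits = 1` is an ordinary parameter of the last stage.
piped <- values |>
  sqrt() |>
  mean() |>
  round(digits = 1)

print(piped)
```

Both forms compute the same value; the pipeline form simply lists the steps in the order they are applied.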
R Tip: use inline operators for legibility.
Python’s inline + operator does the right thing for several built-in types:
- It concatenates lists: [1, 2] + [3] is [1, 2, 3].
- It concatenates strings: 'a' + 'b' is 'ab'.
- And, of course, it adds numbers: 1 + 2 is 3.
The inline notation is very convenient and legible. In this note we will show how to use a related notation in R.
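As a minimal sketch of what inline notation can look like in R, a user-defined infix operator can be declared with the `%name%` convention. The operator name `%cat%` below is hypothetical, chosen only for illustration; it is not part of base R or of any package discussed here.

```r
# Hypothetical infix operator for illustration only.
# Character operands are pasted together; anything else is combined with c().
`%cat%` <- function(a, b) {
  if (is.character(a) && is.character(b)) {
    paste0(a, b)
  } else {
    c(a, b)
  }
}

joined_numbers <- c(1, 2) %cat% 3   # numeric vector with elements 1, 2, 3
joined_strings <- "a" %cat% "b"     # the string "ab"

print(joined_numbers)
print(joined_strings)
```

The design choice here is to dispatch on operand type inside one operator; separate operators for concatenation and pasting would be an equally reasonable layout.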
Pipelines in R are popular, the most popular one being magrittr as used by dplyr.
This note will discuss the advanced re-usable piping systems: rqdatatable operator trees and wrapr function object pipelines. In each case we have a set of objects designed to extract extra power from the wrapr dot-arrow pipe.
Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system (notes here). We will demonstrate this with the vtreat data preparation system.
library("wrapr")
NA %?% 0
# 0
A more substantial application is the following.
For example, to think in terms of multi-row records it helps to identify:
- Which columns are keys (together identify rows or records).
- Which columns are data/payload (are considered free varying data).
- Which columns are "derived" (functions of the keys).
In this note we will show how to use some of these ideas to write safer data-wrangling code.
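One simple safety check along these lines is confirming that a proposed set of key columns really does identify rows. The following is a minimal base-R sketch; the helper name `keys_identify_rows` and the example data frame are assumptions made for illustration, not part of any package mentioned here.

```r
# Illustrative data: (id, measure) together are intended to be the keys,
# and value is the payload.
d <- data.frame(
  id = c(1, 1, 2, 2),
  measure = c("a", "b", "a", "b"),
  value = c(10, 20, 30, 40)
)

# Hypothetical helper: TRUE when no two rows share the same key values.
keys_identify_rows <- function(d, key_cols) {
  anyDuplicated(d[, key_cols, drop = FALSE]) == 0
}

print(keys_identify_rows(d, c("id", "measure")))  # keys are unique per row
print(keys_identify_rows(d, "id"))                # id alone repeats
```

Running such a check before a join or a reshape turns a silent row-duplication bug into an explicit failure.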
Being able to use the same pipe operator for data processing steps and for ggplot2 layering is a question that comes up from time to time (for example: Why can’t ggplot2 use %>%?). In fact the primary ggplot2 package author wishes that magrittr piping was the composing notation for ggplot2 (though it is obviously too late to change).
There are some fundamental difficulties in trying to use the magrittr pipe in such a way. In particular, magrittr looks for its own pipe by name in un-evaluated code, and thus is difficult to engineer over (though it can be hacked around). The general concept is: pipe stages are usually functions or function calls, and ggplot2 components are objects (verbs versus nouns); and at first these seem incompatible.
The wrapr dot-arrow-pipe was designed to handle such distinctions.
Let’s work an example.