Roz King just wrote an interesting article on binning data (a common data analytics step) in a database. They compare a case-based approach (where the bin divisions are stuffed into code) with a join-based approach. They share code and timings.
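To make the two approaches concrete, here is a minimal in-memory R sketch. It is not the article's code and not database code: the data, the bin boundaries, and the bin labels are all made up for illustration, and base R `ifelse()` and `merge()` stand in for a SQL `CASE` expression and a SQL join.

```r
# Made-up values to bin.
x <- c(1, 5, 12)

# Case-based: the bin boundaries are written directly into the code.
bin_case <- ifelse(x < 3, "low", ifelse(x < 10, "mid", "high"))

# Join-based: the bin boundaries live in a table that is joined to the data.
bins <- data.frame(
  lower = c(-Inf, 3, 10),
  upper = c(3, 10, Inf),
  bin = c("low", "mid", "high"),
  stringsAsFactors = FALSE
)

# by = NULL asks merge() for a cross join (every value paired with every bin).
crossed <- merge(data.frame(x = x), bins, by = NULL)

# Keep only the pairing where the value falls inside the bin.
matched <- crossed[crossed$x >= crossed$lower & crossed$x < crossed$upper, ]
matched <- matched[order(matched$x), ]
bin_join <- matched$bin

print(bin_case)
print(bin_join)
```

In the join-based form, changing the bins means editing a table rather than editing code.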
Best of all: rquery gets some attention and turns out to be the dominant solution at all scales measured.
Here is an example timing (lower times better):
So please check the article out.
To make teaching R quasi-quotation easier it would be nice if R string-interpolation and quasi-quotation both used the same notation. They are related concepts. So some commonality of notation would actually be clarifying, and help teach the concepts. We will define both of the above terms, and demonstrate the relation between the two concepts.
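As a rough illustration of how the two concepts relate, here is a small sketch using only base R. It assumes `sprintf()` as the string-interpolation tool and `bquote()` with `.()` as the quasi-quotation tool; these are existing base R notations, not necessarily the notation the full article proposes, and the variable names are made up.

```r
var_name <- "height"

# String interpolation: substitute a value into a string template.
s <- sprintf("mean(%s)", var_name)
print(s)

# Quasi-quotation: substitute a value into quoted (unevaluated) code.
# bquote() quotes its argument, except for the parts wrapped in .().
e <- bquote(mean(.(as.name(var_name))))
print(e)

# The quoted expression can be evaluated later against some data.
d <- data.frame(height = c(1, 2, 3))
result <- eval(e, d)
print(result)
```

In both cases a template has a marked hole that gets filled in; the difference is whether the template is a string or a piece of code.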
Continue reading Make Teaching R Quasi-Quotation Easier
R Tip: use inline operators for legibility.
A Python feature I miss when working in R is the convenience of Python's + operator. In Python, + does the right thing for some built in data types:
- It concatenates lists: [1,2] + [3] is [1, 2, 3].
- It concatenates strings: 'a' + 'b' is 'ab'.
- And, of course, it adds numbers: 1 + 2 is 3.
The inline notation is very convenient and legible. In this note we will show how to use a related notation in R.
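For a flavor of what an inline notation can look like in R, here is a minimal sketch of a user-defined operator. The name `%+%` and its behavior are assumptions chosen for illustration; this is not necessarily the operator the full note presents.

```r
# Hypothetical inline operator: R lets users define operators named %...%.
`%+%` <- function(a, b) {
  if (is.character(a) || is.character(b)) {
    return(paste0(a, b))  # concatenate strings
  }
  c(a, b)                 # concatenate vectors or lists
}

v <- c(1, 2) %+% 3
s <- "a" %+% "b"
print(v)
print(s)
```

Unlike Python's +, this sketch does not add numbers: `1 %+% 2` concatenates to a length-two vector, and R's own + remains the tool for arithmetic.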
Continue reading R Tip: Use Inline Operators For Legibility
R Tip: use seqi() for indexing.
R's “1:0 trap” is a mal-feature that confuses newcomers and is a reliable source of bugs. This note will show how to use seqi() to write more reliable code and document intent.
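The trap is easy to reproduce in base R. The sketch below shows it, then a base R workaround, then a hypothetical helper named `seq_increasing()`. That helper is an assumption written for this example to show the general idea of an increasing-only sequence; it is not the wrapr implementation of seqi().

```r
n <- 0

# The trap: 1:n counts down when n is 0, so a loop over it would run twice.
bad <- 1:n
print(bad)

# seq_len(n) is empty when n is 0, so a loop over it would run zero times.
good <- seq_len(n)
print(length(good))

# Hypothetical helper (an assumption, not wrapr code):
# an increasing-only sequence from a to b, empty when a > b.
seq_increasing <- function(a, b) {
  if (a > b) {
    return(integer(0))
  }
  a:b
}

print(seq_increasing(3, 5))
print(length(seq_increasing(1, 0)))
```

An increasing-only sequence also documents intent: a reader can see that counting down was never meant to happen.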
Continue reading R Tip: Use seqi() For Indexes
One often hears that R can not be fast (false), or more correctly that for fast code in R you may have to consider “vectorizing.”
A lot of knowledgeable R users are not comfortable with the term “vectorize”, and not really familiar with the method.
“Vectorize” is just a slightly high-handed way of saying:
R naturally stores data in columns (or in column major order), so if you are not coding to that pattern you are fighting the language.
In this article we will make the above clear by working through a non-trivial example of writing vectorized code.
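Before the non-trivial example, a deliberately trivial one may help fix the idea. This sketch uses made-up data and computes the same result twice: once element by element, and once as a single expression over whole columns.

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)

# Scalar style: visit one element at a time.
loop_result <- numeric(length(x))
for (i in seq_len(length(x))) {
  loop_result[i] <- x[i] * y[i] + 1
}

# Vectorized style: one expression over the whole columns.
vec_result <- x * y + 1

print(identical(loop_result, vec_result))
```

The vectorized form is shorter, and it hands the per-element work to R's internal routines instead of the interpreter loop.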
Continue reading What does it mean to write “vectorized” code in R?
In our last note we used wrapr::qe() to help quote expressions. In this note we will discuss quoting and code-capturing interfaces (interfaces that capture user source code) a bit more.
Continue reading Quoting Concatenate
Pipelines in R are popular, the most popular one being magrittr as used by dplyr.
This note will discuss the advanced re-usable piping systems: rqdatatable operator trees and wrapr function object pipelines. In each case we have a set of objects designed to extract extra power from the wrapr dot-arrow pipe.
Continue reading Reusable Pipelines in R
Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system. We will demonstrate this with the vtreat data preparation system.
Continue reading Sharing Modeling Pipelines in R
Our group has done a lot of work with non-standard calling conventions in R. Our tools work hard to eliminate non-standard calling (as is the purpose of wrapr::let()), or at least make it cleaner and more controllable (as is done in the wrapr dot pipe). And even so, we still get surprised by some of the side-effects and mal-consequences of the over-use of non-standard calling conventions in R.
Please read on for a recent example.
Continue reading Very Non-Standard Calling in R
R users appear to be big fans of "code capturing" or "non standard evaluation" (NSE) interfaces. In this note we will discuss quoting and non-quoting interfaces in R.
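To make the distinction concrete, here is a minimal base R sketch of the two interface styles. The function names `col_quoting()` and `col_standard()` and the data are hypothetical, written only for this illustration.

```r
d <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

# Quoting (code-capturing) interface: the column is named by unevaluated code.
col_quoting <- function(data, column) {
  name <- deparse(substitute(column))  # capture what the caller typed
  data[[name]]
}

# Non-quoting (standard evaluation) interface: the column is named by a value.
col_standard <- function(data, column_name) {
  data[[column_name]]
}

a <- col_quoting(d, y)     # y is never evaluated; only its name is used
b <- col_standard(d, "y")  # "y" is an ordinary string value

# The value-oriented form works when the name is stored in a variable.
target <- "x"
c1 <- col_standard(d, target)

# The quoting form instead looks for a column literally named "target".
c2 <- col_quoting(d, target)

print(a)
print(b)
print(c1)
print(is.null(c2))
```

The quoting form saves a pair of quote marks at the console, but it is harder to program over, since the column name can no longer be passed around as data.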
Continue reading Quoting in R