Posted on Categories Administrativia, Programming, StatisticsTags , Leave a comment on Binning Data in a Database

Binning Data in a Database

Roz King just wrote an interesting article on binning data (a common data analytics step) in a database. They compare a case-based approach (where the bin divisions are stuffed into code) with a join based approach. They share code and timings.

Best of all: rquery gets some attention and turns out to be the dominant solution at all scales measured.

Here is an example timing (lower times better):

NewImage

So please check the article out.

Posted on Categories Opinion, Programming, TutorialsTags , , 3 Comments on Make Teaching R Quasi-Quotation Easier

Make Teaching R Quasi-Quotation Easier

To make teaching R quasi-quotation easier it would be nice if R string-interpolation and quasi-quotation both used the same notation. They are related concepts. So some commonality of notation would actually be clarifying, and help teach the concepts. We will define both of the above terms, and demonstrate the relation between the two concepts.

Continue reading Make Teaching R Quasi-Quotation Easier

Posted on Categories Programming, TutorialsTags , , ,

R Tip: Use Inline Operators For Legibility

R Tip: use inline operators for legibility.

A Python feature I miss when working in R is the convenience of Python‘s inline + operator. In Python, + does the right thing for some built in data types:

  • It concatenates lists: [1,2] + [3] is [1, 2, 3].
  • It concatenates strings: 'a' + 'b' is 'ab'.

And, of course, it adds numbers: 1 + 2 is 3.

The inline notation is very convenient and legible. In this note we will show how to use a related notation R.

Continue reading R Tip: Use Inline Operators For Legibility

Posted on Categories ProgrammingTags , , 2 Comments on R Tip: Use seqi() For Indexes

R Tip: Use seqi() For Indexes

R Tip: use seqi() for indexing.

R‘s 1:0 trap” is a mal-feature that confuses newcomers and is a reliable source of bugs. This note will show how to use seqi() to write more reliable code and document intent.

Continue reading R Tip: Use seqi() For Indexes

Posted on Categories Programming, Statistics, Tutorials, UncategorizedTags , 4 Comments on What does it mean to write “vectorized” code in R?

What does it mean to write “vectorized” code in R?

One often hears that R can not be fast (false), or more correctly that for fast code in R you may have to consider “vectorizing.”

A lot of knowledgable R users are not comfortable with the term “vectorize”, and not really familiar with the method.

“Vectorize” is just a slightly high-handed way of saying:

R naturally stores data in columns (or in column major order), so if you are not coding to that pattern you are fighting the language.

In this article we will make the above clear by working through a non-trivial example of writing vectorized code.

Continue reading What does it mean to write “vectorized” code in R?

Posted on Categories Coding, Opinion, Programming, TutorialsTags , ,

Quoting Concatenate

In our last note we used wrapr::qe() to help quote expressions. In this note we will discuss quoting and code-capturing interfaces (interfaces that capture user source code) a bit more.

Continue reading Quoting Concatenate

Posted on Categories Coding, Exciting Techniques, Programming, TutorialsTags , ,

Reusable Pipelines in R

Pipelines in R are popular, the most popular one being magrittr as used by dplyr.

This note will discuss the advanced re-usable piping systems: rquery/rqdatatable operator trees and wrapr function object pipelines. In each case we have a set of objects designed to extract extra power from the wrapr dot-arrow pipe %.>%.

Continue reading Reusable Pipelines in R

Posted on Categories data science, Exciting Techniques, Programming, TutorialsTags , , , , , , , 2 Comments on Sharing Modeling Pipelines in R

Sharing Modeling Pipelines in R

Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system (notes here). We will demonstrate this with the vtreat data preparation system.

Continue reading Sharing Modeling Pipelines in R

Posted on Categories Opinion, Programming, RantsTags , 2 Comments on Very Non-Standard Calling in R

Very Non-Standard Calling in R

Our group has done a lot of work with non-standard calling conventions in R.

Our tools work hard to eliminate non-standard calling (as is the purpose of wrapr::let()), or at least make it cleaner and more controllable (as is done in the wrapr dot pipe). And even so, we still get surprised by some of the side-effects and mal-consequences of the over-use of non-standard calling conventions in R.

Please read on for a recent example.

Continue reading Very Non-Standard Calling in R

Posted on Categories Programming, TutorialsTags , , , 1 Comment on Quoting in R

Quoting in R

Many R users appear to be big fans of "code capturing" or "non standard evaluation" (NSE) interfaces. In this note we will discuss quoting and non-quoting interfaces in R.

Continue reading Quoting in R