Categories: data science, Exciting Techniques, Tutorials

Query Generation in R

R users have been enjoying the benefits of SQL query generators for quite some time, most notably using the dbplyr package. I would like to talk about some features of our own rquery query generator, concentrating on derived result re-use.


Categories: Exciting Techniques, Opinion, Tutorials

cdata Control Table Keys

In our cdata R package and training materials we emphasize record-oriented thinking and how to design a transform control table. We now have an exciting new feature: control table keys.

The user can now control which columns of a cdata control table are the keys, including composite keys (that is, keys spread across more than one column). This is easiest to demonstrate with an example.


Categories: data science, Exciting Techniques, Tutorials

Function Objects and Pipelines in R

Composing functions and sequencing operations are core programming concepts.

Some notable realizations of sequencing or pipelining operations include:

The idea is: many important calculations can be considered as a sequence of transforms applied to a data set. Each step may be a function taking many arguments. It is often the case that only one of each function’s arguments is primary, and the rest are parameters. For data science applications this is particularly common, so having convenient pipeline notation can be a plus. An example of a non-trivial data processing pipeline can be found here.

In this note we will discuss the advanced R pipeline operator "dot arrow pipe" and an S4 class (wrapr::UnaryFn) that makes working with pipeline notation much more powerful and much easier.
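
As a rough illustration of the "one primary argument, the rest are parameters" idea, here is a minimal base-R sketch. It does not use wrapr, the dot arrow pipe, or wrapr::UnaryFn; the helper names `scale_by`, `shift_by`, and `apply_pipeline` are hypothetical and exist only for this example.

```r
# Each factory fixes its parameter and returns a function of the
# single primary argument (hypothetical helpers, not part of wrapr).
scale_by <- function(k) function(x) x * k
shift_by <- function(b) function(x) x + b

# Apply a list of one-argument steps to a value, left to right.
apply_pipeline <- function(x, steps) {
  Reduce(function(value, step) step(value), steps, x)
}

steps <- list(scale_by(2), shift_by(1), sqrt)
result <- apply_pipeline(c(4, 12), steps)
print(result)
```

Because the pipeline is just a list of functions, it can be stored, extended, and re-applied to new data, which is the kind of convenience a function-object class is meant to provide.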


Categories: data science, Exciting Techniques, Statistics, Tutorials

Fully General Record Transforms with cdata

One of the design goals of the cdata R package is that very powerful and arbitrary record transforms should be convenient and take only one or two steps. In fact it is the goal to take just about any record shape to any other in two steps: first convert to row-records, then re-block the data into arbitrary record shapes (please see here and here for the concepts).

But as with all general ideas, it is much easier to see what we mean by the above with a concrete example.


Categories: Opinion, Programming, Tutorials

Make Teaching R Quasi-Quotation Easier

To make teaching R quasi-quotation easier it would be nice if R string-interpolation and quasi-quotation both used the same notation. They are related concepts. So some commonality of notation would actually be clarifying, and help teach the concepts. We will define both of the above terms, and demonstrate the relation between the two concepts.
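
To make the relation concrete, here is a small sketch using only base R. Roughly speaking, string interpolation substitutes a value into a string template, and quasi-quotation substitutes a value into quoted code. The choice of `sprintf()` and `bquote()` is ours for illustration; the column name `weight` and the data frame `d` are made up.

```r
# A column name held in a variable.
col <- "weight"

# String interpolation: substitute the value into a string template.
as_text <- sprintf("mean(%s)", col)

# Quasi-quotation: substitute the value into quoted code.
# bquote() quotes its argument, except for parts wrapped in .().
as_code <- bquote(mean(.(as.name(col))))

print(as_text)
print(as_code)

# Unlike the string, the quoted code can be evaluated directly against data.
d <- data.frame(weight = c(1, 2, 3))
result <- eval(as_code, d)
print(result)
```

Note that the two substitutions use different markers (`%s` versus `.()`) even though they do closely related jobs.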


Categories: Programming, Tutorials

R Tip: Use Inline Operators For Legibility

R Tip: use inline operators for legibility.

A Python feature I miss when working in R is the convenience of Python's inline + operator. In Python, + does the right thing for some built-in data types:

  • It concatenates lists: [1,2] + [3] is [1, 2, 3].
  • It concatenates strings: 'a' + 'b' is 'ab'.

And, of course, it adds numbers: 1 + 2 is 3.

The inline notation is very convenient and legible. In this note we will show how to use a related notation in R.
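
As a quick sketch of what such notation can look like, R lets users define their own inline operators by naming a two-argument function with surrounding percent signs. The operator names `%concat%` and `%paste%` below are hypothetical, chosen only for this example.

```r
# User-defined inline operators are ordinary two-argument functions
# whose names are wrapped in % signs (names here are hypothetical).
`%concat%` <- function(a, b) c(a, b)       # concatenate vectors or lists
`%paste%`  <- function(a, b) paste0(a, b)  # concatenate strings

v <- c(1, 2) %concat% 3
s <- "a" %paste% "b"

print(v)
print(s)
```

Using separate operators for concatenation, rather than overloading `+`, leaves R's numeric addition untouched.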


Categories: Mathematics, Opinion, Tutorials

A Beautiful 2 by 2 Matrix Identity

While working on a variation of the RcppDynProg algorithm we derived the following beautiful identity of 2 by 2 real matrices:

Here the superscript “top” denotes the transpose operation, ||.||^2_2 denotes the sum-of-squares norm, and the single |.| denotes the determinant.

This is derived from one of the check equations for the Moore–Penrose inverse and we have details of the derivation here, and details of the messy algebra here.

Categories: Coding, Opinion, Tutorials

Timing the Same Algorithm in R, Python, and C++

While developing the RcppDynProg R package I took a little extra time to port the core algorithm from C++ to both R and Python.

This means I can time the exact same algorithm implemented nearly identically in each of these three languages. So I can extract some comparative “apples to apples” timings. Please read on for a summary of the results.


Categories: Programming, Statistics, Tutorials, Uncategorized

What does it mean to write “vectorized” code in R?

One often hears that R cannot be fast (false), or more correctly that for fast code in R you may have to consider “vectorizing.”

A lot of knowledgeable R users are not comfortable with the term “vectorize”, and not really familiar with the method.

“Vectorize” is just a slightly high-handed way of saying:

R naturally stores data in columns (or in column major order), so if you are not coding to that pattern you are fighting the language.

In this article we will make the above clear by working through a non-trivial example of writing vectorized code.
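
Before the larger example, here is a deliberately small sketch of the contrast (the data and the task are made up for illustration): summing the squares of the entries greater than 2, first one element at a time, then on the whole vector at once.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)

# Scalar style: visit one element at a time.
total_loop <- 0
for (i in seq_along(x)) {
  if (x[[i]] > 2) {
    total_loop <- total_loop + x[[i]]^2
  }
}

# Vectorized style: each operation works on the whole vector.
total_vec <- sum(x[x > 2]^2)

print(c(total_loop, total_vec))
```

Both forms compute the same value; the vectorized form hands the looping to R's built-in vector operations instead of interpreting each step in R code.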


Categories: Exciting Techniques, math programming, Tutorials

Introducing RcppDynProg

RcppDynProg is a new Rcpp-based R package that implements simple, but powerful, table-based dynamic programming. This package can be used to optimally solve the minimum cost partition into intervals problem (described below) and is useful in building piecewise estimates of functions (shown in this note).
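
To give a feel for the kind of problem involved, here is a plain-R sketch of a textbook table-based dynamic program for partitioning the positions 1..n into consecutive intervals of minimal total cost. This is not RcppDynProg's implementation or API: the function `solve_partition`, the cost matrix layout, and the example cost (squared error plus an assumed penalty of 1 per interval) are all hypothetical.

```r
# costs[i, j] is the (assumed given) cost of using the single
# interval i..j, for i <= j. Hypothetical helper, not RcppDynProg.
solve_partition <- function(costs) {
  n <- nrow(costs)
  best <- numeric(n + 1)  # best[k + 1]: minimal cost of partitioning 1..k
  start <- integer(n)     # start[k]: start of the last interval in that partition
  for (k in seq_len(n)) {
    # Try every possible start i for the last interval i..k.
    candidates <- best[seq_len(k)] + costs[seq_len(k), k]
    i <- which.min(candidates)
    best[k + 1] <- candidates[i]
    start[k] <- i
  }
  # Walk the back-pointers to recover the interval start positions.
  starts <- integer(0)
  k <- n
  while (k >= 1) {
    starts <- c(start[k], starts)
    k <- start[k] - 1
  }
  list(cost = best[n + 1], starts = starts)
}

# Example: piecewise-constant fit, squared error plus 1 per interval.
y <- c(1, 1, 1, 5, 5, 5)
n <- length(y)
costs <- matrix(0, n, n)
for (i in seq_len(n)) for (j in i:n) {
  seg <- y[i:j]
  costs[i, j] <- sum((seg - mean(seg))^2) + 1
}
sol <- solve_partition(costs)
print(sol)
```

On this toy data the sketch splits the series into the intervals 1..3 and 4..6, which is the piecewise-constant structure one would hope to recover.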
