Posted on Categories Coding, Practical Data Science, Pragmatic Data Science, TutorialsTags , , 4 Comments on Controlling Data Layout With cdata

Controlling Data Layout With cdata

Here is an example how easy it is to use cdata to re-layout your data.

Tim Morris recently tweeted the following problem (corrected).

Please will you take pity on me #rstats folks?
I only want to reshape two variables x & y from wide to long!

Starting with:
    d xa xb ya yb
    1  1  3  6  8
    2  2  4  7  9

How can I get to:
    id t x y
    1  a 1 6
    1  b 3 8
    2  a 2 7
    2  b 4 9
    
In Stata it's:
 . reshape long x y, i(id) j(t) string
In R, it's:
 . an hour of cursing followed by a desperate tweet 👆

Thanks for any help!

PS – I can make reshape() or gather() work when I have just x or just y.

This is not to make fun of Tim Morris: the above should be easy. Using diagrams and slowing down the data transform into small steps makes the process very easy.

Continue reading Controlling Data Layout With cdata

Posted on Categories Coding, Exciting Techniques, TutorialsTags , 6 Comments on Operator Notation for Data Transforms

Operator Notation for Data Transforms

As of cdata version 1.0.8 cdata implements an operator notation for data transform.

The idea is simple, yet powerful.

Continue reading Operator Notation for Data Transforms

Posted on Categories Coding, TutorialsTags , , 2 Comments on How cdata Control Table Data Transforms Work

How cdata Control Table Data Transforms Work

With all of the excitement surrounding cdata style control table based data transforms (the cdata ideas being named as the “replacements” for tidyr‘s current methodology, by the tidyr authors themselves!) I thought I would take a moment to describe how they work.

Continue reading How cdata Control Table Data Transforms Work

Posted on Categories Coding, Opinion, TutorialsTags , 1 Comment on More on Macros in R

More on Macros in R

Recently ran into something interesting in the R macros/quasi-quotation/substitution/syntax front:

D0FD431X0AI4pM8

Romain François: “.@_lionelhenry reveals planned double curly syntax At #satRdayParis as a possible replacement, addition to !! and enquo()”

It appears !! is no longer the last word in substitution (it certainly wasn’t the first).

Continue reading More on Macros in R

Posted on Categories Coding, TutorialsTags , , 6 Comments on Getting Started With rquery

Getting Started With rquery

To make getting started with rquery (an advanced query generator for R) easier we have re-worked the package README for various data-sources (including SparkR!).

Continue reading Getting Started With rquery

Posted on Categories Coding, OpinionTags , , , , Leave a comment on Playing With Pipe Notations

Playing With Pipe Notations

Recently Hadley Wickham prescribed pronouncing the magrittr pipe as “then” and using right-assignment as follows:

NewImage

I am not sure if it is a good or bad idea. But let’s play with it a bit, and perhaps readers can submit their experience and opinions in the comments section.

Continue reading Playing With Pipe Notations

Posted on Categories Coding, Opinion, TutorialsTags , , , 7 Comments on Timing the Same Algorithm in R, Python, and C++

Timing the Same Algorithm in R, Python, and C++

While developing the RcppDynProg R package I took a little extra time to port the core algorithm from C++ to both R and Python.

This means I can time the exact same algorithm implemented nearly identically in each of these three languages. So I can extract some comparative “apples to apples” timings. Please read on for a summary of the results.

Continue reading Timing the Same Algorithm in R, Python, and C++

Posted on Categories Coding, Opinion, Programming, TutorialsTags , ,

Quoting Concatenate

In our last note we used wrapr::qe() to help quote expressions. In this note we will discuss quoting and code-capturing interfaces (interfaces that capture user source code) a bit more.

Continue reading Quoting Concatenate

Posted on Categories Coding, Exciting Techniques, Programming, TutorialsTags , ,

Reusable Pipelines in R

Pipelines in R are popular, the most popular one being magrittr as used by dplyr.

This note will discuss the advanced re-usable piping systems: rquery/rqdatatable operator trees and wrapr function object pipelines. In each case we have a set of objects designed to extract extra power from the wrapr dot-arrow pipe %.>%.

Continue reading Reusable Pipelines in R

Posted on Categories Coding, OpinionTags , , ,

Timing Grouped Mean Calculation in R

This note is a comment on some of the timings shared in the dplyr-0.8.0 pre-release announcement.

The original published timings were as follows:

With performance metrics: measurements are marketing. So let’s dig in the above a bit.

Continue reading Timing Grouped Mean Calculation in R