Categories: data science, Pragmatic Data Science, Programming, Statistics, Tutorials

Arbitrary Data Transforms Using cdata

We have been writing a lot on higher-order data transforms lately:


What I want to do now is "write a bit more, so I finally feel I have been concise."


Categories: Programming, Statistics, Tutorials

RStudio Keyboard Shortcuts for Pipes

I have just released some simple RStudio add-ins that are great for creating keyboard shortcuts when working with pipes in R.

You can install the add-ins from here (the page also includes installation instructions and usage examples).
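To give a feel for the kind of notation a keyboard shortcut helps with, here is a minimal base R sketch of the so-called "Bizarro pipe" idiom, `->.;`. That the add-ins target this particular notation is an assumption on my part; the sketch only shows that the idiom is plain R syntax (a right-assignment into a variable named `.`), so it runs without loading any package.

```r
# Each step right-assigns its result into the variable `.`,
# and the next step reads `.` as its input.
c(1, 4, 9) ->.;
sqrt(.) ->.;
sum(.) -> total   # final result stored under an ordinary name

print(total)
```

Typing `->.;` by hand is awkward, which is why binding it to a keystroke is attractive.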


Categories: Pragmatic Data Science, Pragmatic Machine Learning, Programming, Statistics, Tutorials

Data Wrangling at Scale

Just wrote a new R article: “Data Wrangling at Scale” (using Dirk Eddelbuettel’s tint template).


Please check it out.

Categories: Coding, data science, Pragmatic Data Science, Programming, Statistics, Tutorials

Big Data Transforms

As part of our consulting practice, Win-Vector LLC has been helping a few clients stand up advanced analytics and machine learning stacks using R and substantial data stores (relational databases such as PostgreSQL, or big data systems such as Spark).



Often we come to a point where we or a partner realize: "the design would be a whole lot easier if we could phrase it in terms of higher order data operators."
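The excerpt does not say which operators are meant, so the following is only an illustration. One plausible example of a "higher order data operator" is an un-pivot (moving values from columns into rows) and its inverse. Below is a small base R sketch on a local data frame; the table `wide` and its column names are made up for the example, and doing the same thing on PostgreSQL or Spark is a different (and harder) matter.

```r
# Hypothetical wide table: one row per id, one column per measurement.
wide <- data.frame(id = c(1, 2), x = c(10, 20), y = c(30, 40))

# Un-pivot: one row per (id, measurement) pair.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("x", "y"),   # columns to move into rows
  v.names   = "value",       # new column holding the values
  timevar   = "measure",     # new column holding the old column names
  times     = c("x", "y"),
  idvar     = "id"
)
print(long)

# Pivot back: one column per measurement again.
back <- unstack(long, value ~ measure)
print(back)
```

Phrasing a design in terms of such operators means the shape change is stated once, declaratively, instead of being spread across ad hoc loops and joins.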


Categories: Opinion, Programming, Tutorials

On debugging

My favorite advice on debugging is from Professor Norman Matloff:

Finding your bug is a process of confirming the many things that you believe are true – until you find one that is not true.
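One way to put this advice into practice in R is to write each belief down as an explicit check and see which one fails. The sketch below is my own illustration, not something from Professor Matloff; the data frame `d` and the three beliefs are hypothetical.

```r
# Hypothetical data we "believe" is clean.
d <- data.frame(
  id    = c(1, 2, 2),
  value = c("3", "4", "5"),
  stringsAsFactors = FALSE
)

# Each belief is a named TRUE/FALSE check.
beliefs <- c(
  no_missing_values = !anyNA(d$value),
  ids_unique        = !anyDuplicated(d$id),
  value_numeric     = is.numeric(d$value)
)
print(beliefs)

# The beliefs that turned out not to be true are where to look next.
failed <- names(beliefs)[!beliefs]
print(failed)
```

Once the false belief is found, the same checks can be kept in the code as `stopifnot()` calls so the bug cannot quietly return.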




Categories: Opinion, Programming, Statistics

It is Needlessly Difficult to Count Rows Using dplyr

  • Question: how hard is it to count rows using the R package dplyr?
  • Answer: surprisingly difficult.

When trying to count rows using dplyr or dplyr-controlled data structures (remote tbls such as Sparklyr or dbplyr structures) one is sailing between Scylla and Charybdis: the task is to avoid dplyr corner cases and irregularities (a few of which I attempt to document in this "dplyr inferno").
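For reference, here are a few ways one might try to count rows on a small local data frame. This sketch assumes the dplyr package is installed, and the data frame `d` is made up for the example. On a local data frame all four approaches agree; the sketch does not reproduce the corner cases or the remote-table behavior the post is about (for instance, in the dbplyr versions I am aware of, `nrow()` on a remote table returns `NA`).

```r
library(dplyr)

# Hypothetical local data frame.
d <- data.frame(x = c(1, 2, 3), g = c("a", "a", "b"))

n_base      <- nrow(d)                                # base R
n_tally     <- d %>% tally() %>% pull(n)              # dplyr::tally()
n_summarize <- d %>% summarize(n = n()) %>% pull(n)   # summarize() with n()
n_count     <- d %>% count() %>% pull(n)              # dplyr::count()

print(c(n_base, n_tally, n_summarize, n_count))
```

That there are this many spellings of one simple question is already a hint that picking one that behaves the same everywhere takes some care.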





Categories: data science, Pragmatic Data Science, Pragmatic Machine Learning, Programming, Statistics, Tutorials

Permutation Theory In Action

While working on a large client project using Sparklyr and multinomial regression we recently ran into a problem: Apache Spark chooses the order of multinomial regression outcome targets, whereas R users are used to choosing the order of the targets (please see here for some details). So to make things more like R users expect, we need a way to translate one order to another.

Providing good solutions to gaps like this is one of the things Win-Vector LLC does in both our consulting and training practices.
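As a small illustration of translating one order into another, here is a base R sketch using `match()` and `order()`. The level names, the "Spark" ordering, and the probability matrix are all made up for the example; this is not the code from the client project.

```r
# Hypothetical orderings of the same three outcome levels.
spark_order <- c("virginica", "setosa", "versicolor")  # order the system chose
r_order     <- c("setosa", "versicolor", "virginica")  # order the user wants

# perm[i] is the position in spark_order of the i-th level of r_order.
perm <- match(r_order, spark_order)
stopifnot(identical(spark_order[perm], r_order))

# Hypothetical predicted probabilities, columns in spark_order.
probs_spark <- matrix(
  c(0.7, 0.1, 0.2,
    0.2, 0.5, 0.3),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, spark_order)
)

# Re-order the columns into the order the user expects.
probs_r <- probs_spark[, perm, drop = FALSE]
print(probs_r)

# The inverse permutation takes us back to the original order.
inv_perm <- order(perm)
stopifnot(identical(r_order[inv_perm], spark_order))
```

Keeping the permutation and its inverse as explicit integer vectors makes it easy to move coefficients, predictions, and labels between the two conventions consistently.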


Categories: Exciting Techniques, Programming, Statistics, Tutorials

Neat New seplyr Feature: String Interpolation

The R package seplyr has a neat new feature: the function seplyr::expand_expr(), which implements what we call “the string algebra” or string expression interpolation. The function takes an expression of mixed terms, including variables referring to names, quoted strings, and general expression terms. It then “de-quotes” all of the variables referring to quoted strings and “dereferences” variables thought to be referring to names. The entire expression is then returned as a single string.



This provides a powerful way to easily work complicated expressions into the seplyr data manipulation methods.
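To make the general idea concrete without guessing at seplyr's exact interface, here is a related technique in base R: substituting the contents of variables into an expression with `bquote()` and then rendering the result as a single string with `deparse()`. This is a sketch of the concept only, not seplyr::expand_expr() or its implementation; the data frame and variable names are hypothetical.

```r
# Hypothetical data and "parameters" held in ordinary variables.
d <- data.frame(
  Species = c("setosa", "setosa", "virginica"),
  Width   = c(3.5, 3.0, 3.3),
  stringsAsFactors = FALSE
)
column <- "Species"   # a variable referring to a column name
target <- "setosa"    # a variable holding a string value

# Substitute: the column name becomes a symbol, the value stays a string.
expr <- bquote(.(as.name(column)) == .(target))

# Render the whole expression as one string.
expr_string <- deparse(expr)
print(expr_string)

# The substituted expression can also be evaluated against the data.
selected <- d[eval(expr, d), , drop = FALSE]
print(selected)
```

The appeal of string-based interpolation is that the resulting text can be handed to functions that accept expressions as strings.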

Categories: Programming, Statistics, Tutorials

Some Neat New R Notations

The R package wrapr supplies a few neat new coding notations.



An Abacus, which gives us the term “calculus.”
