We have been writing a lot on higher-order data transforms lately:
What I want to do now is "write a bit more, so I finally feel I have been concise."
Continue reading Arbitrary Data Transforms Using cdata
I have just released some simple RStudio add-ins that are great for creating keyboard shortcuts when working with pipes in R.
You can install the add-ins from here (which also includes both installation instructions and use instructions/examples).
Just wrote a new R article: “Data Wrangling at Scale” (using Dirk Eddelbuettel’s tint template).
Please check it out.
As part of our consulting practice, Win-Vector LLC has been helping a few clients stand up advanced analytics and machine learning stacks using
R and substantial data stores (such as relational database variants such as
PostgreSQL or big data systems such as
Often we come to a point where we or a partner realize: "the design would be a whole lot easier if we could phrase it in terms of higher-order data operators."
Continue reading Big Data Transforms
My favorite advice on debugging is from Professor Norman Matloff:
Finding your bug is a process of confirming the many things that you believe are true – until you find one that is not true.
Continue reading On debugging
- Question: how hard is it to count rows using the
- Answer: surprisingly difficult.
When trying to count rows using dplyr-controlled data structures (remote tbls such as dbplyr structures) one is sailing between Scylla and Charybdis. The task is to avoid dplyr corner-cases and irregularities (a few of which I attempt to document in this "
Continue reading It is Needlessly Difficult to Count Rows Using dplyr
While working on a large client project using Sparklyr and multinomial regression we recently ran into a problem: Apache Spark chooses the order of multinomial regression outcome targets, whereas R users are used to choosing the order of the targets (please see here for some details). So to make things more like R users expect, we need a way to translate one order to another.
Providing good solutions to gaps like this is one of the things Win-Vector LLC does in both our consulting and training practices.
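As a rough illustration of translating one target order to another, here is a minimal base-R sketch. The label names, the two orders, and the score values are all made up for the example (they are not from the client project or from Spark), and it uses only `match()` and `order()` rather than any Sparklyr call.

```r
# Hypothetical label orders, for illustration only.
r_order     <- c("setosa", "versicolor", "virginica")  # order the R user wants
spark_order <- c("virginica", "setosa", "versicolor")  # order the back end chose

# perm[i] is the position in spark_order of the i-th label of r_order.
perm <- match(r_order, spark_order)

# Made-up per-class scores, reported in spark_order.
spark_scores <- c(0.1, 0.7, 0.2)

# Re-index to get the same scores in r_order.
r_scores <- spark_scores[perm]
names(r_scores) <- r_order

# The inverse permutation maps results back from r_order to spark_order.
inv_perm <- order(perm)
back_in_spark_order <- unname(r_scores[inv_perm])
```

Indexing by `perm` moves values from the back end's order into the user's order, and `order(perm)` gives the inverse permutation for the return trip.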
Continue reading Permutation Theory In Action
seplyr has a neat new feature: the function seplyr::expand_expr(), which implements what we call “the string algebra” or string expression interpolation. The function takes an expression of mixed terms, including: variables referring to names, quoted strings, and general expression terms. It then “de-quotes” all of the variables referring to quoted strings and “dereferences” variables thought to be referring to names. The entire expression is then returned as a single string.
This provides a powerful way to easily work complicated expressions into the seplyr data manipulation methods.
Continue reading Neat New seplyr Feature: String Interpolation
wrapr is an R package that supplies powerful tools for writing and debugging R code.
Continue reading wrapr: R Code Sweeteners