Posted on Categories data science, Opinion, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, StatisticsTags , ,

cdata Update

The R package cdata now has version 0.7.0 available from CRAN.

cdata is a data manipulation package that subsumes many higher order data manipulation operations including pivot/un-pivot, spread/gather, or cast/melt. The record to record transforms are specified by drawing a table that expresses the record structure (called the “control table” and also the link between the key concepts of row-records and block-records).

What can be quickly specified and achieved using these concepts and notations is amazing and quite teachable. These transforms can be run in-memory or in remote database or big-data systems (such as Spark).

The concepts are taught in Nina Zumel’s excellent tutorial.


Untitled

And in John Mount’s quick screencast/lecture.

link, slides

The 0.7.0 update adds local versions of the operators in addition to the Spark and database implementations. These methods should now be a bit safer for in-memory complex/annotated types such as dates and times.

Posted on Categories Coding, data science, Exciting Techniques, Programming, Statistics, TutorialsTags , , ,

Wanted: cdata Test Pilots

I need a few volunteers to please “test pilot” the development version of the R package cdata, please.

Jackie Cochran at 1938 Bendix Race
Jacqueline Cochran: at the time of her death, no other pilot held more speed, distance, or altitude records in aviation history than Cochran.

Continue reading Wanted: cdata Test Pilots

Posted on Categories Administrativia, data science, Exciting Techniques, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, TutorialsTags , , ,

Data Reshaping with cdata

I’ve just shared a short webcast on data reshaping in R using the cdata package.

(link)

We also have two really nifty articles on the theory and methods:

Please give it a try!

This is the material I recently presented at the January 2017 BARUG Meetup.

NewImage

Posted on Categories Administrativia, Exciting Techniques, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, StatisticsTags , , , ,

Big cdata News

I have some big news about our R package cdata. We have greatly improved the calling interface and Nina Zumel has just written the definitive introduction to cdata.

cdata is our general coordinatized data tool. It is what powers the deep learning performance graph (here demonstrated with R and Keras) that I announced a while ago.

KerasPlot

However, cdata is much more than that.

Continue reading Big cdata News

Posted on Categories data science, Pragmatic Data Science, Programming, Statistics, TutorialsTags , , 3 Comments on Arbitrary Data Transforms Using cdata

Arbitrary Data Transforms Using cdata

We have been writing a lot on higher-order data transforms lately:

Cdata

What I want to do now is "write a bit more, so I finally feel I have been concise."

Continue reading Arbitrary Data Transforms Using cdata

Posted on Categories Pragmatic Data Science, Pragmatic Machine Learning, Programming, Statistics, TutorialsTags , , , , , , ,

Data Wrangling at Scale

Just wrote a new R article: “Data Wrangling at Scale” (using Dirk Eddelbuettel’s tint template).

Fd

Please check it out.

Posted on Categories Administrativia, data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, StatisticsTags , , , , , , , 1 Comment on More documentation for Win-Vector R packages

More documentation for Win-Vector R packages

The Win-Vector public R packages now all have new pkgdown documentation sites! (And, a thank-you to Hadley Wickham for developing the pkgdown tool.)

Please check them out (hint: vtreat is our favorite).

NewImage Continue reading More documentation for Win-Vector R packages