
Big cdata News

I have some big news about our R package cdata. We have greatly improved the calling interface, and Nina Zumel has just written the definitive introduction to cdata.

cdata is our general coordinatized data tool. It is what powers the deep learning performance graph (here demonstrated with R and Keras) that I announced a while ago.


However, cdata is much more than that.

cdata provides a family of general transforms that include pivot/unpivot (or tidyr::spread/tidyr::gather) as easy special cases. Nina refused to write the article on it until we refactored the API to be even more teachable (and therefore more learnable, and more useful). After her redesign (adding the concepts of both concrete records and abstract records to the coordinatized data theory) the system teaches itself. It is actually hard to remember that you are graphically specifying potentially involved, difficult, and confusing transforms. They do not stay confusing, because the graphical specification becomes its own documenting diagram!
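To make the "pivot/unpivot as a special case" idea concrete, here is a minimal sketch in plain Python. It is not the cdata API and not how cdata is implemented: the function names `rowrec_to_block` and `block_to_rowrec`, the column names, and the example values are all hypothetical, chosen only for illustration. The assumption is that a transform can be described by a small "control table" that pictures one record in its multi-row form, with each cell naming the wide-row column that supplies its value.

```python
# Illustrative sketch only: plain Python, NOT the cdata API.
# A control table pictures one record in block (multi-row) form. Every cell
# outside the key column names the wide-row column that supplies its value.

def rowrec_to_block(row, control, key_col, copy_cols):
    """Turn one wide row (a dict) into several tall rows, one per control row."""
    block = []
    for spec in control:
        out = {c: row[c] for c in copy_cols}   # carry the record id along
        out[key_col] = spec[key_col]           # key that labels this tall row
        for col, source in spec.items():
            if col != key_col:
                out[col] = row[source]         # look up the named wide column
        block.append(out)
    return block


def block_to_rowrec(block, control, key_col, copy_cols):
    """Inverse: collapse the tall rows of one record back into one wide row."""
    by_key = {r[key_col]: r for r in block}
    row = {c: block[0][c] for c in copy_cols}
    for spec in control:
        for col, source in spec.items():
            if col != key_col:
                row[source] = by_key[spec[key_col]][col]
    return row


# Ordinary unpivot is the special case of a two-column control table.
unpivot_control = [
    {"measure": "AUC", "value": "AUC"},
    {"measure": "R2", "value": "R2"},
]
wide = {"id": 1, "AUC": 0.6, "R2": 0.2}
tall = rowrec_to_block(wide, unpivot_control, "measure", ["id"])
back = block_to_rowrec(tall, unpivot_control, "measure", ["id"])

# A wider control table moves several values per tall row in one step.
general_control = [
    {"dataset": "train", "loss": "loss", "acc": "acc"},
    {"dataset": "validation", "loss": "val_loss", "acc": "val_acc"},
]
wide2 = {"epoch": 1, "loss": 0.5, "acc": 0.8, "val_loss": 0.6, "val_acc": 0.75}
tall2 = rowrec_to_block(wide2, general_control, "dataset", ["epoch"])
back2 = block_to_rowrec(tall2, general_control, "dataset", ["epoch"])
```

In this sketch the control table doubles as a picture of the result: reading `general_control` row by row shows what each tall row will contain, which is the sense in which a graphical specification documents itself.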

Don’t take my word for it: please check out Nina’s article, “Fluid data reshaping with cdata”.

Also, I will be presenting a lightning talk on cdata at the January 2018 BARUG Meetup! We think we have gotten the concepts and package refined and polished to the point where it can be mastered in 15 minutes with time to spare.

As a bonus, this new version of cdata is now available on CRAN (with the old method naming still supported), and also works at big data scale on DBI-adaptable databases such as Spark and PostgreSQL, an uncommon feature for a full-featured pivot/unpivot system.
