Posted on Categories Pragmatic Data Science, Pragmatic Machine Learning, Programming, Statistics, TutorialsTags , , , , , , , Leave a comment on Data Wrangling at Scale

Data Wrangling at Scale

Just wrote a new R article: “Data Wrangling at Scale” (using Dirk Eddelbuettel’s tint template).

Fd

Please check it out.

Posted on Categories Administrativia, Statistics, TutorialsTags , , , , , 8 Comments on Update on coordinatized or fluid data

Update on coordinatized or fluid data

We have just released a major update of the cdata R package to CRAN.

Cdata

If you work with R and data, now is the time to check out the cdata package. Continue reading Update on coordinatized or fluid data

Posted on Categories data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, TutorialsTags , ,

Fluid use of data

Nina Zumel and I recently wrote a few article and series on best practices in testing models and data:

What stands out in these presentations is: the simple practice of a static test/train split is merely a convenience to cut down on operational complexity and difficulty of teaching. It is in no way optimal. That is, using slightly more complicated procedures can build better models on a given set of data.


CalTrainTest
Suggested static cal/train/test experiment design from vtreat data treatment library.
Continue reading Fluid use of data