
More on preparing data

The Microsoft Data Science User Group just sponsored Dr. Nina Zumel’s presentation “Preparing Data for Analysis Using R”. Microsoft saw Win-Vector LLC’s ODSC West 2015 presentation “Prepping Data for Analysis using R” and generously offered to sponsor improving it and disseminating it to a wider audience.


We feel Nina really hit the ball out of the park with over 400 new live viewers. Read more for links to even more free materials!

Microsoft has generously sponsored the following:

These are really great materials and we will be promoting and distributing them widely.

Nina emphasized teaching the principles of data treatment and cleaning (frankly, an under-emphasized task). She also mentioned vtreat, a free R library supplied by Win-Vector LLC that automates a great number of the steps in a principled and statistically sound manner. Because her lecture is likely to attract more interest in the vtreat library, we have tuned up the vtreat documentation a bit and made it available as pre-rendered HTML (in addition to the normal vignette distribution). Of particular interest, we have finally enumerated all the variable types that vtreat uses to re-encode your data.
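For readers who want a feel for what that re-encoding looks like, here is a minimal sketch (not taken from the presentation). It assumes the vtreat package is installed from CRAN; the data frames and the column names `x`, `z`, and `y` are made up purely for illustration. The `code` column of the score frame is where the variable types show up.

```r
# Minimal sketch; assumes the vtreat package is installed from CRAN.
library(vtreat)

# Hypothetical toy training data: a numeric column with a missing value,
# a categorical column with missing values, and a logical outcome.
d <- data.frame(
  x = c(NA, 1:39),
  z = rep(c("a", "b", NA, "a"), 10),
  y = rep(c(TRUE, FALSE, TRUE, TRUE, FALSE), 8),
  stringsAsFactors = FALSE
)

# Design a treatment plan for a binary classification outcome.
plan <- designTreatmentsC(d, varlist = c("x", "z"),
                          outcomename = "y", outcometarget = TRUE,
                          verbose = FALSE)

# Each derived variable is listed with the original column it came from
# and a short code naming its variable type.
print(plan$scoreFrame[, c("varName", "origName", "code")])

# Hypothetical new data with a missing numeric value, a missing level,
# and a level ("c") never seen during design.
dNew <- data.frame(x = c(NA, 3), z = c("c", NA), stringsAsFactors = FALSE)

# Apply the plan; pruneSig = NULL keeps all derived variables.
dTreated <- prepare(plan, dNew, pruneSig = NULL)
print(dTreated)
```

The treated frame should be all numeric with no missing values, which is the point: downstream modeling code no longer has to special-case NAs or novel levels. On real data you would typically design the plan on one set of rows and fit your model on another (or use vtreat's cross-frame tools) to avoid over-fit.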