Nina and I are proud to share our lecture: “Prepping Data for Analysis using R” from ODSC West 2015.
Nina Zumel and John Mount ODSC WEST 2015
It is about 90 minutes long and covers a lot of the theory behind the
vtreat data preparation library.
We also have a GitHub repository including all the lecture materials here.
Nina’s preview still (shown below) is one of my favorite slides. I think it lays out how to think about novel levels (string values encountered during scoring that were not seen during training) in a nice, problem-driven way before getting into the messy math (such as unknown frequency estimation).