Posted on Categories data science, StatisticsTags ,

Latest vtreat up on CRAN

There is a new version of the R package vtreat now up on CRAN.

vtreat is an essential data preparation system for predictive modeling that helps defend your predictive modeling work against real world data issues including:

  • High cardinality categorical variables
  • Rare levels (including new or novel levels during application) in categorical variables
  • Missing data (random or systematic)
  • Irrelevant variables/columns
  • Nested model bias, and other over-fit issues.

vtreat also includes excellent, citable, documentation: vtreat: a data.frame Processor for Predictive Modeling.

For this release I want to thank everybody who generously donated their time to submit an issue or build a git pull-request. In particular:

  • Vadim Khotilovich, who found and fixed a major performance problem in the y-stratified sampling.
  • Lawrence Wu, who has been donating documentation fixes.
  • Peter Hurford, who has been donating documentation fixes.

Leave a Reply