vtreat is an R
data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. It prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems
vtreat defends against include:
NA, too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training).
vtreat::prepare should be your first choice for real world data preparation and cleaning.
We hope this article will make getting started with
vtreat much easier. We also hope this helps with citing the use of
vtreat in scientific publications. Continue reading vtreat data cleaning and preparation article now available on arXiv