Please check it out.
If you work with R and data, now is the time to check out the cdata package.
- Random Test/Train Split is not Always Enough
- How Do You Know if Your Data Has Signal?
- How do you know if your model is going to work?
- A Simpler Explanation of Differential Privacy (explaining the reusable holdout set)
- Using differential privacy to reuse training data
- Preparing Data for Analysis using R: Basic through Advanced Techniques
What stands out in these presentations is that the simple practice of a static test/train split is merely a convenience: it cuts down on operational complexity and is easier to teach. It is in no way optimal. Slightly more complicated procedures, which let each row be used for both fitting and held-out evaluation at different times, can build better models on a given set of data.
Figure: suggested static cal/train/test (calibration, training, test) experiment design from the vtreat data treatment library.
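
To make the contrast concrete, here is a minimal sketch of the two index plans. It is written in Python using only the standard library, purely for illustration. The function names `static_split` and `kfold_plan`, the 20/60/20 fractions, and the seed are assumptions made for this example; this is not vtreat's implementation. K-fold cross-validation is used as one familiar example of a "slightly more complicated procedure".

```python
import random


def static_split(n_rows, fractions=(0.2, 0.6, 0.2), seed=2017):
    """Assign each row index to exactly one of cal/train/test, once and for all.

    `fractions` gives the (cal, train, test) shares; the test set takes
    whatever rows remain after cal and train are filled.
    """
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_cal = round(fractions[0] * n_rows)
    n_train = round(fractions[1] * n_rows)
    return {
        "cal": idx[:n_cal],
        "train": idx[n_cal:n_cal + n_train],
        "test": idx[n_cal + n_train:],
    }


def kfold_plan(n_rows, k=3, seed=2017):
    """Return k (train, holdout) index pairs.

    Every row is held out exactly once and used for fitting k-1 times,
    so no row is permanently reserved for a single role.
    """
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint groups of rows
    plan = []
    for i, holdout in enumerate(folds):
        train = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        plan.append((train, holdout))
    return plan


split = static_split(10)
plan = kfold_plan(10, k=3)
print({name: len(rows) for name, rows in split.items()})
print([(len(train), len(holdout)) for train, holdout in plan])
```

In the static plan each row plays one role forever, so only part of the data ever informs the model. In the k-fold plan every row contributes to fitting in some rounds and to held-out evaluation in another, which is the sense in which the more complicated procedure gets more out of the same data.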