In some of my recent public talks (for example: here and here) I have mentioned a desire for “a deeper theory of fitting and testing.” I thought I would expand on what I meant by this.
In this note I am going to cover a lot of different topics to try to suggest some perspective. I won’t have my usual luxury of fully defining my terms or working concrete examples. Hopefully a number of these ideas (which are related, but don’t easily synthesize together) will be the subjects of their own later articles.
Introduction
The focus of this article is the following: the true goal of predictive analytics is always to build a model that works well in production. Training and testing procedures are designed to simulate this unknown future model performance, but they can be expensive and they can also fail.
What we want is a good measure of future model performance, and a way to apply that measure in picking a model without running deep into Goodhart’s law (“When a measure becomes a target, it ceases to be a good measure.”).
Most common training and testing procedures are destructive in the sense that they use up data: data used for one step may not be safely re-used for another step in an unbiased fashion (an example being excess generalization error). In this note I thought I would expand on ideas for improving statistical efficiency, or getting more out of your training data while avoiding overfitting.
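To make the “using up data” point concrete, here is a minimal simulation (my own illustration, not code from any of the referenced talks) using only Python and numpy. Fifty pure-noise “models” are all scored on the same held-out test set, and the apparent winner’s test accuracy comes out optimistically biased relative to fresh data; the gap is the excess generalization error introduced by re-using the test set for model selection.

```python
import numpy as np

rng = np.random.default_rng(2024)
n_test, n_fresh, n_models = 200, 100_000, 50

# Pure-noise binary labels: no model can truly do better than 50% accuracy.
y_test = rng.integers(0, 2, size=n_test)
y_fresh = rng.integers(0, 2, size=n_fresh)

# Each "model" is an independent coin-flip predictor; score all of them
# on the shared test set and keep the apparent best.
test_scores = [np.mean(rng.integers(0, 2, size=n_test) == y_test)
               for _ in range(n_models)]
best_score = max(test_scores)

# The selected model's accuracy on genuinely fresh data is still ~0.5;
# the difference between the two numbers is the bias (excess
# generalization error) introduced by selecting on the test set.
fresh_score = np.mean(rng.integers(0, 2, size=n_fresh) == y_fresh)

print(f"selected model's test accuracy : {best_score:.3f}")   # noticeably above 0.5
print(f"same kind of model, fresh data : {fresh_score:.3f}")  # about 0.5
```

Once the test set has been consulted to pick the winner, it no longer gives an unbiased estimate of that winner’s future performance; that is the sense in which the data has been used up.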

Destructive testing
I will outline a few variations of model construction and testing techniques that one should keep in mind.