Nina Zumel introduced y-aware scaling in her recent article Principal Components Regression, Pt. 2: Y-Aware Methods. I really encourage you to read the article and add the technique to your repertoire. The method combines well with other methods and can drive better predictive modeling results.
Judging from the feedback, I am not sure everybody noticed that, in addition to being easy and effective, the method is actually novel (we haven’t yet found an academic reference to it, nor have we seen it in use at any of the numerous clients we have visited). It has likely been applied before (it is a simple method), but it is not currently considered a standard method (something we would like to change).
In this note I’ll discuss some of the context of y-aware scaling.
Continue reading y-aware scaling in context
Win-Vector LLC’s Dr. Nina Zumel has a three part series on Principal Components Regression that we think is well worth your time.
- Part 1: the proper preparation of data (including scaling) and use of principal components analysis (particularly for supervised learning or regression).
- Part 2: the introduction of y-aware scaling to direct the principal components analysis to preserve variation correlated with the outcome we are trying to predict.
- Part 3: how to pick the number of components to retain for analysis.
Continue reading Why you should read Nina Zumel’s 3 part series on principal components analysis and regression
In our previous note we demonstrated Y-Aware PCA and other y-aware approaches to dimensionality reduction in a predictive modeling context, specifically Principal Components Regression (PCR). For our examples, we selected the appropriate number of principal components by eye. In this note, we will look at ways to select the appropriate number of principal components in a more automated fashion.
Continue reading Principal Components Regression, Pt. 3: Picking the Number of Components
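As a rough illustration of what "automated" selection can look like, here is a minimal sketch that picks the number of components by held-out prediction error. This is a generic hold-out approach swapped in for illustration, not necessarily the procedure used in the article; the helper `pcr_holdout_mse`, the synthetic data, and the 300/100 split are all assumptions made for this example.

```python
import numpy as np

def pcr_holdout_mse(X_train, y_train, X_test, y_test, k):
    """Fit PCR with k components on the training split; return held-out MSE.

    Hypothetical helper for illustration only.
    """
    mu = X_train.mean(axis=0)
    y_mu = y_train.mean()
    # Principal directions come from the SVD of the centered training data.
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    V = Vt[:k].T                                  # top-k directions, shape (p, k)
    Z_train = (X_train - mu) @ V                  # training scores
    coef, *_ = np.linalg.lstsq(Z_train, y_train - y_mu, rcond=None)
    pred = (X_test - mu) @ V @ coef + y_mu
    return float(np.mean((y_test - pred) ** 2))

# Synthetic data (an assumption): y depends on the two highest-variance columns.
rng = np.random.default_rng(1)
n, p = 400, 6
X = rng.normal(size=(n, p)) * np.array([5.0, 4.0, 3.0, 1.0, 1.0, 1.0])
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n) * 0.5

train = np.arange(n) < 300
test = ~train
errors = {k: pcr_holdout_mse(X[train], y[train], X[test], y[test], k)
          for k in range(1, p + 1)}
best_k = min(errors, key=errors.get)   # number of components with lowest held-out error
```

A single hold-out split is the simplest version of this idea; averaging the error over several cross-validation folds would give a less noisy choice of `best_k`.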
In our previous note, we discussed some problems that can arise when using standard principal components analysis (specifically, principal components regression) to model the relationship between independent (x) and dependent (y) variables. In this note, we present some dimensionality reduction techniques that alleviate some of those problems, in particular what we call Y-Aware Principal Components Analysis, or Y-Aware PCA. We will use our variable treatment package vtreat in the examples we show in this note, but you can easily implement the approach independently of
Continue reading Principal Components Regression, Pt. 2: Y-Aware Methods
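To make the idea concrete, here is a minimal sketch of y-aware scaling followed by PCA. It assumes the scaling works by multiplying each x variable by the slope of a one-variable linear fit of y on that variable and then centering, so that a unit change in the scaled variable corresponds to roughly a unit change in y; see the article for the actual definition. The function `y_aware_scale` and the synthetic data are hypothetical, written for this example, and do not use vtreat.

```python
import numpy as np

def y_aware_scale(X, y):
    """Scale each column of X by the slope of a one-variable fit y ~ x, then center.

    Hypothetical helper for illustration only; not the vtreat API.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    ss = (Xc ** 2).sum(axis=0)
    # Least-squares slope of y on each column; a constant column gets slope 0.
    slopes = np.divide(Xc.T @ yc, ss, out=np.zeros_like(ss), where=ss > 0)
    return Xc * slopes, slopes

# Synthetic data (an assumption): one informative column and one
# high-variance column that is unrelated to y.
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
noise = rng.normal(size=n) * 100.0
y = 2.0 * signal + rng.normal(size=n) * 0.1
X = np.column_stack([signal, noise])

X_scaled, slopes = y_aware_scale(X, y)

# PCA via SVD of the y-aware-scaled data, with no further rescaling.
_, singular_values, Vt = np.linalg.svd(X_scaled, full_matrices=False)
first_pc = Vt[0]   # loadings of the first principal component
```

On raw or merely centered data the high-variance noise column would dominate the first component. After y-aware scaling, the first component should load mostly on the informative column, because the noise column's slope against y is close to zero.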
In this note, we discuss principal components regression and some of the issues with it:
- The need for scaling.
- The need for pruning.
- The lack of “y-awareness” of the standard dimensionality reduction step.
Continue reading Principal Components Regression, Pt.1: The Standard Method
Just a “heads-up.”
I’ve been editing a three-part series Nina Zumel is writing on some of the pitfalls of improperly applied principal components analysis/regression and how to avoid them (we are using the plural spelling “principal components,” following Everitt’s The Cambridge Dictionary of Statistics). The series is looking absolutely fantastic and I think it will really help people understand, properly use, and even teach the concepts.
The series includes fully worked graphical examples in R and is why we added the ScatterHistN plot to WVPlots (plot shown below, explained in the upcoming series).
Frankly, the material would have worked great as an additional chapter for Practical Data Science with R (but instead everybody is going to get it for free).
Please watch here for the series.
The complete series is now up: