
## y-aware scaling in context

Nina Zumel introduced y-aware scaling in her recent article Principal Components Regression, Pt. 2: Y-Aware Methods. I really encourage you to read the article and add the technique to your repertoire. The method combines well with other methods and can drive better predictive modeling results.

From feedback, I am not sure everybody noticed that, in addition to being easy and effective, the method is actually novel: we have not yet found an academic reference to it, nor seen it in use at the numerous clients we have visited. It has likely been applied before (it is a simple method), but it is not currently considered a standard method, something we would like to change.

In this note I’ll discuss some of the context of y-aware scaling.


# Short form:

Win-Vector LLC’s Dr. Nina Zumel has a three-part series on Principal Components Regression that we think is well worth your time.

• Part 1: the proper preparation of data (including scaling) and use of principal components analysis (particularly for supervised learning or regression).
• Part 2: the introduction of y-aware scaling to direct the principal components analysis to preserve variation correlated with the outcome we are trying to predict.
• Part 3: how to pick the number of components to retain for analysis.

## Principal Components Regression, Pt. 3: Picking the Number of Components

In our previous note we demonstrated Y-Aware PCA and other y-aware approaches to dimensionality reduction in a predictive modeling context, specifically Principal Components Regression (PCR). For our examples, we selected the appropriate number of principal components by eye. In this note, we will look at ways to select the appropriate number of principal components in a more automated fashion.


## Principal Components Regression, Pt. 2: Y-Aware Methods

In our previous note, we discussed some problems that can arise when using standard principal components analysis (specifically, principal components regression) to model the relationship between independent (x) and dependent (y) variables. In this note, we present some dimensionality reduction techniques that alleviate some of those problems, in particular what we call Y-Aware Principal Components Analysis, or Y-Aware PCA. We will use our variable treatment package `vtreat` in the examples we show in this note, but you can easily implement the approach independently of `vtreat`.
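As a rough illustration of what an independent implementation might look like, here is a minimal sketch in Python (the series itself works in R). It assumes y-aware scaling means rescaling each numeric input by the slope of a single-variable linear regression of y on that input and then centering it, so that a unit change in the rescaled variable corresponds to a unit change in y. The function name `y_aware_scale` and the toy data are hypothetical, and this is not `vtreat`'s implementation.

```python
from statistics import fmean


def y_aware_scale(column, y):
    """Rescale one numeric column into 'y units' (hypothetical helper).

    Fits the least-squares line y ~ a + m * x for this single column and
    returns m * (x - mean(x)), so the result is centered and a unit change
    in it corresponds to a unit change in y on average.
    """
    x_bar = fmean(column)
    y_bar = fmean(y)
    sxx = sum((x - x_bar) ** 2 for x in column)
    if sxx == 0:
        # A constant column carries no information about y.
        return [0.0 for _ in column]
    sxy = sum((x - x_bar) * (yi - y_bar) for x, yi in zip(column, y))
    slope = sxy / sxx
    return [slope * (x - x_bar) for x in column]


# Toy data: x_signal tracks y, x_noise has a large range but no linear relation to y.
y = [1.0, 3.0, 5.0, 7.0]
x_signal = [10.0, 20.0, 30.0, 40.0]
x_noise = [500.0, -500.0, -500.0, 500.0]

scaled_signal = y_aware_scale(x_signal, y)
scaled_noise = y_aware_scale(x_noise, y)
```

In this toy example the large-range but uninformative `x_noise` column collapses toward zero, while `x_signal` keeps a spread comparable to that of y, so a subsequent principal components analysis would be driven by variation related to the outcome rather than by raw input scale.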


## Principal Components Regression, Pt.1: The Standard Method

In this note, we discuss principal components regression and some of the issues with it:

• The need for scaling.
• The need for pruning.
• The lack of “y-awareness” of the standard dimensionality reduction step.

## Coming up: principal components analysis

The series includes fully worked graphical examples in R, and it is the reason we added the `ScatterHistN` plot to WVPlots (plot shown below, explained in the upcoming series).