Posted on Categories AdministrativiaTags , , ,

A bit more on testing

If you liked Nina Zumel’s article on the limitations of Random Test/Train splits you might want to check out her recent article on predictive analytics product evaluation hosted by our friends at Fliptop. Continue reading A bit more on testing

Posted on Categories data science, Opinion, Pragmatic Data Science, Pragmatic Machine Learning, Rants, StatisticsTags , , , , , 4 Comments on Unprincipled Component Analysis

Unprincipled Component Analysis

As a data scientist I have seen variations of principal component analysis and factor analysis so often blindly misapplied and abused that I have come to think of the technique as unprincipled component analysis. PCA is a good technique often used to reduce sensitivity to overfitting. But this stated design intent leads many to (falsely) believe that any claimed use of PCA prevents overfit (which is not always the case). In this note we comment on the intent of PCA like techniques, common abuses and other options.

The idea is to illustrate what can quietly go wrong in an analysis and what tests to perform to make sure you see the issue. The main point is some analysis issues can not be fixed without going out and getting more domain knowledge, more variables or more data. You can’t always be sure that you have insufficient data in your analysis (there is always a worry that some clever technique will make the current data work), but it must be something you are prepared to consider. Continue reading Unprincipled Component Analysis