
Correlation and R-Squared

What is R²? In the context of predictive models (usually linear regression), where y is the true outcome and f is the model's prediction, the definition that I see most often is:

$$ R^2 = 1 - \frac{\sum_i (y_i - f_i)^2}{\sum_i (y_i - \bar{y})^2} $$

where $\bar{y}$ is the mean of the true outcome y.

In words, R² is a measure of how much of the variance in y is explained by the model f.

Under “general conditions”, as Wikipedia says, R² is also the square of the correlation (written with the Greek letter ρ, “rho”) between the actual and predicted outcomes:

$$ R^2 = \rho^2(y, f) $$

I prefer the “squared correlation” definition, as it gets more directly at what is usually my primary concern: prediction. If R² is close to one, then the model’s predictions mirror the true outcome tightly. If R² is low, then either the model does not mirror the true outcome, or it mirrors it only loosely: a “cloud” that is, hopefully, oriented in the right direction. Of course, looking at the graph always helps:

[Figure: R2_compare.png]

The question we will address here is: how do you get from R² to correlation?
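Before getting there, a quick numeric sanity check may help. The short Python sketch below is not from the original post; the simulated data and names are illustrative. It fits an ordinary least squares model with an intercept and computes R² both ways: from the variance definition above, and as the squared correlation between y and the predictions. For such a fit, the two values coincide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: a noisy linear relationship (illustrative only).
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=1.5, size=200)

# Ordinary least squares fit with an intercept.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
f = X @ beta  # model predictions

# Definition 1: one minus residual sum of squares over total sum of squares.
r2_variance = 1.0 - np.sum((y - f) ** 2) / np.sum((y - np.mean(y)) ** 2)

# Definition 2: squared correlation between actual and predicted outcomes.
r2_correlation = np.corrcoef(y, f)[0, 1] ** 2

# For an OLS fit with an intercept, the two agree (up to floating point).
print(r2_variance, r2_correlation)
```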



An Appreciation of Locality Sensitive Hashing

We share our admiration for a set of results called “locality sensitive hashing” by demonstrating a greatly simplified example that exhibits the spirit of the techniques.
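For readers who want a feel for the idea before clicking through, here is a minimal sketch (not the example from the post; all names and data are illustrative) of one classic locality sensitive hashing scheme: random-hyperplane signatures for cosine similarity. Vectors pointing in similar directions tend to receive the same signature, so near neighbors can be found by comparing only signature-mates rather than all pairs.

```python
import numpy as np

def random_hyperplane_signature(vectors, n_planes=16, seed=0):
    """Hash each row vector to an n_planes-bit signature: bit i records
    which side of the i-th random hyperplane the vector falls on."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_planes, vectors.shape[1]))
    # The sign pattern of the projections is the locality-sensitive hash.
    return (vectors @ planes.T > 0).astype(np.uint8)

def bucket_key(signature):
    """Pack a bit signature into a hashable tuple (the bucket identifier)."""
    return tuple(signature.tolist())

# Illustrative usage: similar vectors tend to share buckets.
rng = np.random.default_rng(1)
base = rng.normal(size=(5, 50))
near = base + 0.05 * rng.normal(size=base.shape)   # slight perturbations
far = rng.normal(size=(5, 50))                     # unrelated vectors

sigs = random_hyperplane_signature(np.vstack([base, near, far]))
keys = [bucket_key(s) for s in sigs]

# Perturbed copies usually collide with their originals far more often
# than unrelated vectors do; that is the "locality sensitive" property.
print([keys[i] == keys[i + 5] for i in range(5)])   # mostly True
print([keys[i] == keys[i + 10] for i in range(5)])  # mostly False
```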