Posted on Categories Expository Writing, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, TutorialsTags , , 2 Comments on On calculating AUC

On calculating AUC

Recently Microsoft Data Scientist Bob Horton wrote a very nice article on ROC plots. We expand on this a bit and discuss some of the issues in computing “area under the curve” (AUC). Continue reading On calculating AUC

Posted on Categories Mathematics, Opinion, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, TutorialsTags , , , ,

A bit on the F1 score floor

At Strata+Hadoop World “R Day” Tutorial, Tuesday, March 29 2016, San Jose, California we spent some time on classifier measures derived from the so-called “confusion matrix.”

We repeated our usual admonition to not use “accuracy itself” as a project quality goal (business people tend to ask for it as it is the word they are most familiar with, but it usually isn’t what they really want).


NewImage
One reason not to use accuracy: an example where a classifier that does nothing is “more accurate” than one that actually has some utility. (Figure credit Nina Zumel, slides here)

And we worked through the usual bestiary of other metrics (precision, recall, sensitivity, specificity, AUC, balanced accuracy, and many more).

Please read on to see what stood out. Continue reading A bit on the F1 score floor

Posted on Categories Opinion, Statistics, TutorialsTags , , , , 2 Comments on More on ROC/AUC

More on ROC/AUC

A bit more on the ROC/AUC

The issue

The receiver operating characteristic curve (or ROC) is one of the standard methods to evaluate a scoring system. Nina Zumel has described its application, but I would like to call out some additional details. In my opinion while the ROC is a useful tool, the “area under the curve” (AUC) summary often read off it is not as intuitive and interpretable as one would hope or some writers assert.

Continue reading More on ROC/AUC