All posts by John Mount

How sure are you that large margin implies low VC dimension?

How sure are you that large margin implies low VC dimension (and good generalization error)? It is true. But even if you have taken a good course on machine learning you many have seen the actual proof (with all of the caveats and conditions). I worked through the literature proofs over the holiday and it took a lot of notes to track what is really going on in the derivation of the support vector machine.

Figure: the standard SVM margin diagram, this time with some un-marked data added.
Continue reading How sure are you that large margin implies low VC dimension?

Soft margin is not as good as hard margin

This note is a link to an excerpt from my upcoming monster support vector machine article where I work through a number of sections of [Vapnik, 1998] Vapnik, V. N. (1998), Statistical Learning Theory, Wiley. I try to run down how the original theoretical claims are precisely linked to what is said about the common implementations. The write-up is fairly technical and very large (26 pages).

Here we are extracting an appendix: “Soft margin is not as good as hard margin.” In it we build a toy problem that is not large-margin separated and note that if the dimension of the concept space you were working in was not obvious (i.e. you were forced to rely on the margin derived portion of generalization bounds) then generalization improvement for a soft margin SVM is much slower than you would expect given experience from the hard margin theorems. The punch-line is: every time you get eight times as much training data you only halve your expected excess generalization error bound (whereas once you get below a data-set’s hard-margin bound you expect one to one reduction of the bound with respect to training data set size). What this points out is: the soft margin idea can simulate margin, but it comes at a cost. Continue reading Soft margin is not as good as hard margin

R bracket is a bit irregular

While skimming Professor Hadley Wickham’s Advanced R I got to thinking about nature of the square-bracket or extract operator in R. It turns out “[,]” is a bit more irregular than I remembered.

The subsetting section of Advanced R has a very good discussion on the subsetting and selection operators found in R. In particular it raises the important distinction of two simultaneously valuable but incompatible desiderata: simplification of results versus preservation of results. Continue reading R bracket is a bit irregular

Let’s try to motivate schemes

Recently there has been some controversy over David Mumford’s Nature magazine invited obituary of Alexander Grothendieck being initially rejected on submission (see here and here). At issue was the attempt to explain the mathematical idea of schemes (one of Alexander Grothendieck’s most important contributions) to a non-mathematician audience. Professor Mumford is a mathematician of great stature and his explanation is better than anything I could even attempt. However, in addition to the issues he raises I don’t think he was sensitive enough to what a non-mathematician considers motivation.

I’ll take a quick stab at explaining a very tiny bit of the motivation of schemes. I not sure the kind of chain of analogies argument I am attempting would work in an obituary (or in a short length), so I certainly don’t presume to advise professor Mumford on his obituary of a great mathematician (and person). Continue reading Let’s try to motivate schemes

A comment on preparing data for classifiers

I have been working through (with some honest appreciation) a recent article comparing many classifiers on many data sets: “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?” Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, Dinani Amorim; 15(Oct):3133−3181, 2014 (which we will call “the DWN paper” in this note). This paper applies 179 popular classifiers to around 120 data sets (mostly from the UCI Machine Learning Repository). The work looks good and interesting, but we do have one quibble with the data-prep on 8 of the 123 shared data sets. Given the paper is already out (not just in pre-print) I think it is appropriate to comment publicly. Continue reading A comment on preparing data for classifiers