Nassim Nicholas Taleb recently wrote an article advocating that we abandon standard deviation in favor of mean absolute deviation. Mean absolute deviation is indeed an interesting and useful measure, but there is a reason that standard deviation is important *even if you do not like it*: it prefers models that get totals and averages correct. Absolute deviation measures do not prefer such models. So while MAD may be great for reporting, it can be a problem when used to optimize models. Read more…
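To make the "totals and averages" point concrete, here is a minimal sketch (our own toy illustration, with made-up data) that fits the simplest possible model, a single constant prediction, under each loss. Squared error picks the mean, so the predictions sum to the true total; absolute error picks the median, which on skewed data does not.

```python
def best_constant(values, loss):
    """Brute-force the constant prediction c that minimizes total loss."""
    candidates = [i / 100 for i in range(0, 5001)]  # grid 0.00 .. 50.00
    return min(candidates, key=lambda c: sum(loss(v - c) for v in values))

# Hypothetical skewed data: one large value dominates the total.
values = [1, 1, 2, 3, 43]

best_sq = best_constant(values, lambda r: r * r)    # squared-error fit
best_abs = best_constant(values, lambda r: abs(r))  # absolute-error fit

total = sum(values)
print("true total:", total)
print("squared-error constant:", best_sq, "implied total:", best_sq * len(values))
print("absolute-error constant:", best_abs, "implied total:", best_abs * len(values))
```

The squared-error fit recovers the total of 50, while the absolute-error fit settles on the median of 2 and implies a total of only 10.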

I was watching my cousins play Unspeakable Words over Christmas break and got interested in the end game. The game starts out as a "spell a word from cards, then bet some points" game, but in the end (when you are down to one marker) it becomes a pure betting game. In this article we analyze an idealized form of the pure betting end game. Read more…

Elon Musk’s writing about a Tesla battery fire reminded me of some of the math related to trying to estimate the rate of a rare event from a single occurrence of the event (plus many non-event occurrences). In this article we work through some of the ideas. Read more…
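As a rough sketch of the kind of calculation involved (a standard binomial treatment, not necessarily the approach taken in the article), suppose we saw exactly one event in `n` independent trials. The maximum likelihood estimate of the rate is `1/n`, and an exact one-sided upper limit is the rate `p` at which seeing one or fewer events has probability `alpha`. The value of `n` below is a placeholder, not real data.

```python
def prob_at_most_one(p, n):
    """P(X <= 1) for X ~ Binomial(n, p)."""
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1)

def upper_bound(n, alpha=0.05):
    """Exact one-sided upper limit on p after 1 event in n trials.

    Solves P(X <= 1 | p) = alpha by bisection; the left side is
    decreasing in p, so the root is unique.
    """
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if prob_at_most_one(mid, n) > alpha:
            lo = mid  # one-or-fewer events still too plausible: bound is higher
        else:
            hi = mid
    return (lo + hi) / 2

n = 1000       # hypothetical number of trials (assumption, not real data)
mle = 1 / n    # maximum likelihood estimate from a single event
ub = upper_bound(n)
print("MLE rate:", mle)
print("95% one-sided upper limit:", ub)
```

The upper limit comes out several times larger than the point estimate, which is the practical warning: a single occurrence pins the rate down only loosely.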

This article is a break from data-science, and is instead about the kind of problem you can try on the train. It is inspired by the problems in Bollobas’s “The art of mathematics.”

One of the many irritating things about airlines is the fact that the carry-on bag restrictions are often stated as “your maximum combined linear measurement (length + width + height) must not exceed 45 inches” when they really mean your bag must fit into a 14 inch by 9 inch by 22 inch box (so they actually may not accept a 43 inch by one inch by one inch pool spear as your carry-on). The “total linear measure” seems (at first glance) “gameable,” but can (through some hairy math) be seen to at least be self-consistent. It turns out you can’t put a box with longer total linear measurements into a box with smaller total linear measurements.

Let’s work out why this could be a problem and then why the measure works. Read more…
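The "gameable" half of the story is easy to check numerically. The sketch below (helper names are our own) confirms that the pool spear meets the 45 inch linear-measure limit yet cannot fit in the 14 by 9 by 22 box in any orientation, since it is longer than the box's space diagonal. It does not prove the harder claim that a box with larger linear measure can never fit inside one with smaller linear measure.

```python
import math

def linear_measure(dims):
    """Length + width + height."""
    return sum(dims)

def fits_axis_aligned(item, box):
    """True if the item fits with its sides parallel to the box's sides."""
    return all(i <= b for i, b in zip(sorted(item), sorted(box)))

def space_diagonal(dims):
    """Longest straight segment that fits inside a box."""
    return math.sqrt(sum(d * d for d in dims))

box = (14, 9, 22)    # the box the airline really means
spear = (43, 1, 1)   # the pool spear from the text

print("linear measures:", linear_measure(box), linear_measure(spear))
print("fits axis-aligned:", fits_axis_aligned(spear, box))
print("box diagonal:", round(space_diagonal(box), 2), "vs spear length:", max(spear))
```

Both objects have a linear measure of exactly 45 inches, but the box's diagonal is under 28 inches, so no rotation can make the 43 inch spear fit.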

We share our opinion that `=` should be preferred to the more standard `<-` for assignment in R. This is from a draft of the appendix of our upcoming book. This has the risk of becoming an R version of Javascript’s semicolon controversy, but here you have it. Read more…

From time to time we work on projects that would benefit from a free lightweight pure Java linear programming library. That is a library unencumbered by a bad license, available cheaply, without an infinite amount of file format and interop cruft and available in Java (without binary blobs and JNI linkages). There are a few such libraries, but none have repeatably, efficiently and reliably met our needs. So we have re-packaged an older one of our own for release under the Apache 2.0 license. This code will have its own rough edges (not having been used widely in production), but I still feel it fills an important gap. This article is a brief introduction to our WVLPSolver Java library. Read more…

von Neumann and Morgenstern’s “Theory of Games and Economic Behavior” is the famous basis for game theory. One of the central accomplishments is the rigorous proof that comparative “preference methods” over fairly complicated “event spaces” are no more expressive than numeric (real number valued) utilities. That is: for a very wide class of event spaces and comparison functions “>” there is a utility function u() such that:

a > b (“>” representing the arbitrary comparison or preference for the event space) if and only if u(a) > u(b) (this time “>” representing the standard order on the reals).
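For intuition only, here is a minimal sketch of the easiest special case: a finite event space with a strict total preference order. Counting how many events each event beats gives a real-valued utility that reproduces the preference. The events and the preference table are hypothetical, and this ordinal toy ignores the lottery (probability mixture) structure that the full theory is built to handle.

```python
def utility_from_preference(events, prefers):
    """u(a) = number of events that a is strictly preferred to."""
    return {a: sum(1 for b in events if prefers(a, b)) for a in events}

# Hypothetical finite event space and strict preference relation.
events = ["lose", "draw", "win"]
strictly_better = {("win", "draw"), ("win", "lose"), ("draw", "lose")}

def prefers(a, b):
    """The abstract comparison: True when a is strictly preferred to b."""
    return (a, b) in strictly_better

u = utility_from_preference(events, prefers)

# Check the representation: a > b in preference iff u(a) > u(b) in the reals.
consistent = all(prefers(a, b) == (u[a] > u[b]) for a in events for b in events)
print(u, consistent)
```

Once `u` exists, every later comparison can be done on numbers, without consulting the preference relation again.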

However, an active reading of sections 1 through 3 and even the 2nd edition’s axiomatic appendix shows that the concept of “events” (what preferences and utilities are defined over) is deliberately left undefined. There is math and objects and spaces, but not all of them are explicitly defined in terms of known structures (are they points in R^n, sets, multi-sets, sums over sets or what?). The word “event” is used early in the book and not in the index. Axiomatic treatments often rely on intentionally leaving ground-concepts undefined, but we are going to work a concrete example through von Neumann and Morgenstern to try and illustrate a bit more of the required intuition and deep nature of their formal notions of events and utility. I also will illustrate how, at least in discussion, von Neumann and Morgenstern may have held on to a naive “single outcome” intuition of events and a naive “direct dollars” intuition of utility despite erecting a theory carefully designed to support much more structure. This is possible because they never have to calculate in the general event space: they prove access to the preference allows them to construct the utility function u() and then work over the real numbers. Sections 1 through 3 are designed to eliminate the need for a theory of preference or utility and allow von Neumann and Morgenstern to work with real numbers (while achieving full generality). They never need to make the translations explicit, because soon after showing the translations are possible they assume they have already been applied. Read more…

We have added a worked example to the README of our experimental logistic regression code.

The Logistic codebase is designed to support experimentation on variations of logistic regression including:

What we mean by this code being “experimental” is that it has capabilities that many standard implementations do not. In fact most of the items in the above list are not usually made available to the logistic regression user. But our project is also stand-alone and not as well integrated into existing workflows as standard production systems. Before trying our code you may want to try R or Mahout. Read more…

We have been writing for a while about the convergence of Newton steps applied to a logistic regression (See: What does a generalized linear model do?, How robust is logistic regression? and Newton-Raphson can compute an average). This is all based on our principle of working examples for understanding. This eventually progressed to some writing on the nature of problem solving (a nice complement to our earlier writing on calculation). In the course of research we were directed to a very powerful technique called the MM algorithm (see: “The MM Algorithm” Kenneth Lange, 2007; “A Tutorial on MM Algorithms”, David R. Hunter, Kenneth Lange, Amer. Statistician 58:30–37, 2004; and “Monotonicity of Quadratic-Approximation Algorithms”, Dankmar Bohning, Bruce G. Lindsay, Ann. Inst. Statist. Math, Vol. 40, No. 4, pp 641-664, 1988). The MM algorithm introduces an essential idea: majorized functions (not to be confused with the majorized order on R^d). Majorization is an interesting way to modify Newton methods to be reliable contractions (and therefore converge in a manner similar to EM algorithms).

Here we will work an example of the MM method. We will not work it in its most general form, but in a form that quickly reveals much of the beauty of the method. We also introduce a “collared Newton step” which guarantees convergence without resorting to line-search (essentially resolving the issues in solving a logistic regression by Newton style methods). Read more…
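To give a flavor of majorization before the full article, here is a small sketch of the classic quadratic-bound MM step for a one-coefficient logistic regression (no intercept). It uses the bound p(1 - p) ≤ 1/4 from the Bohning and Lindsay style of argument: replacing the true curvature with the fixed value sum(x²)/4 yields a step that can never decrease the log-likelihood. This is not the "collared Newton step" mentioned above, and the data is made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, xs, ys):
    """Logistic log-likelihood for the model P(y=1 | x) = sigmoid(beta * x)."""
    return sum(y * beta * x - math.log1p(math.exp(beta * x)) for x, y in zip(xs, ys))

def gradient(beta, xs, ys):
    """Derivative of the log-likelihood with respect to beta."""
    return sum(x * (y - sigmoid(beta * x)) for x, y in zip(xs, ys))

def mm_fit(xs, ys, steps=100):
    """Iterate the MM update; returns final beta and the log-likelihood path."""
    curvature_bound = sum(x * x for x in xs) / 4.0  # majorizes the true curvature
    beta = 0.0
    path = [log_likelihood(beta, xs, ys)]
    for _ in range(steps):
        beta += gradient(beta, xs, ys) / curvature_bound
        path.append(log_likelihood(beta, xs, ys))
    return beta, path

# Hypothetical, non-separable toy data (so a finite optimum exists).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 1]

beta_hat, ll_path = mm_fit(xs, ys)
print("beta:", beta_hat, "final gradient:", gradient(beta_hat, xs, ys))
```

The trade-off is typical of MM methods: each step is cheap and safe, but convergence is linear rather than the quadratic rate a well-behaved Newton iteration achieves.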

A recent run of too many articles on the same topic (exhibits: A, B and C) puts me in a position where I feel the need to explain my motivation. Which itself becomes yet another article related to the original topic. The explanation I offer is: this is the way mathematicians think. To us mathematicians the tension is that there are far too many observable patterns in the world to be attributed to mere chance. So our dilemma is: for which patterns/regularities should we derive some underlying law, and which ones are not worth worrying about? Or which conjectures should we try to work all the way to proof or counter-example? Read more…