Archive

Archive for the ‘math programming’ Category

Yet Another Java Linear Programming Library

November 23rd, 2012 Comments off

From time to time we work on projects that would benefit from a free lightweight pure Java linear programming library. That is a library unencumbered by a bad license, available cheaply, without an infinite amount of file format and interop cruft and available in Java (without binary blobs and JNI linkages). There are a few such libraries, but none have repeatably, efficiently and reliably met our needs. So we have re-packaged an older one of our own for release under the Apache 2.0 license. This code will have its own rough edges (not having been used widely in production), but I still feel fills an important gap. This article is brief introduction to our WVLPSolver Java library. Read more…

Importance Sampling

January 1st, 2012 Comments off

We describe briefly the powerful simulation technique known as “importance sampling.” Importance sampling is a technique that allows you to use numerical simulation to explore events that, at first look, appear too rare to be reliably approximated numerically. The correctness of importance sampling follows almost immediately from the definition of a change of density. Like most mathematical techniques, importance sampling brings in its own concerns and controls that were not obvious in the original problem. To deal with these concerns (like picking the re-weighting to use) we will largely appeal to the ideas from “A Tutorial on the Cross-Entropy Method” Pieter-Tjerk de Boer, Dirk P Kroese, Shie Mannor, and Reuven Y Rubinstein, Annals of Operations Research, 2005 vol. 134 (1) pp. 19-67. Read more…

An Appreciation of Locality Sensitive Hashing

November 21st, 2011 1 comment

We share our admiration for a set of results called “locality sensitive hashing” by demonstrating a greatly simplified example that exhibits the spirit of the techniques. Read more…

Large Data Logistic Regression (with example Hadoop code)

December 26th, 2010 Comments off

Living in the age of big data we ask what to do when we have the good fortune to be presented with a huge amount of supervised training data? Most often at large scale we are presented with the un-supervised problems of characterization and information extraction; but some problem domains offer an almost limitless supply of supervised training data (such as using older data to build models that predict the near future). Having too much training data is a good problem to have and there are ways to use traditional methods (like logistic regression) at this scale. We present an “out of core” logistic regression implementation and a quick example in Apache Hadoop running on Amazon Elastic MapReduce. This presentation assumes familiarity with Unix style command lines, Java and Hadoop. Read more…

Gradients via Reverse Accumulation

July 14th, 2010 Comments off

We extend the ideas of from Automatic Differentiation with Scala to include the reverse accumulation. Reverse accumulation is a non-obvious improvement to automatic differentiation that can in many cases vastly speed up calculations of gradients. Read more…