Posted on Categories math programming, Programming, StatisticsTags , , , , , , , ,

Large Data Logistic Regression (with example Hadoop code)

Living in the age of big data we ask what to do when we have the good fortune to be presented with a huge amount of supervised training data? Most often at large scale we are presented with the un-supervised problems of characterization and information extraction; but some problem domains offer an almost limitless supply of supervised training data (such as using older data to build models that predict the near future). Having too much training data is a good problem to have and there are ways to use traditional methods (like logistic regression) at this scale. We present an “out of core” logistic regression implementation and a quick example in Apache Hadoop running on Amazon Elastic MapReduce. This presentation assumes familiarity with Unix style command lines, Java and Hadoop.

Continue reading Large Data Logistic Regression (with example Hadoop code)

Posted on Categories Applications, Coding, Exciting Techniques, math programming, Mathematics, Programming, TutorialsTags , , , , , ,

Gradients via Reverse Accumulation

We extend the ideas of from Automatic Differentiation with Scala to include the reverse accumulation. Reverse accumulation is a non-obvious improvement to automatic differentiation that can in many cases vastly speed up calculations of gradients. Continue reading Gradients via Reverse Accumulation

Posted on Categories Applications, Coding, Computer Science, Exciting Techniques, Mathematics, Programming, TutorialsTags , , , , , , , 5 Comments on Automatic Differentiation with Scala

Automatic Differentiation with Scala

This article is a worked-out exercise in applying the Scala type system to solve a small scale optimization problem. For this article we supply complete Scala source code (under a GPLv3 license) and some design discussion. Continue reading Automatic Differentiation with Scala

Posted on Categories Computers, Opinion, Programming, TutorialsTags , , , , , , , , , , 2 Comments on Must Have Software

Must Have Software

Having worked with Unix (BSD, HPUX, IRIX, Linux and OSX), Windows (NT4, 2000, XP, Vista and 7) for quite a while I have seen a lot of different software tools. I would like to quickly exhibit my “must have” list. These are the packages that I find to be the single “must have offerings” in a number of categories. I have avoided some categories (such as editors, email programs, programing language, IDEs, photo editors, backup solutions, databases, database tools and web tools) where I have no feeling of having seen a single absolute best offering.

The spirit of the list is to pick items such that: if you disagree with an item in this list then either you are wrong or you know something I would really like to hear about.

Continue reading Must Have Software