We have added a worked example to the README of our experimental logistic regression code.
The Logistic codebase is designed to support experimentation on variations of logistic regression. Its features include:
- A pure Java implementation (thus directly usable in Java server environments).
- A simple multinomial implementation (that allows more than two possible result categories).
- The ability to work with data sets that are too large to fit in memory, reading directly from files or database tables.
- A demonstration of the steps needed to use standard Newton-Raphson in Hadoop.
- The ability to work with arbitrarily large categorical inputs.
- Explicit L2 model regularization.
- Safe optimization methods (such as conjugate gradient, line search, and majorization) for situations where standard iteratively re-weighted least squares (Newton-Raphson) fails.
- An overall framework for quickly trying implementation experiments (as opposed to novel usage experiments).
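To make two of the items above concrete (explicit L2 regularization and a safeguarded Newton-Raphson step), here is a minimal Java sketch. It is not taken from the Logistic codebase: the class name `RegularizedLogisticSketch`, the method names, and the toy data are all hypothetical, and for brevity it assumes a single input variable plus an intercept and penalizes both coefficients. Halving the step size until the loss decreases is one simple form of line search, not necessarily the one our code uses.

```java
public class RegularizedLogisticSketch {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Penalized negative log-likelihood:
    //   sum_i [log(1 + exp(z_i)) - y_i * z_i] + (lambda / 2) * (b0^2 + b1^2)
    // where z_i = b0 + b1 * x_i.
    static double loss(double[] x, int[] y, double[] b, double lambda) {
        double total = 0.5 * lambda * (b[0] * b[0] + b[1] * b[1]);
        for (int i = 0; i < x.length; i++) {
            double z = b[0] + b[1] * x[i];
            // Numerically stable form of log(1 + exp(z)).
            double softplus = Math.max(z, 0.0) + Math.log1p(Math.exp(-Math.abs(z)));
            total += softplus - y[i] * z;
        }
        return total;
    }

    // Newton-Raphson with step halving. Assumes lambda > 0, which keeps the
    // 2x2 Hessian positive definite and therefore invertible.
    static double[] fit(double[] x, int[] y, double lambda, int maxIter) {
        double[] b = {0.0, 0.0};
        for (int iter = 0; iter < maxIter; iter++) {
            // Gradient (g0, g1) and Hessian [[h00, h01], [h01, h11]].
            double g0 = lambda * b[0], g1 = lambda * b[1];
            double h00 = lambda, h01 = 0.0, h11 = lambda;
            for (int i = 0; i < x.length; i++) {
                double p = sigmoid(b[0] + b[1] * x[i]);
                double r = p - y[i];
                double w = p * (1.0 - p);
                g0 += r;
                g1 += r * x[i];
                h00 += w;
                h01 += w * x[i];
                h11 += w * x[i] * x[i];
            }
            // Newton direction d = -H^{-1} g via the closed-form 2x2 inverse.
            double det = h00 * h11 - h01 * h01;
            double d0 = -(h11 * g0 - h01 * g1) / det;
            double d1 = -(-h01 * g0 + h00 * g1) / det;

            // Safeguard: halve the step until the loss actually decreases.
            double current = loss(x, y, b, lambda);
            double[] next = null;
            for (double step = 1.0; step > 1e-10; step *= 0.5) {
                double[] candidate = {b[0] + step * d0, b[1] + step * d1};
                if (loss(x, y, candidate, lambda) < current) {
                    next = candidate;
                    break;
                }
            }
            if (next == null) {
                break; // no improving step found; treat as converged
            }
            double improvement = current - loss(x, y, next, lambda);
            b = next;
            if (improvement < 1e-12) {
                break;
            }
        }
        return b;
    }

    public static void main(String[] args) {
        // Perfectly separable toy data: without the L2 penalty the
        // maximum-likelihood coefficients would grow without bound.
        double[] x = {-2.0, -1.0, 1.0, 2.0};
        int[] y = {0, 0, 1, 1};
        double[] b = fit(x, y, 0.1, 50);
        System.out.println("intercept=" + b[0] + " slope=" + b[1]);
    }
}
```

The toy data is deliberately separable: that is a standard case where an unregularized fit has no finite optimum, and the penalty term is what keeps the coefficients bounded.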
What we mean by this code being “experimental” is that it has capabilities that many standard implementations do not. In fact, most of the items in the above list are not usually made available to the logistic regression user. But our project is also stand-alone and not as well integrated into existing workflows as standard production systems. Before trying our code you may want to try R or Mahout.