Posted on Categories Coding, Computer Science, data science, Mathematics, Statistics

## Added worked example to logistic regression project

We have added a worked example to the README of our experimental logistic regression code.

The Logistic codebase is designed to support experimentation on variations of logistic regression including:

What we mean by this code being “experimental” is that it has capabilities that many standard implementations do not. In fact most of the items in the above list are not usually made available to the logistic regression user. But our project is also stand-alone and not as well integrated into existing workflows as standard production systems. Before trying our code you may want to try R or Mahout. Continue reading Added worked example to logistic regression project

Posted on 3 Comments on Error Handling in R

## Error Handling in R

It’s often the case that I want to write an R script that loops over multiple datasets, or different subsets of a large dataset, running the same procedure over them: generating plots, or fitting a model, perhaps. I set the script running and turn to another task, only to come back later and find the loop has crashed partway through, on an unanticipated error. Here’s a toy example:

``` > inputs = list(1, 2, 4, -5, 'oops', 0, 10) > for(input in inputs) { + print(paste("log of", input, "=", log(input))) + } [1] "log of 1 = 0" [1] "log of 2 = 0.693147180559945" [1] "log of 4 = 1.38629436111989" [1] "log of -5 = NaN" Error in log(input) : Non-numeric argument to mathematical function In addition: Warning message: In log(input) : NaNs produced ```

The loop handled the negative arguments more or less gracefully (depending on how you feel about NaN), but crashed on the non-numeric argument, and didn’t finish the list of inputs.

How are we going to handle this?

Posted on Categories Expository Writing, Mathematics, Tutorials

## Rudie can’t fail (if majorized)

We have been writing for a while about the convergence of Newton steps applied to a logistic regression (See: What does a generalized linear model do?, How robust is logistic regression? and Newton-Raphson can compute an average). This is all based on our principle of working examples for understanding. This eventually progressed to some writing on the nature of problem solving (a nice complement to our earlier writing on calculation). In the course of research we were directed to a very powerful technique called the MM algorithm (see: “The MM Algorithm” Kenneth Lang, 2007; “A Tutorial on MM Algorithms”, David R. Hunter, Kenneth Lange, Amer. Statistician 58:30–37, 2004; and “Monotonicity of Quadratic-Approximation Algorithms”, Dankmar Bohning, Bruce G. Lindsay, Ann. Inst. Statist. Math, Vol. 40, No. 4, pp 641-664, 1988). The MM algorithm introduces an essential idea: majorized functions (not to be confused with the majorized order on R^d). Majorization it is an interesting way to modify Newton methods to be reliable contractions (and therefore converge in a manner similar to EM algorithms).

Here we will work an example of the MM method. We will not work it in its most general form, but in a form that quickly reveals much of the beauty of the method. We also introduce a “collared Newton step” which guarantees convergence without resorting to line-search (essentially resolving the issues in solving a logistic regression by Newton style methods). Continue reading Rudie can’t fail (if majorized)

Posted on 4 Comments on Level fit summaries can be tricky in R

## Level fit summaries can be tricky in R

Model level fit summaries can be tricky in R. A quick read of model fit summary data for factor levels can be misleading. We describe the issue and demonstrate techniques for dealing with them. Continue reading Level fit summaries can be tricky in R