Posted on Categories Computer Science, Mathematics, StatisticsTags , , , , Leave a comment on Why No Exact Permutation Tests at Scale?

Why No Exact Permutation Tests at Scale?

Here at Win-Vector LLC we like permutation tests. Our team has written on them (for example: How Do You Know if Your Data Has Signal?) and they are used to estimate significances in our sigr and WVPlots R packages. For example permutation methods are used to estimate the significance reported in the following ROC plot.

NewImage

Permutation tests have their own literature and issues (examples: Permutation, Parametric and Bootstrap Tests of Hypotheses, Springer-Verlag, NY, 1994 (3rd edition, 2005), 2, 3, and 4).

In our R packages the permutation tests are estimated by a sampling procedure, and not computed exactly (or deterministically). It turns out this is likely a necessary concession; a complete exact permutation test procedure at scale would be big news. Please read on for my comments on this issue.

Continue reading Why No Exact Permutation Tests at Scale?

Posted on Categories data science, Pragmatic Data Science, Pragmatic Machine Learning, Programming, Statistics, TutorialsTags , , , , ,

Permutation Theory In Action

While working on a large client project using Sparklyr and multinomial regression we recently ran into a problem: Apache Spark chooses the order of multinomial regression outcome targets, whereas R users are used to choosing the order of the targets (please see here for some details). So to make things more like R users expect, we need a way to translate one order to another.

Providing good solutions to gaps like this is one of the thing Win-Vector LLC does both in our consulting and training practices.

Continue reading Permutation Theory In Action