I’ve been thinking a bit on statistical tests, their absence, abuse, and limits. I think much of the current “scientific replication crisis” stems from the fallacy that “failing to fail” is the same as success (in addition to the forces of bad luck, limited research budgets, statistical naiveté, sloppiness, pride, greed and other human qualities found even in researchers). Please read on for my current thinking.

# Tag: hypothesis testing

## Finding the K in K-means by Parametric Bootstrap

One of the trickier tasks in clustering is determining the appropriate number of clusters. Domain-specific knowledge is always best, when you have it, but there are a number of heuristics for getting at the likely number of clusters in your data. We cover a few of them in Chapter 8 (available as a free sample chapter) of our book *Practical Data Science with R*.

We also came upon another cool approach, in the `mixtools` package for mixture model analysis. As with clustering, if you want to fit a mixture model (say, a mixture of Gaussians) to your data, it helps to know how many components are in your mixture. The `boot.comp` function estimates the number of components (let’s call it *k*) by incrementally testing the hypothesis that there are *k+1* components against the null hypothesis that there are *k* components, via parametric bootstrap.

You can use a similar idea to estimate the number of clusters in a clustering problem, if you make a few assumptions about the shape of the clusters. This approach is only heuristic, and more ad-hoc in the clustering situation than it is in mixture modeling. Still, it’s another approach to add to your toolkit, and estimating the number of clusters via a variety of different heuristics isn’t a bad idea.
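To make the idea concrete, here is a minimal Python sketch of a parametric bootstrap for the number of clusters in one-dimensional data. It is not the `boot.comp` implementation and not necessarily the procedure used in the full article. The assumptions are ours: clusters are treated as Gaussian, the test statistic is the ratio of within-cluster sums of squares for *k* versus *k+1* clusters, and all function names (`kmeans_1d`, `improvement`, `estimate_k`, and so on) are hypothetical.

```python
import random
import statistics


def kmeans_1d(xs, k, iters=50):
    """Lloyd's algorithm on 1-D data, seeded at evenly spaced quantiles."""
    s = sorted(xs)
    centers = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    groups = [list(xs)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            groups[nearest].append(x)
        updated = [statistics.fmean(g) if g else centers[j]
                   for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return groups


def wss(groups):
    """Total within-cluster sum of squares."""
    total = 0.0
    for g in groups:
        if g:
            m = statistics.fmean(g)
            total += sum((x - m) ** 2 for x in g)
    return total


def improvement(xs, k):
    """Test statistic (our assumption): WSS with k clusters over WSS with k+1."""
    return wss(kmeans_1d(xs, k)) / max(wss(kmeans_1d(xs, k + 1)), 1e-12)


def simulate_null(groups, rng):
    """Draw a synthetic data set from the fitted k-cluster Gaussian model."""
    sim = []
    for g in groups:
        if g:
            mu, sd = statistics.fmean(g), statistics.pstdev(g)
            sim.extend(rng.gauss(mu, sd) for _ in g)
    return sim


def estimate_k(xs, max_k=4, n_boot=100, alpha=0.05, seed=0):
    """Increase k until 'k clusters' can no longer be rejected in favor of k+1."""
    rng = random.Random(seed)
    for k in range(1, max_k + 1):
        observed = improvement(xs, k)
        null_fit = kmeans_1d(xs, k)
        sims = [improvement(simulate_null(null_fit, rng), k)
                for _ in range(n_boot)]
        p_value = (1 + sum(s >= observed for s in sims)) / (n_boot + 1)
        if p_value > alpha:
            return k
    return max_k + 1  # every tested null was rejected; the search is capped


# Synthetic example: two well-separated Gaussian clusters.
data_rng = random.Random(42)
data = ([data_rng.gauss(0, 1) for _ in range(60)]
        + [data_rng.gauss(10, 1) for _ in range(60)])
k_hat = estimate_k(data)
print("estimated number of clusters:", k_hat)
```

The ratio statistic is only one reasonable choice. Any measure of how much the fit improves with an extra cluster could be substituted, as long as the same statistic is computed on the real data and on the simulated data.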


## How to test XCOM “dice rolls” for fairness

XCOM: Enemy Unknown is a turn-based video game where the player chooses among actions (for example, shooting an alien) that are labeled with a declared probability of success.

Image copyright Firaxis Games

A lot of gamers, after missing an 80% chance-of-success shot, start asking if the game’s pseudo-random number generator is fair. Is the game really rolling the dice as stated, or is it cheating? Of course the matching question is: are player memories at all fair? Would they remember the other 4 out of 5 times they made such a shot?
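One way to put a number on that suspicion is an exact binomial tail probability: if every shot really hits 80% of the time, how often would a player see this few hits? The sketch below is in Python, uses only the standard library, and runs on a made-up shot log (the counts are hypothetical, not XCOM data). The function names are ours.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(exactly k hits in n independent shots, each hitting with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(hits, n, p):
    """Lower tail P(X <= hits) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(hits + 1))


# Hypothetical log: 20 shots at a declared 80%, of which 13 hit.
n_shots, n_hits, declared = 20, 13, 0.8
p_value = prob_at_most(n_hits, n_shots, declared)
print(f"P(at most {n_hits} hits in {n_shots} shots at {declared:.0%}) = {p_value:.4f}")
```

This is a one-sided check that only looks at shots sharing a single declared probability, so it ignores every other shot the player took.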

This article is intended as an introduction to the methods you would use to test such a question (be it in a video game, in science, or in a business application such as measuring advertisement conversion). There are already some interesting articles on collecting and analyzing XCOM data, on finding and characterizing the actual pseudo-random generator code in the game, and on the importance of repeatable pseudo-random results. But we want to add a discussion pointed a bit more at analysis technique in general. We emphasize methods that are efficient in their use of data. “Efficient” is a statistical term here, meaning that a maximal amount of learning is gained from the data. In particular, we do not recommend data binning as a first choice for analysis, as it cuts down on sample size and thus is not the most efficient estimation technique.
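As an illustration of an analysis that avoids binning, the Python sketch below pools every shot, whatever its declared probability, into a single z-score. Under the null hypothesis that each shot independently hits with its declared probability, the sum of (outcome minus declared probability) has mean zero and variance equal to the sum of p(1 - p). This is a normal-approximation sketch under our own assumptions, not necessarily the method the full article develops. The name `fairness_z_test` and the simulated log are hypothetical.

```python
import random
from math import erfc, sqrt


def fairness_z_test(shots):
    """shots: (declared_probability, hit) pairs, with hit a bool.

    Returns (z, two_sided_p) under the null that each shot independently
    hits with its declared probability (normal approximation).
    """
    shots = list(shots)
    excess = sum((1.0 if hit else 0.0) - p for p, hit in shots)
    variance = sum(p * (1.0 - p) for p, _ in shots)
    if variance <= 0:
        raise ValueError("need at least one shot with 0 < p < 1")
    z = excess / sqrt(variance)
    return z, erfc(abs(z) / sqrt(2))  # two-sided normal tail probability


# Simulated log from a generator that is fair by construction.
rng = random.Random(7)
declared = [rng.choice([0.25, 0.5, 0.65, 0.8, 0.95]) for _ in range(500)]
fair_log = [(p, rng.random() < p) for p in declared]
z, p_value = fairness_z_test(fair_log)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Because no shot is discarded or grouped, a log with many different declared probabilities contributes its full sample size to the test.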

## Statistics to English Translation, Part 2a: ‘Significant’ Doesn’t Always Mean ‘Important’

In this installment of our ongoing Statistics to English Translation series^{1}, we will look at the technical meaning of the term “significant”. As you might expect, what it means in statistics is not exactly what it means in everyday language.
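A small numeric illustration of the gap between the two meanings, as a Python sketch with made-up counts: the same tiny deviation from a 50% rate is nowhere near statistically significant in a small sample and overwhelmingly significant in a huge one, while its practical size (0.1 percentage points) is identical in both. The function name `proportion_z_test` is ours, and the test uses a normal approximation.

```python
from math import erfc, sqrt


def proportion_z_test(successes, n, p0):
    """Two-sided z-test of an observed proportion against p0.

    Returns (effect, z, p_value), where effect is the observed rate minus p0.
    """
    effect = successes / n - p0
    z = effect / sqrt(p0 * (1 - p0) / n)
    return effect, z, erfc(abs(z) / sqrt(2))


# Hypothetical counts: a 50.1% observed rate at two very different sample sizes.
for successes, n in [(501, 1_000), (5_010_000, 10_000_000)]:
    effect, z, p = proportion_z_test(successes, n, 0.5)
    print(f"n = {n:>10,}  effect = {effect:+.4f}  z = {z:6.2f}  p = {p:.3g}")
```

The p-value answers whether the deviation is distinguishable from chance. Whether a deviation of that size matters is a separate, domain-specific question.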

As always, a pdf version of this article is available as well.