In our article What is a large enough random sample? we pointed out that if you wanted to measure a proportion to an accuracy “a” with chance of being wrong of “d” then a idea was to guarantee you had a sample size of at least:
This is the central question in designing opinion polls or running A/B tests. This estimate comes from a quick application of Hoeffding’s inequality and because it has a simple form it is possible to see that accuracy is very expensive (to halve the size of difference we are trying to measure we have to multiply the sample size by four) and the cheapness of confidence (increases in the required confidence or significance of a result cost only moderately in sample size).
However, for high-accuracy situations (when you are trying to measure two effects that are very close to each other) suggesting a sample size that is larger than is strictly necessary (as we are using an bound, not an exact formula for the required sample size). As a theorist or a statistician we like to error on the side of too large a sample (guaranteeing reliability), but somebody who is paying for each entry in a poll would want a smaller size.
We describe ergodic theory in modern notation accessible to interested computer scientists.
The ergodic theorem (http://en.wikipedia.org/wiki/Ergodic theory (link)) is an important principle of recurrence and averaging in dynamical systems. However, there are some inconsistent uses of the term, much of the machinery is intended to work with deterministic dynamical systems (not probabilistic systems, as is often implied) and often the conclusion of the theory is mis-described as its premises.
With the well deserved popularity of A/B testing computer scientists are finally becoming practicing statisticians. One part of experiment design that has always been particularly hard to teach is how to pick the size of your sample. The two points that are hard to communicate are that:
The required sample size is essentially independent of the total population size.
The required sample size depends strongly on the strength of the effect you are trying to measure.
These things are only hard to explain because the literature is overly technical (too many buzzwords and too many irrelevant concerns) and these misapprehensions can’t be relieved unless you spend some time addressing the legitimate underlying concerns they are standing in for. As usual explanation requires common ground (moving to shared assumptions) not mere technical bullying.