Page 94 of Gelman, Carlin, Stern, Dunson, Vehtari, Rubin “Bayesian Data Analysis” 3rd Edition (which we will call BDA3) provides a great example of what happens when common broad frequentist bias criticisms are over-applied to predictions from ordinary linear regression: the predictions appear to fall apart. BDA3 goes on to exhibit what might be considered the kind of automatic/mechanical fix responding to such criticisms would entail (producing a bias corrected predictor), and rightly shows these adjusted predictions are far worse than the original ordinary linear regression predictions. BDA3 makes a number of interesting points and is worth studying closely. We work their example in a bit more detail for emphasis. Continue reading Automatic bias correction doesn’t fix omitted variable bias
Two of the most common methods of statistical inference are frequentism and Bayesianism (see Bayesian and Frequentist Approaches: Ask the Right Question for some good discussion). In both cases we are attempting to perform reliable inference of unknown quantities from related observations. And in both cases inference is made possible by introducing and reasoning over well-behaved distributions of values.
As a first example, consider the problem of trying to estimate the speed of light from a series of experiments.
In this situation the frequentist method quietly does some heavy philosophical lifting before you even start work. Under the frequentist interpretation since the speed of light is thought to have a single value it does not make sense to model it as having a prior distribution of possible values over any non-trivial range. To get the ability to infer, frequentist philosophy considers the act of measurement repeatable and introduces very subtle concepts such as confidence intervals. The frequentist statement that a series of experiments places the speed of light in vacuum at 300,000,000 meters a second plus or minus 1,000,000 meters a second with 95% confidence does not mean there is a 95% chance that the actual speed of light is in the interval 299,000,000 to 301,000,000 (the common incorrect recollection of what a confidence interval is). It means if the procedure that generated the interval were repeated on new data, then 95% of the time the speed of light would be in the interval produced: which may not be the interval we are looking at right now. Frequentist procedures are typically easy on the practitioner (all of the heavy philosophic work has already been done) and result in simple procedures and calculations (through years of optimization of practice).
Bayesian procedures on the other hand are philosophically much simpler, but require much more from the user (production and acceptance of priors). The Bayesian philosophy is: given a generative model, a complete prior distribution (detailed probabilities of the unknown value posited before looking at the current experimental data) of the quantity to be estimated, and observations: then inference is just a matter of calculating the complete posterior distribution of the quantity to be estimated (by correct application of Bayes’ Law). Supply a bad model or bad prior beliefs on possible values of the speed of light and you get bad results (and it is your fault, not the methodology’s fault). The Bayesian method seems to ask more, but you have to remember it is trying to supply more (complete posterior distribution, versus subjunctive confidence intervals).
In this article we are going to work a simple (but important) problem where (for once) the Bayesian calculations are in fact easier than the frequentist ones. Continue reading Frequentist inference only seems easy
Some researchers (in both science and marketing) abuse a slavish view of p-values to try and falsely claim credibility. The incantation is: “we achieved p = x (with x ≤ 0.05) so you should trust our work.” This might be true if the published result had been performed as a single project (and not as the sole shared result in longer series of private experiments) and really points to the fact that even frequentist significance is a subjective and intensional quantity (an accusation usually reserved for Bayesian inference). In this article we will comment briefly on the negative effect of un-reported repeated experiments and what should be done to compensate. Continue reading Drowning in insignificance
It occurred to us recently that we don’t have any articles about Bayesian approaches to statistics here. I’m not going to get into the “Bayesian versus Frequentist” war; in my opinion, which style of approach to use is less about philosophy, and more about figuring out the best way to answer a question. Once you have the right question, then the right approach will naturally suggest itself to you. It could be a frequentist approach, it could be a bayesian one, it could be both — even while solving the same problem.
Let’s take the example that Bayesians love to hate: significance testing, especially in clinical trial style experiments. Clinical trial experiments are designed to answer questions of the form “Does treatment X have a discernible effect on condition Y, on average?” To be specific, let’s use the question “Does drugX reduce hypertension, on average?” Assuming that your experiment does show a positive effect, the statistical significance tests that you run should check for the sorts of problems that John discussed in our previous article, Worry about correctness and repeatability, not p-values: What are the chances that an ineffective drug could produce the results that I saw? How likely is it that another researcher could replicate my results with the same size trial?
We can argue about whether or not the question we are answering is the correct question — but given that it is the question, the procedure to answer it and to verify the statistical validity of the results is perfectly appropriate.
So what is the correct question? From your family doctor’s viewpoint, a clinical trial answers the question “If I prescribe drugX to all my hypertensive patients, will their blood pressure improve, on average?” That isn’t the question (hopefully) that your doctor actually asks, though possibly your insurance company does. Your doctor should be asking “If I prescribe drugX to this patient, the one sitting in my examination room, will the patient’s blood pressure improve?” There is only one patient, so there is no such thing as “on average.”
If your doctor has a masters degree in statistics, the question might be phrased as “If I prescribe drugX to this patient, what is the posterior probability that the patient’s blood pressure will improve?” And that’s a bayesian question. Continue reading Bayesian and Frequentist Approaches: Ask the Right Question