Scientists, engineers, and statisticians share similar concerns about evaluating the accuracy of their results, but they don’t always talk about it in the same language. This can lead to misunderstandings when reading across disciplines, and the problem is exacerbated when technical work is communicated to and by the popular media.
The “Statistics to English Translation” series is a new set of articles that we will be posting from time to time, as an attempt to bridge the language gaps. Our goal is to increase statistical literacy: we hope that you will find it easier to read and understand the statistical results in research papers, even if you can’t replicate the analyses. We also hope that you will be able to read popular media accounts of statistical and scientific results more critically, and to recognize common misunderstandings when they occur.
The first installment discusses some different accuracy measures that are commonly used in various research communities, and how they are related to each other. There is also a more legible PDF version of the article here.
Continue reading “I don’t think that means what you think it means;” Statistics to English Translation, Part 1: Accuracy Measures
While executing some statistical detective work for a client we had a major “aha!” moment and realized something like “Amdahl’s Law” rephrased in terms of probability would solve everything. We finished our work using direct methods and moved on. But it is an interesting question: what is the probabilist’s (or gambler’s) equivalent of Amdahl’s Law? Continue reading What is the gambler’s equivalent of Amdahl’s Law?
New PDF slides version (presented at the Bay Area R Users Meetup October 13, 2009).
We at Win-Vector LLC appear to like R a bit more than some of our, perhaps wiser, colleagues ( see: Choose your weapon: Matlab, R or something else? and R and data ). While we do like R (see: Exciting Technique #1: The “R” language ) we also understand the need to defend oneself against the abuse regularly dished out by R. Here we will quickly share a few fighting techniques.
Continue reading Survive R
What makes a good graph? When faced with a slew of numeric data, graphical visualization can be a more efficient way of getting a feel for the data than going through the rows of a spreadsheet. But do we know if we are getting an accurate or useful picture? How do we pick an effective visualization that neither obscures important details, or drowns us in confusing clutter? In 1968, William Cleveland published a text called The Elements of Graphing Data, inspired by Strunk and White’s classic writing handbook The Elements of Style . The Elements of Graphing Data puts forward Cleveland’s philosophy about how to produce good, clear graphs — not only for presenting one’s experimental results to peers, but also for the purposes of data analysis and exploration. Cleveland’s approach is based on a theory of graphical perception: how well the human perceptual system accomplishes certain tasks involved in reading a graph. For a given data analysis task, the goal is to align the information being presented with the perceptual tasks the viewer accomplishes the best. Continue reading Good Graphs: Graphical Perception and Data Visualization
REPOST (now in HTML in addition to the original PDF).
This paper demonstrates and explains some of the basic techniques used in data mining. It also serves as an example of some of the kinds of analyses and projects Win Vector LLC engages in. Continue reading A Demonstration of Data Mining
We explore some of the ideas from the seminal paper “The Data-Enrichment Method” ( Henry R Lewis, Operations Research (1957) vol. 5 (4) pp. 1-5). The paper explains a technique of improving the quality of statistical inference by increasing the effective size of the data-set. This is called “Data-Enrichment.”
Now more than ever we must be familiar with the consequences of these important techniques. Especially if we don’t know if we might already be a victim of them.
Continue reading The Data Enrichment Method
Our first “exciting technique” article is about a statistical language called “R.”
R is a language for statistical analysis available from http://cran.r-project.org/ . The things you can immediately do with it are incredible. You can import a spreadsheet and immediately spot relationships, trend and anomalies. R gives you instant access to top notch visualization methods and sophisticated statistical methods.
Continue reading Exciting Technique #1: The “R” language.
author: John Mount
I have finally written up and released a paper in PDF: Automatic Generation and Testing of Trades describing a lot of the statistics and optimization methods used when I was technical trading on a Banc of America Securities proprietary program trading desk. It was a very exciting time.
Continue reading Paper on stock trading
author: John Mount
Nina and I just finished up our analysis of some of the statistical difficulties encountered by users of Google AdSense. It came out a bit long- but we found the right statistical reference to prove that there are real barriers to understanding in this market. The paper is most legible in PDF, but we also include an HTML version so the blog entry can be skimmed.
Continue reading New Paper