Categories: Applications, Expository Writing, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Statistics To English Translation

Statistics to English Translation, Part 2b: Calculating Significance

In the previous installment of the Statistics to English Translation series, we discussed the technical meaning of the term “significant”. In this installment, we look at how significance is calculated. This article will be a little more technically detailed than the last one, but our primary goal is still to help you decipher statements about significance in research papers: statements like “$(F(2, 864) = 6.6, p = 0.0014)$”.
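
To make a statement like that a little more concrete, here is a minimal sketch (ours, not taken from the article) of how the quoted p-value follows from the F statistic and its degrees of freedom. It relies on a special case: when the numerator degrees of freedom equal 2, the upper-tail probability of the F distribution has the closed form $(1 + 2F/d_2)^{-d_2/2}$, so no statistics library is needed. The function name is hypothetical, and the shortcut does not apply to other numerator degrees of freedom.

```python
import math


def f_pvalue_two_numerator_df(f_stat: float, df_denominator: float) -> float:
    """Upper-tail p-value of an F statistic with 2 numerator degrees of freedom.

    Assumption: the numerator degrees of freedom are exactly 2, as in
    F(2, 864). Only in that case does the tail probability reduce to
    (1 + 2*F/df2) ** (-df2/2); the general case needs the regularized
    incomplete beta function.
    """
    if f_stat < 0 or df_denominator <= 0:
        raise ValueError("need f_stat >= 0 and df_denominator > 0")
    # Work in log space so very large df values stay numerically stable.
    log_tail = -(df_denominator / 2.0) * math.log1p(2.0 * f_stat / df_denominator)
    return math.exp(log_tail)


if __name__ == "__main__":
    p = f_pvalue_two_numerator_df(6.6, 864)
    print(f"F(2, 864) = 6.6 gives p = {p:.4f}")
```

Read in English: if the group means were really equal, an F statistic of 6.6 or larger would show up only about 0.14% of the time. For other degrees of freedom, a library routine such as `scipy.stats.f.sf(6.6, 2, 864)` in SciPy computes the same tail probability.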

As in the last article, we will concentrate on situations where we want to test the difference of means. You should read that previous article first, so you are familiar with the terminology that we use in this one.

A PDF version of this article can be found here.

Categories: Rants, Statistics

CRU graph yet again (with R)

IowaHawk has an excellent article attempting to reproduce the infamous CRU climate graph using OpenOffice: Fables of the Reconstruction. We thought we would show how to produce similarly bad results using R.

Categories: Applications, Expository Writing, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Statistics To English Translation

Statistics to English Translation, Part 2a: ’Significant’ Doesn’t Always Mean ’Important’

In this installment of our ongoing Statistics to English Translation series, we will look at the technical meaning of the term “significant”. As you might expect, what it means in statistics is not exactly what it means in everyday language.

As always, a PDF version of this article is available as well.

Categories: Coding, Statistics, Tutorials

R examine objects tutorial

This article is a quick, concrete example of how to use the techniques from Survive R to lower the steepness of The R Project for Statistical Computing’s learning curve (so an apology to all readers who are not interested in R). What follows is for people who already use R and want to achieve more control of the software.

Categories: Computer Science, Exciting Techniques, Expository Writing, Mathematics

The Local to Global Principle

We describe “the local to global principle.” It is a principle used to break algorithmic problem solving into two distinct phases (local criticism followed by global solution) and is an aid both in the design and in the application of algorithms. Instead of giving a formal definition of the principle, we define it briefly and discuss a few examples and methods. We have produced both a stand-alone PDF (more legible) and an HTML/blog form (more skimmable).

Categories: Applications, Expository Writing, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Statistics To English Translation

“I don’t think that means what you think it means;” Statistics to English Translation, Part 1: Accuracy Measures

Scientists, engineers, and statisticians share similar concerns about evaluating the accuracy of their results, but they don’t always talk about it in the same language. This can lead to misunderstandings when reading across disciplines, and the problem is exacerbated when technical work is communicated to and by the popular media.

The “Statistics to English Translation” series is a new set of articles that we will be posting from time to time, as an attempt to bridge the language gaps. Our goal is to increase statistical literacy: we hope that you will find it easier to read and understand the statistical results in research papers, even if you can’t replicate the analyses. We also hope that you will be able to read popular media accounts of statistical and scientific results more critically, and to recognize common misunderstandings when they occur.

The first installment discusses some different accuracy measures that are commonly used in various research communities, and how they are related to each other. There is also a more legible PDF version of the article here.


Categories: Administrativia, Expository Writing, Mathematics

Google AdSense Channels IDs and the Cramer Rao Inequality

“Comparing Apples and Oranges: Two Examples of the Limits of Statistical Inference, With an Application to Google Advertising Markets” is our analysis of Google AdSense Channel IDs and our use of the Cramer Rao bound to show that these IDs fundamentally limit what participants in the Google online advertising market can measure (and therefore in turn limit what these players can do).

Categories: Expository Writing, Quantitative Finance, Statistics

What is the gambler’s equivalent of Amdahl’s Law?

While executing some statistical detective work for a client we had a major “aha!” moment and realized something like “Amdahl’s Law” rephrased in terms of probability would solve everything. We finished our work using direct methods and moved on. But it is an interesting question: what is the probabilist’s (or gambler’s) equivalent of Amdahl’s Law?

Categories: Pragmatic Machine Learning, Statistics

Survive R

New PDF slides version (presented at the Bay Area R Users Meetup October 13, 2009).

We at Win-Vector LLC appear to like R a bit more than some of our, perhaps wiser, colleagues (see: Choose your weapon: Matlab, R or something else? and R and data). While we do like R (see: Exciting Technique #1: The “R” language), we also understand the need to defend oneself against the abuse regularly dished out by R. Here we will quickly share a few fighting techniques.

Categories: Finance, Mathematics, Quantitative Finance

A Discrete Model Gauging Market Efficiency

New paper: A Discrete Model Gauging Market Efficiency PDF

We highly recommend reading the PDF version, but please find below an HTML translation of the paper.

We follow up on some interesting work from the literature and explore some conditions that allow large predatory traders to dominate markets.
