“Comparing Apples and Oranges: Two Examples of the Limits of Statistical Inference, With an Application to Google Advertising Markets” is our analysis of Google AdSense Channel IDs and our use of the Cramer Rao bound to show that these IDs fundamentally limit what participants in the Google online advertising market can measure (and therefore in turn limit what these players can do).
Continue reading Google AdSense Channels IDs and the Cramer Rao Inequality
While executing some statistical detective work for a client we had a major “aha!” moment and realized something like “Amdahl’s Law” rephrased in terms of probability would solve everything. We finished our work using direct methods and moved on. But it is an interesting question: what is the probabilist’s (or gambler’s) equivalent of Amdahl’s Law?
Continue reading What is the gambler’s equivalent of Amdahl’s Law?
New PDF slides version (presented at the Bay Area R Users Meetup October 13, 2009).
We at Win-Vector LLC appear to like R a bit more than some of our, perhaps wiser, colleagues (see: Choose your weapon: Matlab, R or something else? and R and data). While we do like R (see: Exciting Technique #1: The “R” language) we also understand the need to defend oneself against the abuse regularly dished out by R. Here we will quickly share a few fighting techniques.
Continue reading Survive R
New paper: A Discrete Model Gauging Market Efficiency PDF
We highly recommend reading the PDF version, but please find below an HTML translation of the paper.
We follow up on some interesting work from the literature and explore some conditions that allow large predatory traders to dominate markets.
Continue reading A Discrete Model Gauging Market Efficiency
What makes a good graph? When faced with a slew of numeric data, graphical visualization can be a more efficient way of getting a feel for the data than going through the rows of a spreadsheet. But do we know if we are getting an accurate or useful picture? How do we pick an effective visualization that neither obscures important details nor drowns us in confusing clutter? In 1985, William Cleveland published a text called The Elements of Graphing Data, inspired by Strunk and White’s classic writing handbook The Elements of Style. The Elements of Graphing Data puts forward Cleveland’s philosophy about how to produce good, clear graphs, not only for presenting one’s experimental results to peers, but also for the purposes of data analysis and exploration. Cleveland’s approach is based on a theory of graphical perception: how well the human perceptual system accomplishes certain tasks involved in reading a graph. For a given data analysis task, the goal is to align the information being presented with the perceptual tasks the viewer accomplishes the best.
Continue reading Good Graphs: Graphical Perception and Data Visualization
REPOST (now in HTML in addition to the original PDF).
This paper demonstrates and explains some of the basic techniques used in data mining. It also serves as an example of some of the kinds of analyses and projects Win-Vector LLC engages in.
Continue reading A Demonstration of Data Mining
On The Hysteria Over “The Cloud”
The frenzy of anticipation and opinion about “The Cloud” is so intense and so pointless it becomes “parody proof.”
Continue reading On The Hysteria Over “The Cloud”
Today’s question is: “should your mom use Google search?” It is a good thing that Google has directly told us that their motto is “don’t be evil,” as their systems are subtle and difficult to evaluate.
Continue reading Should your mom use Google search?
Microsoft is once again going to try its hand at retail stores (for example see the following CNET article). From my experience I think this is going to be horrible. But it does not have to be: Microsoft (if it had the will) could produce a great store that is profitable and improves the world. Here is my quick history and wish list.
Continue reading Microsoft Store Again
A bit of a tempest in finance news involving accusations of sensitive code stolen from a major trading desk. For emerging details see:
Continue reading Thievery considered harmful