<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Win-Vector Blog &#187; Quantitative Finance</title>
	<atom:link href="http://www.win-vector.com/blog/category/quantitative-finance/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.win-vector.com/blog</link>
	<description>The Applied Theorist&#039;s Point of View</description>
	<lastBuildDate>Sat, 04 Feb 2012 17:42:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Fast Portfolio re-Balancing as a Fractional Linear Program</title>
		<link>http://www.win-vector.com/blog/2010/08/fast-portfolio-re-balancing-as-a-fractional-linear-program/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=fast-portfolio-re-balancing-as-a-fractional-linear-program</link>
		<comments>http://www.win-vector.com/blog/2010/08/fast-portfolio-re-balancing-as-a-fractional-linear-program/#comments</comments>
		<pubDate>Fri, 13 Aug 2010 04:11:41 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Quantitative Finance]]></category>
		<category><![CDATA[Fractional Linear Program]]></category>
		<category><![CDATA[Linear Program]]></category>
		<category><![CDATA[Mathematical Finance]]></category>
		<category><![CDATA[Portfolio Theory]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1516</guid>
		<description><![CDATA[Fast Portfolio re-Balancing as a Fractional Linear Program is an example of the kind of work we have done encoding client problems (in this case optimal portfolio selection) as optimization problems (so we can use purchased software to solve them). It&#8217;s a bit mathy, but we are excited we got permission to share this. An [...]
Related posts:<ol>
<li><a href='http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/' rel='bookmark' title='&#8220;Easy&#8221; Portfolio Allocation'>&#8220;Easy&#8221; Portfolio Allocation</a></li>
<li><a href='http://www.win-vector.com/blog/2008/05/betting-best-of-series/' rel='bookmark' title='Betting Best-Of Series'>Betting Best-Of Series</a></li>
<li><a href='http://www.win-vector.com/blog/2010/08/what-did-theorists-do-before-the-age-of-big-data/' rel='bookmark' title='What Did Theorists Do Before The Age Of Big Data?'>What Did Theorists Do Before The Age Of Big Data?</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.win-vector.com/dfiles/LPRisk.pdf" target='ext'>Fast Portfolio re-Balancing as a Fractional Linear Program</a> is an example of the kind of work we have done encoding client problems (in this case optimal portfolio selection) as optimization problems (so we can use purchased software to solve them).  It&#8217;s a bit mathy, but we are excited we got permission to share this.<span id="more-1516"></span><br />
An example figure from the article:</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/08/Vertices1.png" alt="Vertices.png" border="0" width="500" height="448" /><br />
</center></p>
<p>Related posts:<ol>
<li><a href='http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/' rel='bookmark' title='&#8220;Easy&#8221; Portfolio Allocation'>&#8220;Easy&#8221; Portfolio Allocation</a></li>
<li><a href='http://www.win-vector.com/blog/2008/05/betting-best-of-series/' rel='bookmark' title='Betting Best-Of Series'>Betting Best-Of Series</a></li>
<li><a href='http://www.win-vector.com/blog/2010/08/what-did-theorists-do-before-the-age-of-big-data/' rel='bookmark' title='What Did Theorists Do Before The Age Of Big Data?'>What Did Theorists Do Before The Age Of Big Data?</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/08/fast-portfolio-re-balancing-as-a-fractional-linear-program/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?</title>
		<link>http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=what-is-the-gamblers-equivalent-of-amdahls-law</link>
		<comments>http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 20:38:21 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Expository Writing]]></category>
		<category><![CDATA[Quantitative Finance]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Amdahl's Law]]></category>
		<category><![CDATA[Kelly Criterion]]></category>
		<category><![CDATA[Kraft Inequality]]></category>
		<category><![CDATA[Mathematical Bedside Reading]]></category>
		<category><![CDATA[Statistical Detective]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=878</guid>
		<description><![CDATA[While executing some statistical detective work for a client we had a major &#8220;aha!&#8221; moment and realized something like &#8220;Amdahl&#8217;s Law&#8221; rephrased in terms of probability would solve everything. We finished our work using direct methods and moved on. But it is an interesting question: what is the probabilist&#8217;s (or gambler&#8217;s) equivalent of Amdahl&#8217;s Law? [...]
Related posts:<ol>
<li><a href='http://www.win-vector.com/blog/2009/08/good-graphs-graphical-perception-and-data-visualization/' rel='bookmark' title='Good Graphs: Graphical Perception and Data Visualization'>Good Graphs: Graphical Perception and Data Visualization</a></li>
<li><a href='http://www.win-vector.com/blog/2007/06/new-paper/' rel='bookmark' title='New Paper'>New Paper</a></li>
<li><a href='http://www.win-vector.com/blog/2009/04/the-data-enrichment-method/' rel='bookmark' title='The Data Enrichment Method'>The Data Enrichment Method</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>While executing some statistical detective work for a client we had a major &#8220;aha!&#8221; moment and realized  something like &#8220;Amdahl&#8217;s Law&#8221; rephrased in terms of probability would solve everything.  We finished our work using direct methods and moved on.  But it is an interesting question: what is the probabilist&#8217;s (or gambler&#8217;s) equivalent of Amdahl&#8217;s Law?<span id="more-878"></span></p>
<p>Amdahl&#8217;s Law is a famous idea due to computer architect Gene Amdahl.  It is a simple technique that computer scientists use to re-direct their work back to important parts of problems.  Suppose you have a complicated system you wish to speed up.  Suppose this system is spending a p-fraction of its time in an important sub-process and that you have an idea that would speed up the sub-process by a factor of k.  Should you invest the effort?  </p>
<p>Amdahl&#8217;s Law says (by simple arithmetic): the speed-up (the ratio of the old run-time over the new run-time) the entire system would achieve if you implemented your improvement is not the factor of k you would hope for, but instead:</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/10/eq11.png" alt="eq1.png" border="0" width="141" height="56" /><br />
</center></p>
<p>For example if p = 1/3 then you can speed up the overall system by at most a factor of 1.5 (that is, save at most 33% of the run time), even if your idea is so astoundingly good that you have k=1000.</p>
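<p>As a quick check of the arithmetic, here is a minimal Python sketch. It assumes the usual statement of Amdahl&#8217;s Law, speed-up = 1/((1 - p) + p/k); the function name <code>amdahl_speedup</code> is our own illustrative choice and not part of any library.</p>

```python
def amdahl_speedup(p, k):
    """Overall speed-up when a p-fraction of run time is made k times faster.

    Assumes the usual form of Amdahl's Law: 1 / ((1 - p) + p / k).
    """
    if p > 1.0 or 0.0 > p or 0.0 >= k:
        raise ValueError("p must lie in [0, 1] and k must be positive")
    return 1.0 / ((1.0 - p) + p / k)


# With p = 1/3 the overall speed-up creeps toward 1.5 as k grows.
for k in (2, 10, 1000):
    print(k, round(amdahl_speedup(1.0 / 3.0, k), 4))
```

<p>However large k becomes, the result with p = 1/3 stays below 1.5: the untouched two thirds of the run time is still there, so at most one third of the run time can be saved.</p>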
<p>Amdahl&#8217;s Law reminds us that speeding up a component you do not lose much time to is not an important accomplishment.  In fact Amdahl&#8217;s Law directly prescribes looking at your most expensive components as being the largest opportunities for improvement.  Appealing to Amdahl&#8217;s Law is an important nerd-tool to end &#8220;color of the bike shed&#8221; arguments (and concentrate only on the design of systems that actually have an impact on outcomes).</p>
<p>It is clear there are similar principles for managing expenses, revenue, effort and so on (such as the Pareto Principle).</p>
<p>But what is the equivalent statement in the harder and more complicated world of probabilities and gambling systems?  There are a lot of candidate statements and theorems (such as &#8220;look for horses not for zebras&#8221;, the Kraft Inequality, Kullback Leibler Distance, Cross Entropy and the Asymptotic Equipartition Principle) but I think the most powerful and direct analogue is: the Kelly Betting System.  The Kelly Betting System is a remarkable system that, like Amdahl&#8217;s Law, tells us exactly what to look at (and surprisingly some things to ignore).</p>
<p>Kelly&#8217;s original paper: &#8220;A New Interpretation of Information Rate&#8221; J. L. Kelly, Jr., Bell System Technical Journal (1956) phrases the problem as betting at a horse race.  The technique applies more generally (other forms of gambling, portfolio management, even explaining the preferences of lab mice) but the clearest example remains a horse race.</p>
<p>We follow the excellent discussion of the problem from Cover and Thomas &#8220;Information Theory&#8221; Wiley (1991).    Consider a simplified horse race where there is only one payoff offered: picking the winning horse.  Suppose the (unknown) true probability of the i-th horse winning is p_i.  Further suppose the track publishes a set of payoffs for each horse such that if you bet a dollar on the i-th horse and it wins: you are given o_i dollars back.   </p>
<p>Now a gambler that has no estimate of the p_i might put all of their money on &#8220;the highest paying horse.&#8221;   That is, picking the i such that o_i is maximal (&#8220;going for the big score&#8221;).   A somewhat more informed gambler might put all of their money on the &#8220;horse with the best expected return,&#8221; that is a horse i that maximizes p_i * o_i.  But this betting strategy &#8220;invites ruin&#8221;:  you have a probability of 1 &#8211; p_i of losing all of your money.  Kelly starts with the controversial idea of trying to maximize expected log-return (instead of maximizing expected return).  Maximizing expected log-return avoids ruin, maximizes the exponential rate at which your wealth grows and maximizes the median wealth over all outcomes (see: &#8220;The Kelly System Maximizes Median Fortune&#8221; S N Ethier, Journal of Applied Probability (2004) vol. 41 (4) pp. 1230-1236).  Even the observation that you don&#8217;t always want to put all of your money in a &#8220;favorable bet&#8221; (that is one with expectation p_i * o_i > 1) is an important one.</p>
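<p>The danger of going all-in on a favorable bet can be seen in a small Python sketch. It uses a simplified setting that is an assumption of ours and not the horse race above: a single repeated bet that wins with probability p and returns o dollars per dollar staked (o > 1), where the gambler stakes a fraction f of wealth and keeps the rest in cash. The function names and the numbers p = 0.6, o = 2 are hypothetical.</p>

```python
import math


def expected_log_growth(f, p, o):
    """Expected log of the wealth multiplier when staking fraction f."""
    win = 1.0 - f + f * o      # wealth multiplier if the bet wins
    lose = 1.0 - f             # wealth multiplier if the bet loses
    if lose == 0.0:
        return -math.inf       # staking everything risks total ruin
    return p * math.log(win) + (1.0 - p) * math.log(lose)


def kelly_fraction(p, o):
    """Stake maximizing expected log growth (0 if the bet is unfavorable)."""
    return max(0.0, (p * o - 1.0) / (o - 1.0))


p, o = 0.6, 2.0                # favorable bet: p * o = 1.2 > 1
best = kelly_fraction(p, o)
for f in (0.1, best, 0.5, 1.0):
    print(round(f, 3), expected_log_growth(f, p, o))
```

<p>In this sketch the log-optimal stake is 20% of wealth. Staking half of wealth already gives a negative expected log-return, and staking everything gives minus infinity, even though the bet has positive expectation.</p>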
<p>To get the next part of Kelly&#8217;s system consider the sum of reciprocals of track offered payoffs:</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/10/sum1.png" alt="sum1.png" border="0" width="82" height="68" /><br />
</center></p>
<p>At any real track this sum will be greater than 1 (i.e. the o_i will be small, making the sum large).   The larger the sum the more clearly unfair the track&#8217;s published payoff schedule is.  Let us assume we were at a fantastically generous track where this sum is exactly 1 (admittedly unrealistic, and both the paper and the book work beyond this limitation).  In this case we can write r_i = 1/o_i and we know r_i > 0 and the r_i sum to 1.  That is we can interpret the r_i as the track&#8217;s estimate of the probability of the i-th horse winning.   If o_i = 100 (the track is paying off 100:1) we then can infer they think the i-th horse has no more than a 1 in 100 chance of winning (else they could not afford to offer the bet).  Kelly&#8217;s system gives (and proves correct) the following remarkable advice: if the sum given above is 1 (i.e. the track is paying off at least a fair rate) then you can safely bet all of your money and you should bet a p_i fraction of your money on the i-th horse.  </p>
<p>That is: if you decide the track is paying off so much that it is worth your while to gamble then you should then completely ignore the track&#8217;s payoff schedule in making your bet.   You might use the track&#8217;s published payoffs as some of your evidence when trying to estimate the p_i (the probability of each horse winning), but once you have estimated these probabilities you then ignore the track&#8217;s payoff rates in designing your bets.  In fact your expected rate of winning is exactly proportional to how much closer to the true probabilities your estimate is than the track&#8217;s estimate is (Cover/Thomas example 6.1.1, so unless you know something the track does not know you should not bet).  Also you should bet even on unlikely and underpaying horses to help cover the possibilities (this is because you are making a series of bets, not just a single bet- so each bet&#8217;s value is computed under the assumption that your other bets have failed).  This (provably correct) advice is contrary to many obvious and traditional betting systems.</p>
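<p>A small numerical sketch of this claim, in Python. It assumes the fair-track case (the 1/o_i sum to 1, so o_i = 1/r_i) and measures &#8220;closeness&#8221; by Kullback Leibler divergence, as in Cover and Thomas; the probability vectors below are made-up illustrations, not data.</p>

```python
import math


def growth_rate(p, bets, odds):
    """Expected log-return per race: sum_i p_i * log(bets_i * odds_i)."""
    return sum(pi * math.log(bi * oi)
               for pi, bi, oi in zip(p, bets, odds) if pi > 0)


def kl_divergence(p, q):
    """Kullback Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.3, 0.2]            # hypothetical true win probabilities
r = [0.4, 0.4, 0.2]            # hypothetical track-implied probabilities
q = [0.2, 0.3, 0.5]            # hypothetical poor estimate by a gambler
odds = [1.0 / ri for ri in r]  # fair track: o_i = 1 / r_i

print(growth_rate(p, p, odds), kl_divergence(p, r))  # bet the true p_i
print(growth_rate(p, r, odds))                       # copy the track
print(growth_rate(p, q, odds))                       # bet the poor estimate
```

<p>Betting the true p_i earns a growth rate equal to D(p || r). Betting the track&#8217;s own r_i earns nothing, and betting an estimate q earns D(p || r) - D(p || q), which is negative when q is further from the truth than the track is.</p>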
<p>The Kelly System is simultaneously very precise and broadly applicable.  For example: it has been extended to many other games and the stock market (see: &#8220;The Kelly Criterion and the Stock Market&#8221; Louis M Rotando, Edward O Thorp, The American Mathematical Monthly (1992) vol. 99 (10) pp. 922-931).  The Kelly System gives actionable advice (exact amounts to bet or exact amounts of effort to invest) and is very specific in saying what to look at.  </p>
<p>Just as Amdahl&#8217;s Law shows us that speeding up a minor component is a distraction, the Kelly System shows us that published rates of return are siren songs.  Thus the Kelly System is the gambler&#8217;s equivalent of Amdahl&#8217;s Law.</p>
<p>Related posts:<ol>
<li><a href='http://www.win-vector.com/blog/2009/08/good-graphs-graphical-perception-and-data-visualization/' rel='bookmark' title='Good Graphs: Graphical Perception and Data Visualization'>Good Graphs: Graphical Perception and Data Visualization</a></li>
<li><a href='http://www.win-vector.com/blog/2007/06/new-paper/' rel='bookmark' title='New Paper'>New Paper</a></li>
<li><a href='http://www.win-vector.com/blog/2009/04/the-data-enrichment-method/' rel='bookmark' title='The Data Enrichment Method'>The Data Enrichment Method</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A Discrete Model Gauging Market Efficiency</title>
		<link>http://www.win-vector.com/blog/2009/09/a-discrete-model-gauging-market-efficiency/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=a-discrete-model-gauging-market-efficiency</link>
		<comments>http://www.win-vector.com/blog/2009/09/a-discrete-model-gauging-market-efficiency/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 05:34:23 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Finance]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Quantitative Finance]]></category>
		<category><![CDATA[Combinatorial Markets]]></category>
		<category><![CDATA[Discrete Markets]]></category>
		<category><![CDATA[Efficient Markets]]></category>
		<category><![CDATA[Information Taker]]></category>
		<category><![CDATA[Preditory Traders]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=809</guid>
		<description><![CDATA[New paper: A Discrete Model Gauging Market Efficiency PDF We highly recommend reading the PDF version, but please find below an HTML translation of the paper. We follow up on some interesting work from the literature and explore some conditions that allow large predatory traders to dominate markets. A Discrete Model Gauging Market Efficiency John [...]
Related posts:<ol>
<li><a href='http://www.win-vector.com/blog/2009/03/what-does-the-market-think/' rel='bookmark' title='What does the Market Think?'>What does the Market Think?</a></li>
<li><a href='http://www.win-vector.com/blog/2008/05/is-search-advertising-a-market-for-lemons/' rel='bookmark' title='Is Search Advertising a Market for Lemons?'>Is Search Advertising a Market for Lemons?</a></li>
<li><a href='http://www.win-vector.com/blog/2009/03/it-is-not-all-the-quants-fault/' rel='bookmark' title='It is not all the quants&#8217; fault.'>It is not all the quants&#8217; fault.</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>New paper: A Discrete Model Gauging Market Efficiency <a href="http://www.win-vector.com/dfiles/DiscreteModel.pdf">PDF</a> </p>
<p>We <em>highly</em> recommend reading the PDF version, but please find below an HTML translation of the paper.</p>
<p>We follow up on some interesting work from the literature and explore some conditions that allow large predatory traders to dominate markets.</p>
<p><span id="more-809"></span></p>
<h1 align="center">A Discrete Model Gauging Market Efficiency</h1>
<p align="center"><strong>John Mount<a name="tex2html3" href="#foot12" id="tex2html3"><sup>1</sup></a></strong></p>
<p></p>
<p align="center"><b>Date:</b> September 8, 2009</p>
<hr />
<h3>Abstract:</h3>
<div>We describe a discrete market model appropriate for quantifying certain desirable and un-desirable features of financial markets. This model allows direct exploration of the impact of different market structures on efficiency and fairness. We conclude by demonstrating that a single trader with a large budget can generate profit while making the market not profitable for smaller traders.</div>
<h1><a name="SECTION00010000000000000000" id="SECTION00010000000000000000">Introduction</a></h1>
<p>Stochastic calculus techniques[<a href="#citeulike:2080469">KS01</a>] (such as Brownian Motion, Levy Processes[<a href="#Applebaum:2004p1042">App04</a>], Wiener Processes or the Ito Calculus[<a href="#citeulike:2635904">Ste03b</a>,<a href="#Steele:2003p2288">Ste03a</a>]) are not the only abstraction useful in thinking about financial markets. Real markets do not meet the typical assumptions of the above systems (infinitely divisible time, no trade costs, no long-term memory and no large actors) and routinely fail goodness of fit tests against such models[<a href="#Lo:2001p1619">LM01</a>,<a href="#Lo:2005p2193">Lo05</a>]. In fact there is a simple arbitrage argument that markets would have summary statistics identical to Ito processes even if they are not such processes.[<a href="#Shafer:2004p1497">Sha04</a>] When studying which features make a market fair or efficient we can not rely on mathematical tools that assume and depend on fair and efficient markets.</p>
<p>To build the tools for our study we follow up on some of the ideas of Hasanhodzic, Lo and Viola [<a href="#Hasanhodzic:2009p2605">HLV09</a>] and propose a specific discrete market model (as distinguished from more traditional continuous mathematics as in [<a href="#MertonCTF">Mer99</a>]) that allows us to effectively apply ideas from game theory[<a href="#AlgGT">NNV07</a>] and theoretical computer science. We show how to solve for optimal trading strategies in this market model and conclude with an illustration of how a single trader can dominate a market by merely exercising a larger budget.</p>
<h1><a name="SECTION00020000000000000000" id="SECTION00020000000000000000">Outline</a></h1>
<p>We will proceed as follows:</p>
<ul>
<li>Define our market model</li>
<li>Solve for optimal trading strategies in our market model</li>
<li>Perform the experiment of adding a single large trader to our model</li>
<li>Draw conclusions</li>
<li>Suggest further research.</li>
</ul>
<h1><a name="SECTION00030000000000000000" id="SECTION00030000000000000000">The Market Model</a></h1>
<p>Our goal is to investigate if even perfect traders are vulnerable to an additional trader that has a larger budget. To do this we must have a market model where at least:</p>
<ul>
<li>We can solve for the optimal trading strategy</li>
<li>There is a reason to trade (profits are available).</li>
</ul>
<p>We propose such a market model below.</p>
<h2><a name="SECTION00031000000000000000" id="SECTION00031000000000000000">The Market</a></h2>
<p>To simplify the description of traders (and to minimize the amount of state we have to carry) we propose a market model that abstracts out price and many other features.</p>
<p>Our market model is represented as an ordered sequence of the symbols &#8220;<img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg3.png" alt="$ +$"/> &#8221;, &#8220;0 &#8221; and &#8220;<img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg4.png" alt="$ -$"/> &#8221;. A &#8220;<img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg3.png" alt="$ +$"/> &#8221; represents a recent price increase, a &#8220;0 &#8221; represents no change and a &#8220;<img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg4.png" alt="$ -$"/> &#8221; represents a recent price decrease. We are deliberately avoiding direct representation of real market quantities such as absolute price, volume, inventory, bid/ask books, margin and elasticity. Time is represented by regular &#8220;ticks&#8221; or the simple advance to the next symbol in the market sequence. We will describe how the next symbol in the market sequence is determined after we have described trades.</p>
<h3><a name="SECTION00031100000000000000" id="SECTION00031100000000000000">Type 1 Trades</a></h3>
<p>The first type of trade we allow in this market is a &#8220;round trip.&#8221; A round trip is one of the two following trades:</p>
<ul>
<li>&#8220;a long round trip&#8221;
<p>An immediate buy in the current time tick followed by an automatic (forced) sell on the next time tick. This trade is considered profitable if the next market symbol is a <img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg3.png" alt="$ +$"/> as the sell then happens at a higher price than the initial buy, yielding a profit.</p>
</li>
<li>&#8220;a short round trip&#8221;
<p>An immediate sell in the current time tick followed by an automatic (forced) buy on the next time tick. This trade is considered profitable if the next market symbol is a <img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg4.png" alt="$ -$"/> as the buy then happens at a lower price than the initial sell, yielding a profit.</p>
</li>
</ul>
<p>The forced nature of these round trip trades allows us to avoid modeling inventory and margin. Round trip trades are meant to abstract some of the aspects of high-frequency trading strategies.</p>
<h3><a name="SECTION00031200000000000000" id="SECTION00031200000000000000">Type 2 Trades</a></h3>
<p>The second type of trade we allow is a &#8220;simple buy&#8221; or &#8220;simple sell&#8221; on the next time tick. This type of trade is meant to abstract some of the properties of a trader who is not so close to the market and has market-external interests (like inventory, customers, margin, fundamental knowledge <img width="28" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg5.png" alt="$ \cdots$"/> ).</p>
<h3><a name="SECTION00031300000000000000" id="SECTION00031300000000000000">Market Evolution</a></h3>
<p>The market model evolves forward as follows. The second half of each type 1 trade (the sell in the long round trip and buy in the short round trip) is entered as a net impact on the upcoming time tick. So: a long round trip actually generates a sell or downward price impact on the next market tick (and a short round trip generates a buy or upward price impact on the next market tick). This &#8220;reverse impact&#8221; is in our model because we are not allowing these traders to hold inventory and in a &#8220;buy followed by a sell&#8221; pattern the initial buy impact is further in the past than the sell (so should have a lesser future impact). This is also similar to how in real markets a large net short position represents an upward influence on price as the market participants know the short position must eventually be covered.</p>
<p>Each type 2 (or simple) trade is also entered directly as market impact. So: as expected simple buy trades generate upward price impact and simple sell trades generate downward price impact.</p>
<p>To determine the next market-symbol we sum the net impact entered against the next tick: if the net impact is positive the symbol is a <img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg3.png" alt="$ +$"/> , if it is zero the symbol is 0 and if it is negative the symbol is a <img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg4.png" alt="$ -$"/> . This differs both from the market model in [<a href="#Hasanhodzic:2009p2605">HLV09</a>] (where price is additive) and from real markets (where elasticity of price with respect to trades is very complicated).</p>
<p>For example: if three traders choose &#8220;long round trip&#8221; (betting the market will go up in the short term) and one trader chooses &#8220;simple buy&#8221; (betting the market will go up long term) then the net impact on the next tick is <!-- MATH<br />
 $(-1) + (-1) + (-1) + (+1) = -2$<br />
 --><br />
<img width="257" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg6.png" alt="$ (-1) + (-1) + (-1) + (+1) = -2$"/> and the next symbol is <img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg4.png" alt="$ -$"/> . The long round trip traders lose money and the simple buy trader has an unrealized loss.<a name="tex2html4" href="#foot37" id="tex2html4"><sup>2</sup></a> Just as we settled on a standard unit for trade size we will use a standard unit for profit and arbitrarily say all traders with realized loss lost one unit per share.</p>
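<p>The impact rule can be sketched in a few lines of Python. The trade names used as dictionary keys are our own labels for the four trades described above; everything else follows the stated rule that type 1 trades enter with reversed sign and type 2 trades enter directly.</p>

```python
def sign(x):
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (0 > x)


# Impact each unit trade enters against the NEXT market tick.
IMPACT = {
    "long_round_trip": -1,    # the forced sell lands on the next tick
    "short_round_trip": +1,   # the forced buy lands on the next tick
    "simple_buy": +1,
    "simple_sell": -1,
}
SYMBOL = {1: "+", 0: "0", -1: "-"}


def next_symbol(trades):
    """Return (net impact, next market symbol) for a list of unit trades."""
    net = sum(IMPACT[t] for t in trades)
    return net, SYMBOL[sign(net)]


# Three long round trips and one simple buy, as in the example above.
print(next_symbol(["long_round_trip"] * 3 + ["simple_buy"]))  # (-2, '-')
```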
<p>This market model is deliberately simple, but just as symbolic dynamics offers insights to continuous dynamical systems [<a href="#symbdyn">TBS91</a>] this market model serves as a platform for analyzing aspects of real markets.</p>
<h2><a name="SECTION00032000000000000000" id="SECTION00032000000000000000">Type 1 Traders</a></h2>
<p>We have described a very simple and very limited market. We will now describe some of the traders. Our first set of traders we call &#8220;Type 1 Traders&#8221; and they are meant to represent high-frequency quantitative or technical traders. Type 1 traders perform only type 1 trades (long round trip or short round trip) or abstain from trading. For now we are restricting each type 1 trader to trade a single unit either in a long round trip, a short round trip, or to not trade.</p>
<p>We will model these traders as having no internal state and a limited window of memory of the market. We allow the traders to use probabilistic strategies (so they do not get caught always performing the exact same trade in a repeating situation). Under these limits we can write each trader as a simple table representing a map from <!-- MATH<br />
 $\{+,0,-\}^{k}$<br />
 --><br />
<img width="82" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg7.png" alt="$ \{+,0,-\}^{k}$"/> (the sequences of symbols of length <img width="14" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg8.png" alt="$ k$"/> , i.e. what the trader is modeled as remembering) to pairs <!-- MATH<br />
 $(p_{\text{long}},p_{\text{short}})$<br />
 --><br />
<img width="100" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg9.png" alt="$ (p_{\text{long}},p_{\text{short}})$"/> where <!-- MATH<br />
 $p_{\text{long}}$<br />
 --><br />
<img width="39" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg10.png" alt="$ p_{\text{long}}$"/> is the trader&#8217;s chosen probability of making a long round trip in this situation and <!-- MATH<br />
 $p_{\text{short}}$<br />
 --><br />
<img width="44" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg11.png" alt="$ p_{\text{short}}$"/> is the trader&#8217;s chosen probability of making a short round trip in this situation.<a name="tex2html5" href="#foot162" id="tex2html5"><sup>3</sup></a></p>
<p>We place no limit on how much effort the Type 1 Traders make in pre-computing their strategy tables. One important point is: since the traders are allowed to use probabilistic tables we can assume (in the limit) that the optimal trading strategy is the same for all type 1 traders. This is because if a type 1 trader is losing money to other type 1 traders who are themselves making a profit then the original type 1 trader can &#8220;cannibalize their own business&#8221; by copying a bit of the strategy they are vulnerable to into their own strategy. For example if a trader is losing money to profitable short round trippers they can fix this by trading short round trips a bit more often.<a name="tex2html6" href="#foot153" id="tex2html6"><sup>4</sup></a> When we can use the assumption that all the type 1 traders have identical strategy tables we can then solve for this table and immediately demonstrate the characteristic of the market formed by these optimal traders.</p>
<p>The market model evolves as follows: if there are <img width="20" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg15.png" alt="$ m$"/> type 1 traders with a common memory window size of <img width="14" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg8.png" alt="$ k$"/> then the market symbol at time-<img width="11" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg16.png" alt="$ t$"/> is:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\text{market}(t) =<br />
\text{sign}\left(<br />
\sum_{i=1}^{m} \chi_i(\text{market}(t-1),\cdots,\text{market}(t-k))<br />
\right)<br />
\end{displaymath}<br />
 --></p>
<div align="center">&nbsp; &nbsp;market<img width="43" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg17.png" alt="$\displaystyle (t) =$"/>&nbsp; &nbsp;sign<img width="339" height="71" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg18.png" alt="$\displaystyle \left( \sum_{i=1}^{m} \chi_i(\text{market}(t-1),\cdots,\text{market}(t-k)) \right) $"/></div>
<p>where <img width="35" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg19.png" alt="$ \chi_i()$"/> is the random variable associated with the <img width="11" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg20.png" alt="$ i$"/> -th type 1 trader.</p>
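<p>To make the evolution rule concrete, here is a minimal Python simulation sketch. Several details are assumptions of ours: every trader shares one strategy table, the table gives the same (p_long, p_short) pair for every remembered history, the market starts from an all-&#8220;0&#8221; history, and histories are stored oldest symbol first. The function names are hypothetical.</p>

```python
import itertools
import random

SYMBOLS = ("+", "0", "-")


def uniform_table(k, p_long, p_short):
    """Strategy table giving the same (p_long, p_short) for every history."""
    return {h: (p_long, p_short) for h in itertools.product(SYMBOLS, repeat=k)}


def chi(table, history, rng):
    """One trader's random impact on the next tick, given the last k symbols."""
    p_long, p_short = table[history]
    u = rng.random()
    if p_long > u:
        return -1             # long round trip: forced sell on the next tick
    if p_long + p_short > u:
        return +1             # short round trip: forced buy on the next tick
    return 0                  # abstain


def simulate(table, m, k, steps, rng):
    """Run m identical type 1 traders and return the emitted symbols."""
    market = ["0"] * k        # assumed starting history
    for _ in range(steps):
        history = tuple(market[-k:])
        net = sum(chi(table, history, rng) for _ in range(m))
        market.append("+" if net > 0 else "-" if 0 > net else "0")
    return "".join(market[k:])


rng = random.Random(0)        # fixed seed so runs are repeatable
print(simulate(uniform_table(2, 0.3, 0.3), m=5, k=2, steps=20, rng=rng))
```

<p>A table that depends on the history only requires replacing <code>uniform_table</code> with a dictionary that assigns different pairs to different k-tuples.</p>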
<p>Already we can show: if the market is only populated by type 1 traders then the optimal trading strategy is to set <!-- MATH<br />
 $p_{\text{long}} =<br />
p_{\text{short}} = 0$<br />
 --><br />
<img width="134" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg21.png" alt="$ p_{\text{long}} = p_{\text{short}} = 0$"/> (to not trade) and there is in fact no market (no trades happen). This follows because at every time tick no more than half of the active type 1 traders can be on the profitable side, so at best the type 1 traders break even as a group and not trading is a dominant strategy.</p>
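<p>This counting argument can be checked by brute force for a small market. The Python sketch below enumerates every combination of choices for m type 1 traders. It assumes a payoff of +1 for a trader on the profitable side, -1 for a trader on the losing side, and 0 for everyone when the symbol is 0; that last convention is our assumption, and charging a loss on a 0 symbol would only lower the totals.</p>

```python
import itertools


def outcome(choices):
    """Return (winners, losers, group profit) for one tick of type 1 trades."""
    longs = choices.count("long")
    shorts = choices.count("short")
    net = shorts - longs                 # type 1 trades enter with reversed sign
    s = (net > 0) - (0 > net)            # next symbol as +1, 0 or -1
    winners = longs if s > 0 else shorts if 0 > s else 0
    losers = shorts if s > 0 else longs if 0 > s else 0
    return winners, losers, winners - losers


m = 4
results = [outcome(c)
           for c in itertools.product(("long", "short", "none"), repeat=m)]
print(max(profit for _, _, profit in results))  # 0
```

<p>The best group profit over all 81 combinations is 0: whichever side is more crowded moves the symbol against itself, so the winners are always the smaller group.</p>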
<p>To model another important aspect of markets (and to give the type 1 traders a reason to trade) we introduce type 2 traders.</p>
<h2><a name="SECTION00033000000000000000" id="SECTION00033000000000000000">Type 2 Traders</a></h2>
<p>Type 2 traders are completely oblivious to the market. They place only type 2 trades (simple buy and simple sell), and they neither look at nor remember the market sequence. Oddly enough, the type 2 traders abstract both the idea of completely informed traders (traders that know something about the future, so do not need to use the market past) and completely uninformed traders (traders trading due to some pressure external to the market, like a need to recover liquid assets). For now we are restricting each type 2 trader to trade a single unit either in a simple buy or a simple sell.</p>
<p>We assume one family of type 2 traders that operate as follows: assume a simple sequence of &#8220;<img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg3.png" alt="$ +$"/> &#8221; and &#8220;<img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg4.png" alt="$ -$"/> &#8221; generated by the Markov Chain in Figure&nbsp;<a href="#fig:SimpleMarkovChain">1</a>. This Markov Chain emits a sequence of symbols where the same symbol follows the last with probability <img width="14" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg22.png" alt="$ p$"/> (and the symbol changes with probability <img width="44" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg23.png" alt="$ 1-p$"/> ).</p>
<div align="center"><a name="fig:SimpleMarkovChain" id="fig:SimpleMarkovChain"></a><a name="63"></a></p>
<table>
<caption align="bottom"><strong>Figure 1:</strong> Simple Markov Chain</caption>
<tr>
<td>
<div align="center"><img width="250" height="78" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/./SimpleChain.png" alt="Image SimpleChain"/></div>
</td>
</tr>
</table>
</div>
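<p>The chain in Figure 1 is small enough to sketch directly. The following Python fragment is only an illustration: the function name <code>hidden_sequence</code>, the encoding of the two symbols as +1 and -1, and the <code>start</code> and <code>seed</code> arguments are assumptions made for the example and do not come from the original model description.</p>

```python
import random


def hidden_sequence(p, length, start=1, seed=None):
    """Emit `length` symbols (+1 or -1); each repeats the previous one with probability p."""
    rng = random.Random(seed)
    symbol = start
    out = []
    for _ in range(length):
        out.append(symbol)
        # stay on the same symbol with probability p, otherwise flip
        if not (p > rng.random()):
            symbol = -symbol
    return out
```

<p>With p near 1 the sequence shows long runs of one symbol, with p near 0 it alternates, and with p = 0.5 it behaves like a fair coin.</p>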
<p>We will call this sequence &#8220;the hidden symbol&#8221; as only our type 2 traders can see it (the type 1 traders cannot). Each of our type 2 traders looks at the current hidden symbol and independently does the following: with probability <img width="13" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg24.png" alt="$ q$"/> they enter a simple buy or simple sell for the next time tick betting in the direction of the hidden symbol, and with probability <img width="43" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg25.png" alt="$ 1-q$"/> they enter a simple buy or simple sell for the next time tick betting in the direction opposite to the hidden symbol. For now we will assume all type 2 traders share the same <img width="13" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg24.png" alt="$ q$"/> . The type 2 traders do not perform round trip trades, but instead hold inventory. Thus a type 2 trader&#8217;s long bet is modeled as adding a net upward impact to the next time period.</p>
<p>The market model now evolves as follows. If there are <img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg26.png" alt="$ n$"/> type 2 traders then the market symbol at time-<img width="11" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg16.png" alt="$ t$"/> is:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\text{market}(t) =<br />
\text{sign}\left(<br />
\sum_{i=1}^{m} \chi_i(\text{market}(t-1),\cdots,\text{market}(t-k))<br />
+ \sum_{i=1}^{n} \Upsilon_i(\text{hidden}(t-1))<br />
\right)<br />
\end{displaymath}<br />
 --></p>
<div align="center">&nbsp; &nbsp;market<img width="43" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg17.png" alt="$\displaystyle (t) =$"/>&nbsp; &nbsp;sign<img width="522" height="71" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg27.png" alt="$\displaystyle \left( \sum_{i=1}^{m} \chi_i(\text{market}(t-1),\cdots,\text{market}(t-k)) + \sum_{i=1}^{n} \Upsilon_i(\text{hidden}(t-1)) \right) $"/></div>
<p>where <!-- MATH<br />
 $\Upsilon_i()$<br />
 --><br />
<img width="37" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg28.png" alt="$ \Upsilon_i()$"/> is the random variable associated with the <img width="11" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg20.png" alt="$ i$"/> -th type 2 trader.</p>
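<p>Because each type 2 trader bets independently, the second sum in the update rule is a rescaled binomial variable. The sketch below tabulates its distribution. The function name and the +1/-1 encoding of the hidden symbol are assumptions of this example, and the caller decides which hidden symbol to pass in.</p>

```python
from math import comb


def type2_net_order_distribution(n, q, hidden):
    """Map each possible net order (sum of n unit bets) to its probability.

    Each trader bets `hidden` (+1 or -1) with probability q and -hidden otherwise,
    independently, so the number of agreeing traders is Binomial(n, q).
    """
    dist = {}
    for agree in range(n + 1):
        net = hidden * (2 * agree - n)
        dist[net] = comb(n, agree) * q**agree * (1 - q) ** (n - agree)
    return dist
```

<p>The mean of this distribution is the hidden symbol times (2q - 1)n, so for q = 0.5 the type 2 traders contribute no net push in either direction.</p>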
<p>As is often the case in mathematics, what the abstract model means can change if we add different interpretations. If the hidden sequence that all of the type 2 traders simultaneously observe is thought to represent some important hidden value like the true value of the company underlying the equity being traded, then we consider the type 2 traders to be informed and consider their knowledge to be an advantage. If we consider the shared sequence to be irrelevant noise then we see these traders as some loose coalition whose value comes only from the fact their trades correlate with each other. If <img width="59" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg29.png" alt="$ q=0.5$"/> then we have truly uninformed (and uncorrelated) traders who are indeed doing nothing. Many real market properties that are attributed to no-arbitrage are in fact consequences of conventions no more meaningful than the one given here (for example: closed end funds).</p>
<p>The interesting point is that if <img width="14" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg22.png" alt="$ p$"/> is not too near <img width="27" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg30.png" alt="$ 0.5$"/> and <img width="13" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg24.png" alt="$ q$"/> is not too near <img width="27" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg30.png" alt="$ 0.5$"/> then the type 2 traders have a serial correlation (a correlation over time) that the type 1 traders can learn and exploit for profit. Or, from another point of view, the type 1 traders can profit by supplying liquidity to the type 2 traders.</p>
<h1><a name="SECTION00040000000000000000" id="SECTION00040000000000000000">Solving for the Optimal Strategy</a></h1>
<p>Our market was designed to allow a very succinct description. With only type 1 traders and one uniform family of type 2 traders our market is completely specified if we know:</p>
<ul>
<li><img width="20" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg15.png" alt="$ m$"/> : The number of type 1 traders in the market</li>
<li><img width="14" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg8.png" alt="$ k$"/> : The memory length of type 1 traders</li>
<li><img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg26.png" alt="$ n$"/> : The number of type 2 traders in the market</li>
<li><img width="14" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg22.png" alt="$ p$"/> : The symbol stability odds on the hidden sequence watched by type 2 traders</li>
<li><img width="13" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg24.png" alt="$ q$"/> : The faithfulness of type 2 traders in trading the hidden symbol</li>
</ul>
<p>Given these parameters there is a unique shared optimal strategy for the type 1 traders, and we can efficiently solve for this strategy (without resorting to approximation or simulation).</p>
<p>The entire state of the market at a given time can be written as a tuple <img width="77" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg31.png" alt="$ s = (x,y)$"/> where <img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg32.png" alt="$ x$"/> is the sequence of the <img width="14" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg8.png" alt="$ k$"/> most recent result symbols from the market sequence (<img width="56" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg33.png" alt="$ +,0,-$"/> ) and <img width="14" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg34.png" alt="$ y$"/> is the most recent symbol from the hidden sequence (<img width="40" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg35.png" alt="$ +,-$"/> ). So there are only <img width="47" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg36.png" alt="$ 2 * 3^k$"/> possible states for the market. Any posited type 1 strategy (along with the above parameters) completely determines the transition odds between each of these detailed market states. Figure&nbsp;<a href="#fig:DetailedMarketMarkovChain">2</a> illustrates the states that make up a <img width="46" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg1.png" alt="$ k=1$"/> market.</p>
<div align="center"><a name="fig:DetailedMarketMarkovChain" id="fig:DetailedMarketMarkovChain"></a><a name="83"></a></p>
<table>
<caption align="bottom"><strong>Figure 2:</strong> Detailed Market Markov Chain for <img width="46" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg1.png" alt="$ k=1$"/></caption>
<tr>
<td>
<div align="center"><img width="500" height="274" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/./Market1.png" alt="Image Market1"/></div>
</td>
</tr>
</table>
</div>
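<p>The state space described above can be enumerated mechanically. This is a small sketch: the function name <code>market_states</code> and the use of the characters <code>+</code>, <code>0</code> and <code>-</code> as symbols are choices made for the example.</p>

```python
from itertools import product


def market_states(k):
    """List every state (x, y): x holds the k most recent market symbols, y the hidden symbol."""
    return [(x, y) for x in product("+0-", repeat=k) for y in "+-"]
```

<p>The length of the returned list is 2 * 3^k, so the state space grows quickly with the memory window: 6 states for k = 1, 18 for k = 2 and 54 for k = 3.</p>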
<p>Once the transition odds are known between all states it is a simple matter of linear algebra to solve exactly for the stationary distribution and expected value of the market (for type 1 traders).[<a href="#finiteMC">KS76</a>] Global optimization techniques can be used to identify the optimal strategies and we can then characterize how these market models behave when populated with optimal traders.<a name="tex2html9" href="#foot156" id="tex2html9"><sup>5</sup></a></p>
<p>For concreteness we show a piece of the computation for the <!-- MATH<br />
 $m=1, k=1,<br />
n=2, p=0.8, q=0.9$<br />
 --><br />
<img width="275" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg37.png" alt="$ m=1, k=1, n=2, p=0.8, q=0.9$"/> market model. If the market&#8217;s last symbol was <img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg3.png" alt="$ +$"/> and the last hidden state was <img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg3.png" alt="$ +$"/> then the odds of moving from this state to this same detailed state (both a new hidden symbol of <img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg3.png" alt="$ +$"/> and a new market symbol of <img width="18" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg3.png" alt="$ +$"/> ) at the next time tick are given by:</p>
<div align="center"><img width="25" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg38.png" alt="$\displaystyle P($"/>hidden<img width="412" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg39.png" alt="$\displaystyle _{\text{next}} = + \vert \text{hidden} = +) P( \chi_1(+) + \Upsilon_1(+) + \Upsilon_2(+) &gt; 0 ) $"/></div>
<p>(where <img width="37" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg40.png" alt="$ \chi_1()$"/> is the random variable representing the trade of the type 1 trader and <!-- MATH<br />
 $\Upsilon_1()$<br />
 --><br />
<img width="39" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg41.png" alt="$ \Upsilon_1()$"/> , <!-- MATH<br />
 $\Upsilon_2()$<br />
 --><br />
<img width="39" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg42.png" alt="$ \Upsilon_2()$"/> are the random variables representing the trades of the type 2 traders).</p>
<p>Using nothing more complicated than knowledge of the binomial distribution we can compute the complete transition matrix for the detailed Market Markov Chain. For example: assume our type 1 trader trades the most recent market symbol (except 0) with probability <img width="27" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg43.png" alt="$ 0.7$"/> (and makes no trade otherwise). Now label our states as:</p>
<div align="center">
<table cellpadding="3" border="1">
<tr>
<td align="center">Last Market Symbol</td>
<td align="center">Hidden Symbol</td>
<td align="center">State ID Number</td>
</tr>
<tr>
<td align="center">+</td>
<td align="center">+</td>
<td align="center">1</td>
</tr>
<tr>
<td align="center">+</td>
<td align="center">-</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">0</td>
<td align="center">+</td>
<td align="center">3</td>
</tr>
<tr>
<td align="center">0</td>
<td align="center">-</td>
<td align="center">4</td>
</tr>
<tr>
<td align="center">-</td>
<td align="center">+</td>
<td align="center">5</td>
</tr>
<tr>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">6</td>
</tr>
</table>
</div>
<p>then it is merely a matter of detailed arithmetic to derive the state to state transition probability matrix<a name="tex2html10" href="#foot97" id="tex2html10"><sup>6</sup></a>:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
P =<br />
\left(<br />
\begin{array}{llllll}<br />
0.648 &#038; 0.162 &#038; 0.648 &#038; 0.162 &#038; 0.7488 &#038; 0.1872 \\<br />
0.002 &#038; 0.008 &#038; 0.002 &#038; 0.008 &#038; 0.0272 &#038; 0.1088 \\<br />
0.0432 &#038; 0.0108 &#038; 0.144 &#038; 0.036 &#038; 0.0432 &#038; 0.0108 \\<br />
0.0108 &#038; 0.0432 &#038; 0.036 &#038; 0.144 &#038; 0.0108 &#038; 0.0432 \\<br />
0.1088 &#038; 0.0272 &#038; 0.008 &#038; 0.002 &#038; 0.008 &#038; 0.002 \\<br />
0.1872 &#038; 0.7488 &#038; 0.162 &#038; 0.648 &#038; 0.162 &#038; 0.648<br />
\end{array}<br />
\right)<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="441" height="147" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg46.png" alt="\begin{displaymath} P = \left( \begin{array}{llllll} 0.648 &amp; 0.162 &amp; 0.648 &amp; 0.... ... &amp; 0.7488 &amp; 0.162 &amp; 0.648 &amp; 0.162 &amp; 0.648 \end{array}\right) . \end{displaymath}"/></div>
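<p>The &#8220;detailed arithmetic&#8221; can be mechanized. The sketch below is one reading of the model that reproduces the matrix printed above entry for entry. Two conventions in it are inferred from the printed numbers rather than taken from the text, and are marked as assumptions in the comments: the type 2 traders bet on the newly drawn hidden symbol, and the type 1 order enters the summed order flow with sign opposite to the last market symbol. The function names and the +1/0/-1 encoding are also choices of this example.</p>

```python
from math import comb


def sign(v):
    return (v > 0) - (0 > v)


def transition_matrix(n=2, p=0.8, q=0.9, follow=0.7):
    """Column-stochastic matrix: entry [row][col] is P(next state = row | current state = col)."""
    # (market symbol, hidden symbol), listed in the same order as the state table
    states = [(m, h) for m in (1, 0, -1) for h in (1, -1)]
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for (m, h), col in index.items():
        # the single type 1 trader acts with probability `follow` unless the last symbol was 0
        # ASSUMPTION: its order enters the sum as -m (sign inferred from the printed matrix)
        type1 = [(-m, follow), (0, 1 - follow)] if m != 0 else [(0, 1.0)]
        for h_next, p_h in ((h, p), (-h, 1 - p)):
            for agree in range(n + 1):
                # ASSUMPTION: type 2 traders bet on the newly drawn hidden symbol
                net2 = h_next * (2 * agree - n)
                p2 = comb(n, agree) * q**agree * (1 - q) ** (n - agree)
                for order1, p1 in type1:
                    row = index[(sign(net2 + order1), h_next)]
                    P[row][col] += p_h * p2 * p1
    return P
```

<p>Each column of the result sums to 1, which is the orientation needed for the stationary equation used next. Changing <code>follow</code> gives the matrix for other &#8220;follow the last symbol&#8221; rates under the same assumptions.</p>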
<p>Solving for the stationary distribution is, as promised, quite easy. We want to find a vector <img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg32.png" alt="$ x$"/> such that <!-- MATH<br />
 $(P-I) x = 0$<br />
 --><br />
<img width="104" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg47.png" alt="$ (P-I) x = 0$"/> and <!-- MATH<br />
 $1\cdot x = 1$<br />
 --><br />
<img width="68" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg48.png" alt="$ 1\cdot x = 1$"/> (<img width="14" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg49.png" alt="$ I$"/> denoting the identity matrix). Under very general conditions this will be a set of <img width="43" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg50.png" alt="$ s+1$"/> equations over <img width="13" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg51.png" alt="$ s$"/> variables with rank <img width="13" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg51.png" alt="$ s$"/> (so it will have a unique solution and we don&#8217;t need to add any sign constraints).</p>
<p>This solution gives us the stationary odds of the market (how likely we are to see the market in any state at a random observation time):</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
x =<br />
\left(<br />
\begin{array}{l}<br />
0.420497 \\<br />
0.048611 \\<br />
0.030892 \\<br />
0.030892 \\<br />
0.048611 \\<br />
0.420497<br />
\end{array}<br />
\right)<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="151" height="147" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg52.png" alt="\begin{displaymath} x = \left( \begin{array}{l} 0.420497 \\ 0.048611 \\ 0.030892 \\ 0.030892 \\ 0.048611 \\ 0.420497 \end{array}\right) . \end{displaymath}"/></div>
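<p>The stationary vector can be re-derived from the printed transition matrix in a few lines. This is a minimal sketch in plain Python, not the code used for the article: it drops one redundant row of (P - I), puts the normalization equation in its place, and solves the resulting square system by Gauss-Jordan elimination. The function name <code>stationary</code> is an assumption of the example.</p>

```python
# Transition matrix as printed above (each column sums to 1).
P = [
    [0.648, 0.162, 0.648, 0.162, 0.7488, 0.1872],
    [0.002, 0.008, 0.002, 0.008, 0.0272, 0.1088],
    [0.0432, 0.0108, 0.144, 0.036, 0.0432, 0.0108],
    [0.0108, 0.0432, 0.036, 0.144, 0.0108, 0.0432],
    [0.1088, 0.0272, 0.008, 0.002, 0.008, 0.002],
    [0.1872, 0.7488, 0.162, 0.648, 0.162, 0.648],
]


def stationary(P):
    """Solve (P - I) x = 0 together with sum(x) = 1 by Gauss-Jordan elimination."""
    s = len(P)
    # rows of (P - I), each augmented with a zero right-hand side
    A = [[P[i][j] - (1.0 if i == j else 0.0) for j in range(s)] + [0.0] for i in range(s)]
    # the rows of (P - I) sum to zero, so one is redundant: use that slot for sum(x) = 1
    A[-1] = [1.0] * s + [1.0]
    for col in range(s):
        # partial pivoting: bring the largest remaining entry of this column to the diagonal
        pivot = max(range(col, s), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(s):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][s] for i in range(s)]


x = stationary(P)
```

<p>The entries of <code>x</code> should agree with the vector printed above to the displayed precision, and the symmetry between states 1 and 6, 2 and 5, and 3 and 4 reflects the symmetry of the model under swapping the two symbols.</p>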
<p>Once we know this it is a matter of arithmetic to determine the expected value of the market for the type 1 trader.<a name="tex2html11" href="#foot104" id="tex2html11"><sup>7</sup></a> The trading strategy we imposed was not optimal but does have the positive value of <img width="36" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg53.png" alt="$ 0.13$"/> units expected profit per time tick. We can completely characterize these markets for moderate values of <img width="14" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg8.png" alt="$ k$"/> and arbitrary values of <img width="20" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg15.png" alt="$ m$"/> and <img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg26.png" alt="$ n$"/> .</p>
<p>Already we can confirm some features we would expect to see in this model. For example the type 1 traders have a &#8220;tragedy of the commons&#8221; situation in that they are using up the correlations that the type 2 traders introduce. If there are too many technical traders trying to follow the type 2 traders then the market becomes anti-correlated and oscillates in a way that is not profitable for these traders (until they adjust their strategies). For example raising <img width="20" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg15.png" alt="$ m$"/> to <img width="14" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg54.png" alt="$ 2$"/> in our example makes &#8220;follow the market 70% of the time&#8221; an unprofitable strategy that loses money at a rate of <img width="36" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg55.png" alt="$ 0.12$"/> units per time tick. However, with <img width="102" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg56.png" alt="$ m=2, n=3$"/> this same strategy is profitable at a rate of <img width="36" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg57.png" alt="$ 0.08$"/> units per time tick. The <img width="102" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg58.png" alt="$ m=2, n=2$"/> market can be made to be profitable if both of the technical traders act &#8220;superrationally&#8221;<a name="tex2html12" href="#foot158" id="tex2html12"><sup>8</sup></a> and lower their trade rate from following the market 70% of the time to something lower like 20% of the time. Figure&nbsp;<a href="#fig:stratValueK1N2M2">3</a> shows the expected value of the market <!-- MATH<br />
 $m=2, k=1, n=2, p=0.8, q=0.9$<br />
 --><br />
<img width="275" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg2.png" alt="$ m=2, k=1, n=2, p=0.8, q=0.9$"/> for the type 1 traders as the type 1 traders&#8217; odds of &#8220;following the last symbol&#8221; are moved from 0 to <img width="14" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg59.png" alt="$ 1$"/> (and, as earlier, they refrain from trading in all other cases).</p>
<div align="center"><a name="fig:stratValueK1N2M2" id="fig:stratValueK1N2M2"></a><a name="110"></a></p>
<table>
<caption align="bottom"><strong>Figure 3:</strong> Strategy values for <!-- MATH<br />
 $m=2, k=1, n=2, p=0.8, q=0.9$<br />
 --><br />
<img width="275" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg2.png" alt="$ m=2, k=1, n=2, p=0.8, q=0.9$"/></caption>
<tr>
<td>
<div align="center"><img width="400" height="400" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/./stratValueK1N2M2.png" alt="Image stratValueK1N2M2"/></div>
</td>
</tr>
</table>
</div>
<h1><a name="SECTION00050000000000000000" id="SECTION00050000000000000000">The Experiment</a></h1>
<p>Now that we have set up a market and described how to evaluate and solve for the optimal trading strategies we are ready to run an experiment. The experiment is the introduction of a large trader that trades at a much larger size than other type 1 traders. This large trader will act like a type 1 trader but it is allowed larger trade sizes and a small informational advantage over the other type 1 traders. This informational advantage is the ability to remember which of three possible states its own last trade was in (so it is not really extending the window size, and this extension would not help the smaller type 1 traders against this strategy).</p>
<p>To illustrate we assume a market where <img width="52" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg60.png" alt="$ m=0$"/> , <img width="46" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg1.png" alt="$ k=1$"/> , <img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg26.png" alt="$ n$"/> is large, <img width="63" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg61.png" alt="$ q&gt;1/2$"/> , <img width="63" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg62.png" alt="$ p &gt; 3/4$"/> and <!-- MATH<br />
 $n*(q-1/2)*(p-3/4)$<br />
 --><br />
<img width="187" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg63.png" alt="$ n*(q-1/2)*(p-3/4)$"/> is large (and all known to the large trader).</p>
<p>The large trader trades as follows: define three states to remember the large trader&#8217;s last state: &#8220;odd time tick&#8221;, &#8220;even time tick following bluff&#8221; and &#8220;even time tick following non-bluff.&#8221; We illustrate the large trader&#8217;s strategy in Figure&nbsp;<a href="#fig:Strat1">4</a>. On odd time ticks the large trader either bluffs (trades to flip the market symbol and takes a forced loss) or trades normally (allows the market to evolve under the influence of the type 2 traders and takes an expected profit). On even time ticks the large trader&#8217;s behavior depends on whether the last odd tick was a bluff (and the type 2 traders&#8217; influence on the market is masked) or the last odd tick was not a bluff (and the type 2 traders&#8217; influence on the market is visible). These two different states are marked in Figure&nbsp;<a href="#fig:Strat1">4</a> and the large trader abstains from trading after a bluff or trades for expected profit after a non-bluff.</p>
<div align="center"><a name="fig:Strat1" id="fig:Strat1"></a><a name="120"></a></p>
<table>
<caption align="bottom"><strong>Figure 4:</strong> Large Type 1 Trader States</caption>
<tr>
<td>
<div align="center"><img width="300" height="192" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/./Strat1.png" alt="Image Strat1"/></div>
</td>
</tr>
</table>
</div>
<p>The large trader&#8217;s strategy yields an augmented Markov chain that reflects the large trader&#8217;s state, the last symbol seen in the market and the last symbol of the hidden sequence. This Markov chain is shown in Figure&nbsp;<a href="#fig:BigStrat1">5</a> (with links from even time states to odd time states and links to and from unlikely states suppressed for clarity). We will describe the large trader&#8217;s strategy in detail below, but there are some simplifying points to keep in mind. Since <img width="85" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg64.png" alt="$ n (q-1/2)$"/> is large we are assuming that on odd time ticks and for even time ticks following non-bluffs the states where the market symbol and the hidden symbol disagree are very rare (and we will omit them from the analysis).</p>
<p>Stepping through the large trader strategy (see Figure&nbsp;<a href="#fig:BigStrat1">5</a>): on the odd time periods the large trader assumes that the market symbol equals the hidden symbol (i.e. the <img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg26.png" alt="$ n$"/> type 2 traders successfully copied the hidden symbol to the market without interference). The large trader then flips a fair coin and with 50% chance &#8220;bluffs&#8221; (forcing the market to the symbol opposite the hidden symbol by trading a little more than <img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg26.png" alt="$ n$"/> units in the appropriate direction) or on the other 50% of the time trades slightly less than <img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg26.png" alt="$ n$"/> units to try and profit off the obvious tick to tick correlation in the market. On the even time ticks the large trader trades to profit if the previous trade was not a bluff or otherwise abstains from trading. The expected value of the sum of contributions of the type 2 traders is <!-- MATH<br />
 $\text{hidden\_symbol}*(q*n - (1-q)*n)$<br />
 --><br />
<img width="282" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg66.png" alt="$\text{hidden\_symbol}*(q*n - (1-q)*n)$"/> which has an absolute value of <!-- MATH<br />
 $(2 q - 1) n$<br />
 --><br />
<img width="76" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg67.png" alt="$ (2 q - 1) n$"/> . Let <!-- MATH<br />
 $Q = (2 q - 1) n$<br />
 --><br />
<img width="113" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg68.png" alt="$ Q = (2 q - 1) n$"/> . A bluff costs the large trader <img width="72" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg69.png" alt="$ Q + o(n)$"/> <a name="tex2html15" href="#foot159" id="tex2html15"><sup>9</sup></a> units as they enter a trade large enough to overwhelm the type 2 traders with high probability. A trade for profit (either on a non-bluff odd time tick or an even time tick following a non-bluff) has a maximum expected value of <!-- MATH<br />
 $(Q -<br />
o(n)) * (p*(1) + (1-p)*(-1)) = Q ( 2 p - 1) - o(n)$<br />
 --><br />
<img width="441" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg72.png" alt="$ (Q - o(n)) * (p*(1) + (1-p)*(-1)) = Q ( 2 p - 1) - o(n)$"/> as the large trader must not overwhelm the expected effect of the type 2 traders. Every two time ticks the large trader either bluffs then abstains (with probability 1/2) or makes two profitable trade attempts in a row (with probability 1/2). So every 2 time ticks the expected return is:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
0.5 * (- Q - o(n) ) + 0.5 * 2 * (Q (2 p -1) - o(n))<br />
= 2 Q (p - 3/4) - o(n)<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center">0.5 * (&#8211; Q &#8211; o(n)) + 0.5 * 2 * (Q (2p &#8211; 1) &#8211; o(n)) = Q (2p &#8211; 3/2) &#8211; o(n) = 2 Q (p &#8211; 3/4) &#8211; o(n).</div>
<p>Or 2 (q &#8211; 1/2)(p &#8211; 3/4) n &#8211; o(n) expected units return per time tick.</p>
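<p>The two-tick bookkeeping can also be checked numerically. The sketch below evaluates only the leading terms of the left-hand side of the expression above, with the o(n) corrections dropped. The function name is an assumption of this example.</p>

```python
def two_tick_leading_return(n, q, p):
    """Leading-order expected return over two ticks, with the o(n) terms dropped."""
    Q = (2 * q - 1) * n                   # expected size of the net type 2 order
    bluff_branch = -Q                     # bluff at cost Q, then abstain
    profit_branch = 2 * Q * (2 * p - 1)   # two trades for profit in a row
    # each branch is taken with probability 1/2
    return 0.5 * bluff_branch + 0.5 * profit_branch
```

<p>For q above 1/2 the result changes sign at p = 3/4, which is why the experiment assumes p &gt; 3/4: below that threshold the forced loss from bluffing outweighs the gains from the trades for profit.</p>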
<div align="center"><a name="fig:BigStrat1" id="fig:BigStrat1"></a><a name="131"></a></p>
<table>
<caption align="bottom"><strong>Figure 5:</strong> Example Strategy for Large Type 1 Trader</caption>
<tr>
<td>
<div align="center"><img width="500" height="540" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/./BigStrat1.png" alt="Image BigStrat1"/> <font size="-1">
<p />(for clarity transitions between unlikely states and from even time ticks to odd time ticks are not shown)</font></div>
</td>
</tr>
</table>
</div>
<p>This large trader strategy is for illustration, and is in no sense optimal<a name="tex2html17" href="#foot136" id="tex2html17"><sup>10</sup></a>. The important result is that when looking at the sequence of market symbols with a window of length 2 (the length of window that would be useful in defining a trading strategy for a <img width="46" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg1.png" alt="$ k=1$"/> type 1 opposing trader) all the zero-free market symbol sequences of length 2 come up with the same probability: <img width="31" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg75.png" alt="$ 1/4$"/> . To a limited memory type 1 opponent (or one who has to encode their strategy with limited memory) the market looks like a fair coin with no serial correlation. Thus, if we start with <img width="52" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg60.png" alt="$ m=0$"/> (i.e. no other type 1 traders) a single large trader can take over the market and when we later increase <img width="20" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg15.png" alt="$ m$"/> the new type 1 traders will compute an optimal strategy of not trading (i.e. they will see no method to profitably enter the market).</p>
<p>The large trader has rendered the market untradable for other type 1 traders in the strongest possible sense. Because this market model is symmetric, has no trading costs and no margin requirements, no strategy can exist that forces other adapting strategies to lose money. This is because a strategy that is forced to lose money can be adapted into a profitable strategy by reversing the long and short actions. The large trader is using slightly more memory but this is just an accounting gimmick so they know on which ticks the market has information from the type 2 traders and which ticks are noise from their own &#8220;bluff&#8221; or &#8220;Pyrrhic&#8221; trades. The other type 1 traders have no advantage when given the equivalent gimmick.<a name="tex2html18" href="#foot137" id="tex2html18"><sup>11</sup></a> Also, the large trader strategy is self-financing: the large trader can hold the market (make the market look purely random to outsiders) while extracting a profit.</p>
<h1><a name="SECTION00060000000000000000" id="SECTION00060000000000000000">Conclusion</a></h1>
<p>We have described a combinatorial market model that is designed for simplicity. As is well known from mathematics and theoretical computer science, even very simple systems can exhibit arbitrarily complex behavior when feedback, recursion or iteration are involved.</p>
<p>We have shown how to explicitly derive optimal trading behavior for small traders in this market model. We then demonstrated how a large trader (allowed to move more volume than the small traders) can &#8220;hold the market&#8221; in the sense that they can make the market appear to be uncorrelated to outsiders while extracting a profit of their own. The ability to completely characterize our market model allows us to show that a self-financing large trader is a stable solution in this market model even in the presence of optimal opponents with similar computational power.</p>
<p>It is beyond the scope of current techniques to show under which conditions a self-financing large trader could exist in a &#8220;fully realistic&#8221; market model. But by demonstration we have shown that we cannot assume there are no self-financing large traders.</p>
<h1><a name="SECTION00070000000000000000" id="SECTION00070000000000000000">Further Research</a></h1>
<p>Interesting follow up studies, which are well within the scope of the methods demonstrated here, include:</p>
<ul>
<li>Larger <img width="14" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg8.png" alt="$ k$"/> and heterogeneous <img width="14" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg8.png" alt="$ k$"/></li>
<li>A cross-market arbitrage interpretation for the type 2 traders</li>
<li>More detailed price and hidden symbol trajectories</li>
<li>Non-finite strategies (strategies indexed by integers instead of a small set of symbols)</li>
<li>Inventory and margin</li>
<li>Trade volume controlling price change (i.e. a model of price&#8217;s elasticity with respect to trade volume).</li>
</ul>
<h2><a name="SECTION00080000000000000000" id="SECTION00080000000000000000">Bibliography</a></h2>
<dl compact>
<dt><a name="Applebaum:2004p1042" id="Applebaum:2004p1042">App04</a></dt>
<dd>David Applebaum, <i>Levy processes: from probability to finance and quantum groups</i>, Notices of the AMS <b>51</b> (2004), no.&nbsp;11, 1336-1347.</dd>
<dt><a name="probmeth" id="probmeth">AS92</a></dt>
<dd>Noga Alon and Joel&nbsp;H. Spencer, <i>The probabilistic method</i>, Wiley, 1992.</dd>
<dt><a name="Hasanhodzic:2009p2605" id="Hasanhodzic:2009p2605">HLV09</a></dt>
<dd>Jasmina Hasanhodzic, Andrew&nbsp;W Lo, and Emanuele Viola, <i>A computational view of market efficiency</i>, 1-14.</dd>
<dt><a name="metamag" id="metamag">Hof85</a></dt>
<dd>Douglas&nbsp;R. Hofstadter, <i>Metamagical themas: Questing for the essence of mind and pattern</i>, Basic Books Inc., 1985.</dd>
<dt><a name="finiteMC" id="finiteMC">KS76</a></dt>
<dd>John&nbsp;G. Kemeny and J.&nbsp;Laurie Snell, <i>Finite Markov chains</i>, Springer, 1976.</dd>
<dt><a name="citeulike:2080469" id="citeulike:2080469">KS01</a></dt>
<dd>Ioannis Karatzas and Steven&nbsp;E. Shreve, <i>Methods of mathematical finance</i>, Springer, September 2001.</dd>
<dt><a name="Lo:2001p1619" id="Lo:2001p1619">LM01</a></dt>
<dd>Andrew&nbsp;W Lo and A&nbsp;Craig MacKinlay, <i>A non-random walk down wall street</i>, Princeton University Press, 2001.</dd>
<dt><a name="Lo:2005p2193" id="Lo:2005p2193">Lo05</a></dt>
<dd>Andrew&nbsp;W Lo, <i>Reconciling efficient markets with behavioral finance: The adaptive markets hypothesis</i>, 44.</dd>
<dt><a name="MertonCTF" id="MertonCTF">Mer99</a></dt>
<dd>Robert&nbsp;C. Merton, <i>Continuous-time finance</i>, Blackwell, 1999.</dd>
<dt><a name="AlgGT" id="AlgGT">NNV07</a></dt>
<dd>Noam&nbsp;Nisan, Tim&nbsp;Roughgarden, Eva&nbsp;Tardos, and Vijay&nbsp;V. Vazirani, <i>Algorithmic game theory</i>, Cambridge University Press, 2007.</dd>
<dt><a name="Rall:1996p2473" id="Rall:1996p2473">RC96</a></dt>
<dd>Louis&nbsp;B Rall and George&nbsp;F Corliss, <i>An introduction to automatic differentiation</i>, SIAM: Computational Differentiation: Techniques, Applications and Tools (1996), 1-18.</dd>
<dt><a name="Shafer:2004p1497" id="Shafer:2004p1497">Sha04</a></dt>
<dd>Glenn Shafer, <i>Why do price series look like ito processes?</i>, Rutgers (2004), 43.</dd>
<dt><a name="Steele:2003p2288" id="Steele:2003p2288">Ste03a</a></dt>
<dd>J&nbsp;Michael Steele, <i>Ito calculus</i>, Encyclopedia of Actuarial Sciences (2003), 1-12.</dd>
<dt><a name="citeulike:2635904" id="citeulike:2635904">Ste03b</a></dt>
<dd>J.&nbsp;Michael Steele, <i>Stochastic calculus and financial applications</i>, Springer, June 2003.</dd>
<dt><a name="symbdyn" id="symbdyn">TBS91</a></dt>
<dd>Tim&nbsp;Bedford, Michael&nbsp;Keane, and Caroline Series, <i>Ergodic theory, symbolic dynamics and hyperbolic spaces</i>, Oxford University Press, 1991.</dd>
</dl>
<p></p>
<hr />
<h4>Footnotes</h4>
<dl>
<dt><a name="foot12" id="foot12">&#8230; Mount</a><a href="#tex2html3"><sup>1</sup></a></dt>
<dd>email: <tt><a name="tex2html1" href="mailto:jmount@win-vector.com" id="tex2html1">mailto:jmount@win-vector.com</a></tt> company: <tt><a name="tex2html2" href="http://www.win-vector.com/" id="tex2html2">http://www.win-vector.com/</a></tt></dd>
<dt><a name="foot37" id="foot37">&#8230; loss.</a><a href="#tex2html4"><sup>2</sup></a></dt>
<dd>We do not enforce any sort of &#8220;conservation of money&#8221; (that the amount of profit earned by the short trader should equal the amount of money lost by the long traders). In the real market there is an aspect of conservation of money in trades, but there is not a conservation of money in a single time period if the traders have net holdings.</dd>
<dt><a name="foot162" id="foot162">&#8230; situation.</a><a href="#tex2html5"><sup>3</sup></a></dt>
<dd>So <!-- MATH<br />
 $p_{\text{long}}<br />
\ge 0$<br />
 --><br />
<img width="71" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg12.png" alt="$ p_{\text{long}} \ge 0$"/> , <!-- MATH<br />
 $p_{\text{short}} \ge 0$<br />
 --><br />
<img width="76" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg13.png" alt="$ p_{\text{short}} \ge 0$"/> and <!-- MATH<br />
 $p_{\text{long}} +<br />
p_{\text{short}} \le 1$<br />
 --><br />
<img width="132" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg14.png" alt="$ p_{\text{long}} + p_{\text{short}} \le 1$"/> .</dd>
<dt><a name="foot153" id="foot153">&#8230; often.</a><a href="#tex2html6"><sup>4</sup></a></dt>
<dd>This &#8220;traders can imitate each other&#8221; is a &#8220;linearity of expectation argument&#8221;[<a href="#probmeth">AS92</a>] and is a common argument technique in game theory.</dd>
<dt><a name="foot156" id="foot156">&#8230; traders.</a><a href="#tex2html9"><sup>5</sup></a></dt>
<dd>The optimization problem has some easy aspects. At the optimum we can assume all the type 1 traders are identical (so we solve for one trader of magnitude <img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg26.png" alt="$ n$"/> instead of solving for a population of <img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg26.png" alt="$ n$"/> traders) and we can use automatic differentiation techniques[<a href="#Rall:1996p2473">RC96</a>] to get gradients as we work.</dd>
<dt><a name="foot97" id="foot97">&#8230; matrix</a><a href="#tex2html10"><sup>6</sup></a></dt>
<dd>We are being a little non-standard here in that we are writing <img width="18" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg44.png" alt="$ P$"/> as an operator on the left, so if <img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg32.png" alt="$ x$"/> is the state-vector of probabilities at a given time tick then <img width="28" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg45.png" alt="$ P x$"/> is the state-vector of probabilities at the next time tick. This is not the convention in the Markov Chain literature, but more compatible with other topics in linear algebra.</dd>
<dt><a name="foot104" id="foot104">&#8230; trader.</a><a href="#tex2html11"><sup>7</sup></a></dt>
<dd>Some care has to be taken in computing the value of a strategy, as we need access to several additional transition matrices (each conditioned on knowing the proposed trade of the type 1 trader we are studying).</dd>
<dt><a name="foot158" id="foot158">&#8230; &#8220;superrationally&#8221;</a><a href="#tex2html12"><sup>8</sup></a></dt>
<dd>That is, each type 2 trader must dial down their trading activity to account for the number of other type 2 traders present. Douglas Hofstadter called such behavior &#8220;superrational&#8221;[<a href="#metamag">Hof85</a>]. Traders with small budgets who cannot collaborate are actually likely to do this- because while they are trading at too high a rate they lose money. However, a trader that can work at higher volume or tolerate larger losses can outwait the others and have the market for themselves.</dd>
<dt><a name="foot159" id="foot159">&#8230;</a><a href="#tex2html15"><sup>9</sup></a></dt>
<dd>The <img width="37" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg70.png" alt="$ o(n)$"/> is an &#8220;order-of&#8221; notation meant to denote a quantity that increases more slowly than <img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg26.png" alt="$ n$"/> as <img width="15" height="18" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg26.png" alt="$ n$"/> gets large. An example <img width="37" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg70.png" alt="$ o(n)$"/> quantity would be <img width="30" height="38" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/09/dmmimg71.png" alt="$ \sqrt{n}$"/> . This notation (when used properly) greatly speeds up calculation by suppressing irrelevant details.</dd>
<dt><a name="foot136" id="foot136">&#8230; optimal</a><a href="#tex2html17"><sup>10</sup></a></dt>
<dd>At the very least we could tune the bluff frequency and also trade (albeit with less certainty) in the after-bluff periods.</dd>
<dt><a name="foot137" id="foot137">&#8230; gimmick.</a><a href="#tex2html18"><sup>11</sup></a></dt>
<dd>Unless they use the gimmick to collude to overcome the organized size of the large trader, but then the other type 2 traders are essentially also one large trader.</dd>
</dl>
<p>Related posts:<ol>
<li><a href='http://www.win-vector.com/blog/2009/03/what-does-the-market-think/' rel='bookmark' title='What does the Market Think?'>What does the Market Think?</a></li>
<li><a href='http://www.win-vector.com/blog/2008/05/is-search-advertising-a-market-for-lemons/' rel='bookmark' title='Is Search Advertising a Market for Lemons?'>Is Search Advertising a Market for Lemons?</a></li>
<li><a href='http://www.win-vector.com/blog/2009/03/it-is-not-all-the-quants-fault/' rel='bookmark' title='It is not all the quants&#8217; fault.'>It is not all the quants&#8217; fault.</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2009/09/a-discrete-model-gauging-market-efficiency/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Betting Best-Of Series</title>
		<link>http://www.win-vector.com/blog/2008/05/betting-best-of-series/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=betting-best-of-series</link>
		<comments>http://www.win-vector.com/blog/2008/05/betting-best-of-series/#comments</comments>
		<pubDate>Wed, 28 May 2008 01:23:04 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Expository Writing]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Quantitative Finance]]></category>
		<category><![CDATA[Dynamic Programming]]></category>
		<category><![CDATA[Technical Papers]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=18</guid>
		<description><![CDATA[Betting Best of Series is a new expository paper describing the mathematics involved in betting on something like the United States&#8217; Major League Baseball World Series. It isn&#8217;t so much about baseball as about demonstrating some of the really great ideas from mathematical finance in a simplified setting. This sort analysis is the &#8220;secret sauce&#8221; [...]
Related posts:<ol>
<li><a href='http://www.win-vector.com/blog/2007/06/new-paper/' rel='bookmark' title='New Paper'>New Paper</a></li>
<li><a href='http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/' rel='bookmark' title='Paper on stock trading'>Paper on stock trading</a></li>
<li><a href='http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/' rel='bookmark' title='A Demonstration of Data Mining'>A Demonstration of Data Mining</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.win-vector.com/dfiles/BestOf.pdf">Betting Best of Series</a> is a new expository paper describing the mathematics involved in betting on something like the United States&#8217; Major League Baseball World Series.  It isn&#8217;t so much about baseball as about demonstrating some of the really great ideas from mathematical finance in a simplified setting.  This sort of analysis is the &#8220;secret sauce&#8221; in a lot of financial models and I am trying to share the thrilling feeling of working with these techniques in an elementary essay (with diagrams).<span id="more-18"></span></p>
<p>Also in (less legible) HTML:</p>
<h1 align="center">Betting Best-Of Series</h1>
<p align="center"><strong>John Mount<a name="tex2html1" href="#foot16" id="tex2html1"><sup>1</sup></a></strong></p>
<p></p>
<p align="center"><b>Date:</b> May 27, 2008</p>
<hr />
<h1><a name="SECTION00010000000000000000" id="SECTION00010000000000000000">Introduction</a></h1>
<p>We use the United States&#8217; Major League Baseball World Series to demonstrate some of the &#8220;arbitrage arguments&#8221;<a name="tex2html2" href="#foot21" id="tex2html2"><sup>2</sup></a> used in mathematical finance. This problem is a classic finance puzzle question and is an interesting introduction to some exciting techniques.</p>
<p>&#8220;Arbitrage&#8221; is the simultaneous buying and selling of a commodity, usually in multiple markets, that returns a risk-free profit. An example would be finding a market where apples are selling for $1 and another where they are selling for $2, and then simultaneously executing a purchase order in the cheap market and a sales order in the expensive market (assuming no significant shipping risks or costs). Typically &#8220;arbitrage opportunities&#8221; are too much to hope for and to make a profit you must add value, loan money, hold inventory or take on risk. This is just the mathematical finance way of saying &#8220;there is no free lunch,&#8221; but a number of surprising facts about markets can be proven using this principle.</p>
<h1><a name="SECTION00020000000000000000" id="SECTION00020000000000000000">The Problem</a></h1>
<div align="center"><a name="fig:wsgames" id="fig:wsgames"></a><a name="27"></a></p>
<table>
<caption align="bottom"><strong>Figure 1:</strong> World Series Tree (Win over Loss)</caption>
<tr>
<td>
<div align="center"><img width="400" height="510" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./WorldSeries1.png" alt="Image WorldSeries1"/></div>
</td>
</tr>
</table>
</div>
<p>Consider a &#8220;first to win <img width="15" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg1.png" alt="$ k$"/> &#8221; contest like the United States&#8217; Major League Baseball World Series. The World Series is a &#8220;first to win four&#8221; contest (sometimes called &#8220;best of seven&#8221;) where a number of games are played between two teams and the first team to win four games is declared the series winner. Ignoring the possibility of ties, this process can take from four to seven games. We can (as in Figure&nbsp;<a href="#fig:wsgames">1</a>) lay out all of the possibilities into a picture that moves from left to right and then moves up when the first team wins and down when the second team wins.</p>
<p>Any sequence of games is represented by a path through this diagram (starting at the left) that reaches a node with no exit. At each node we have marked in the wins for each team (Team One on top, Team Two on the bottom). The nodes where one team has won four games are where the series ends.</p>
<p>The &#8220;arbitrage question&#8221; is:</p>
<blockquote><p>If you had access to a bookie who was willing to take an even-payoff bet (on either side) in each game of the World Series, can you design a schedule of bets on games that simulates an even-payoff one dollar bet on the outcome of the entire World Series?</p></blockquote>
<p>That is: you wish to make a bet that pays you $1 if your team wins the World Series and costs you $1 if your team is defeated. You cannot find anybody to take such a bet- but you have found a bookie who makes the incredibly generous offer of taking bets (at even pay-off) on each and every game in the series. Can you, without any additional risk, simulate a World Series bet by making a series of per-game bets with this bookie?</p>
<h1><a name="SECTION00030000000000000000" id="SECTION00030000000000000000">The Answer</a></h1>
<p>The answer turns out to be that you can simulate a World Series bet. The reason for hope is that both types of bets (the even-payoff bets on games and an even-payoff bet on the whole series) are expressing the same underlying belief: that both teams have an exactly equal chance of winning. The teams may or may not have equal chances of winning- but offering to take bets on both sides at equal pay-off is equivalent to expressing just such a belief.</p>
<p>The principle that the odds at which you are willing to take bets express your subjective probabilities goes back to Bruno de Finetti and is the most basic &#8220;arbitrage style&#8221; argument. The principle is simple, but it is a useful warm-up to think about. Assume that you are &#8220;rational&#8221; (in the economic sense, which just means you are not giving money away without a reason) and let <img width="24" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg2.png" alt="$ p_S$"/> denote your personal estimate of the probability of your team winning. If you are willing to bet $1 that your team wins at even payoff (meaning you collect $1 if your team wins and pay $1 if your team loses) then for this bet to make economic sense you must have:</p>
<div align="center"><!-- MATH<br />
 \begin{equation*}<br />
p_S ( +\$1 ) + (1-p_S) (- \$1) \ge 0<br />
\end{equation*}<br />
 --></p>
<table cellpadding="0" width="100%" align="center">
<tr valign="middle">
<td nowrap align="center"><img width="244" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg3.png" alt="$\displaystyle p_S ( +\$1 ) + (1-p_S) (- \$1) \ge 0$"/></td>
<td nowrap width="10" align="right">&nbsp;&nbsp;&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/><br />
which means <!-- MATH<br />
 $p_S\ge \frac{1}{2}$<br />
 --><br />
<img width="60" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg4.png" alt="$ p_S\ge \frac{1}{2}$"/> .</p>
<p>Similarly, if you are willing (for purely economic reasons) to take the other side of the same even-payoff bet (reversing the roles of winning and losing) then it must be true that <!-- MATH<br />
 $p_S \le \frac{1}{2}$<br />
 --><br />
<img width="60" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg5.png" alt="$ p_S \le \frac{1}{2}$"/> . We then have our conclusion: from an economic point of view you should be willing to take either side of a fair-payoff bet only if your estimate of the probability of winning is <img width="32" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg6.png" alt="$ 1/2$"/> .</p>
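<p>The two inequalities can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names <tt>expected_payoff</tt> and <tt>acceptable_sides</tt> are hypothetical, and exact fractions are used only to avoid rounding questions. It reports, for a few candidate probability estimates, which sides of a $1 even-payoff bet have non-negative expected value.</p>
<pre>
```python
from fractions import Fraction


def expected_payoff(p_win, stake=1):
    """Expected payoff of an even-payoff bet: collect `stake` with
    probability p_win, pay `stake` otherwise."""
    return p_win * stake - (1 - p_win) * stake


def acceptable_sides(p_win):
    """Return (backing_ok, opposing_ok): whether betting on the team, and
    betting against it, each have non-negative expected value."""
    backing_ok = expected_payoff(p_win) >= 0
    opposing_ok = expected_payoff(1 - p_win) >= 0
    return backing_ok, opposing_ok


if __name__ == "__main__":
    for p in (Fraction(2, 5), Fraction(1, 2), Fraction(3, 5)):
        print(p, acceptable_sides(p))
```
</pre>
<p>Only the estimate of one half makes both sides acceptable at once, which is the conclusion reached above.</p>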
<div align="center"><a name="fig:wspartial" id="fig:wspartial"></a><a name="44"></a></p>
<table>
<caption align="bottom"><strong>Figure 2:</strong> World Series With Some Values Filled In</caption>
<tr>
<td>
<div align="center"><img width="400" height="510" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./WorldSeries2.png" alt="Image WorldSeries2"/></div>
</td>
</tr>
</table>
</div>
<p>We now return to the World Series diagram. If we bet on individual games (instead of making one bet on the whole series) then at each node in the diagram we expect to have some sort of net winnings or net losses. For example, at each node where our team has won four games we should be holding <img width="23" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg7.png" alt="$ \$1$"/> , so we will label these nodes with <img width="28" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg8.png" alt="$ +1$"/> . Similarly, at each node where the opposing team has won four games we expect to have lost exactly <img width="23" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg7.png" alt="$ \$1$"/> so we label those nodes with <img width="29" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg9.png" alt="$ -1$"/> . Our task is to figure out the amount bet at each node and our net holdings at each node. If we can find a schedule of bet amounts that leads to the correct outcomes at the end of the World Series and starts with initial net holdings of $0 then we have solved the problem.</p>
<p>If we look at Figure&nbsp;<a href="#fig:wspartial">2</a> we see that the node corresponding to each team having won 3 games points to two nodes we know the values of (the World Series ending with either team the winner). We can use the fact that this node points only to nodes with known net holdings to figure out both the bet that must be made at this node and the net holdings this node should have at this point in the World Series.</p>
<p>Let <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg10.png" alt="$ x$"/> be the (unknown) net holdings we have at this node and <img width="14" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg11.png" alt="$ y$"/> be the (unknown) amount we bet. Then to complete the World Series bet we must have the following:</p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
x + y &#038; = &#038; 1 \\<br />
 x - y &#038; = &#038; -1<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="48" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg12.png" alt="$\displaystyle x + y$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg13.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="14" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg14.png" alt="$\displaystyle 1$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="48" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg15.png" alt="$\displaystyle x - y$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg13.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="29" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg16.png" alt="$\displaystyle -1$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p>This is enough to notice that <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg10.png" alt="$ x$"/> (your holdings) must be the average of the two outcomes pointed to and <img width="14" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg11.png" alt="$ y$"/> (your bet) must be one half of the difference of the two outcomes. So the &#8220;each team has won three games&#8221; node (near the very right end of the diagram) should have a net holding of <img width="59" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg17.png" alt="$ x = \$0$"/> and we should bet <img width="58" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg18.png" alt="$ y = \$1$"/> . Filling in this node with the net holdings ($0) now means that there are other nodes that point only to nodes with filled-in net holdings. We can, in fact, repeat this process of filling in each node with unknown net holdings with the average of the two known nodes it points to until we complete the diagram as in Figure&nbsp;<a href="#fig:wsfull">3</a>.</p>
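<p>The single-node step can be written as a two-line calculation. This is an illustrative sketch only: <tt>solve_node</tt> is a hypothetical helper name, and its two arguments are the net holdings of the nodes reached after a win and after a loss.</p>
<pre>
```python
from fractions import Fraction


def solve_node(value_if_win, value_if_loss):
    """Solve x + y = value_if_win and x - y = value_if_loss.

    Returns (x, y): x is the net holdings required at this node (the average
    of the two successor values) and y is the bet to place (half their
    difference)."""
    up = Fraction(value_if_win)
    down = Fraction(value_if_loss)
    x = (up + down) / 2
    y = (up - down) / 2
    return x, y


if __name__ == "__main__":
    # The "each team has won three games" node points to +1 and -1.
    print(solve_node(1, -1))
```
</pre>
<p>Applied to the node where each team has won three games, the helper returns holdings of 0 and a bet of 1, matching the values derived above.</p>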
<div align="center"><a name="fig:wsfull" id="fig:wsfull"></a><a name="55"></a></p>
<table>
<caption align="bottom"><strong>Figure 3:</strong> World Series All Values Filled In</caption>
<tr>
<td>
<div align="center"><img width="400" height="510" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./WorldSeries3.png" alt="Image WorldSeries3"/></div>
</td>
</tr>
</table>
</div>
<p>In the completed figure each node is filled in with the net holdings required to implement our betting schedule. We can see that a diagram like this can always be filled out to completion by looking at the diagram as having layers like an onion and noticing that we start with the rightmost nodes filled in (they are the nodes where the World Series ends). It is obvious that we can fill out every node in the layer of nodes just inside the outer layer if we start at the rightmost such node and work back. Every layer can be completed one after another until we get to the innermost layer, which is just the starting node. To implement the betting strategy, we keep track of where we are in the diagram and always bet one half of the difference between the net holdings of the two nodes pointed to by the node we are at.</p>
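<p>The layer-by-layer filling described above can be sketched as a short recursion. This is a minimal sketch under stated assumptions, not the paper's own code: the name <tt>series_schedule</tt> and the representation of a node as a pair (wins for our team, wins for the opponent) are choices made here for illustration.</p>
<pre>
```python
from fractions import Fraction
from functools import lru_cache


def series_schedule(wins_needed=4):
    """Return {(a, b): (holdings, bet)} for every node where the series is
    still undecided; a and b are the wins so far for our team and the
    opponent in a first-to-win-`wins_needed` series."""

    @lru_cache(maxsize=None)
    def holdings(a, b):
        if a == wins_needed:      # our team has won the series
            return Fraction(1)
        if b == wins_needed:      # the opponent has won the series
            return Fraction(-1)
        # Average of the two nodes this node points to.
        return (holdings(a + 1, b) + holdings(a, b + 1)) / 2

    schedule = {}
    for a in range(wins_needed):
        for b in range(wins_needed):
            # Bet one half of the difference between the two successors.
            bet = (holdings(a + 1, b) - holdings(a, b + 1)) / 2
            schedule[(a, b)] = (holdings(a, b), bet)
    return schedule


if __name__ == "__main__":
    for node, (value, bet) in sorted(series_schedule().items()):
        print(node, "holdings", value, "bet", bet)
```
</pre>
<p>Memoizing the recursion plays the role of filling in each node once; an explicit loop over layers from right to left would do the same work.</p>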
<p>If the first node of the diagram was marked with a value other than zero it would mean that the World Series has a net bias for the first team or the second. Since the rules are symmetric this would be a nonsense conclusion, so we can be sure that all of the even-score nodes must be valued at zero.</p>
<p>The filling in of blanks using values ahead of them (from the future) is the heart of the Binomial Pricing Theory for options and is based on a very deep idea called Dynamic Programming. The idea is that you may not know which future you will experience- but you may know the valuation of every possible future. It is an amazing fact that even without introducing probabilities or probability estimates of which future you will experience, just knowing the value of every possible future is enough to compute the value of a bet in the present time. In our example: you may not know ahead of time the final scores of the World Series, but you do know the value of a World Series bet for each possible ending score.</p>
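<p>One way to see that no probabilities are needed is to replay the per-game bets along every possible sequence of game results and confirm that the final cash position is always exactly the series payoff. The sketch below does this by brute force; the names <tt>WINS_NEEDED</tt>, <tt>holdings</tt> and <tt>replay</tt> are hypothetical and the code is an illustration rather than the paper's implementation.</p>
<pre>
```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

WINS_NEEDED = 4  # the World Series is "first to win four"


@lru_cache(maxsize=None)
def holdings(a, b):
    """Net holdings required at the node with a wins for us, b for them."""
    if a == WINS_NEEDED:
        return Fraction(1)
    if b == WINS_NEEDED:
        return Fraction(-1)
    return (holdings(a + 1, b) + holdings(a, b + 1)) / 2


def replay(outcomes):
    """Follow one sequence of game results (True means our team wins the
    game), placing the scheduled bet before each game.
    Returns (our_wins, their_wins, final_cash)."""
    a = b = 0
    cash = Fraction(0)
    for our_team_wins in outcomes:
        if a == WINS_NEEDED or b == WINS_NEEDED:
            break  # series already decided; remaining games are not played
        bet = (holdings(a + 1, b) - holdings(a, b + 1)) / 2
        if our_team_wins:
            cash += bet
            a += 1
        else:
            cash -= bet
            b += 1
    return a, b, cash


if __name__ == "__main__":
    longest = 2 * WINS_NEEDED - 1  # at most seven games
    endings = {replay(seq) for seq in product([True, False], repeat=longest)}
    for ending in sorted(endings):
        print(ending)
```
</pre>
<p>Every ending printed has final cash of +1 when our team reaches four wins and -1 otherwise, whichever path was taken to get there.</p>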
<h1><a name="SECTION00040000000000000000" id="SECTION00040000000000000000">What is the analogy?</a></h1>
<p>From a finance or betting point of view the problem is solved- we have procedures for building the betting schedule and we have the schedule itself. From a mathematician&#8217;s point of view we have only just started- we have some procedures and relations but what are they an analogy of?</p>
<p>Naively, one might think that one should bet around one fourth of the desired outcome in each game to simulate a &#8220;first to win four&#8221; series. However, to simulate a total World Series bet of <img width="23" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg7.png" alt="$ \$1$"/> we use an initial bet of <!-- MATH<br />
 $\$5/16 = \$0.3125$<br />
 --><br />
<img width="138" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg19.png" alt="$ \$5/16 = \$0.3125$"/> in our schedule. This is almost a third of our desired total bet. This gets us wondering: what is the general form of this first bet?</p>
<p>Let <!-- MATH<br />
 $\text{bet}(k)$<br />
 --><br />
bet<img width="29" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg20.png" alt="$ (k)$"/> denote the amount of the first bet in the simulation of a &#8220;first to win <img width="15" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg1.png" alt="$ k$"/> &#8221; bet. If we compute <!-- MATH<br />
 $\text{bet}(k)$<br />
 --><br />
bet<img width="29" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg20.png" alt="$ (k)$"/> (by constructing betting schedules as above) for many values of <img width="15" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg1.png" alt="$ k$"/> we see that <!-- MATH<br />
 $\text{bet}(k)$<br />
 --><br />
bet<img width="29" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg20.png" alt="$ (k)$"/> seems to shrink slower than <img width="39" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg21.png" alt="$ 1/k.$"/> In fact it seems to shrink at a rate of around <!-- MATH<br />
 $1/\sqrt{k}$<br />
 --><br />
<img width="49" height="44" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg22.png" alt="$ 1/\sqrt{k}$"/> . Even more intriguing if you plot <!-- MATH<br />
 $k/(\text{bet}(k)*\text{bet}(k))$<br />
 --><br />
<img width="31" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg23.png" alt="$ k/($"/>bet<img width="39" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg24.png" alt="$ (k)*$"/>bet<img width="37" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg25.png" alt="$ (k))$"/> it converges (very slowly) to <!-- MATH<br />
 $3.14 \cdots$<br />
 --><br />
<img width="66" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg26.png" alt="$ 3.14 \cdots$"/> . We can conjecture that for very large <img width="15" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg1.png" alt="$ k$"/> the initial bet is: <!-- MATH<br />
 $1/\sqrt{\pi k}$<br />
 --><br />
<img width="61" height="44" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg27.png" alt="$ 1/\sqrt{\pi k}$"/> where <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg28.png" alt="$ \pi$"/> is the famous ratio of the length of the circumference of a circle to the length of the diameter of the same circle.</p>
<p>Now <!-- MATH<br />
 $1/\sqrt{\pi k}$<br />
 --><br />
<img width="61" height="44" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg27.png" alt="$ 1/\sqrt{\pi k}$"/> is much larger than <img width="33" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg29.png" alt="$ 1/k$"/> (as <img width="15" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg1.png" alt="$ k$"/> gets large). So the scheme says to bet a fairly large amount of your budget on the first game, and that winning the first bet is worth a bit more than you would expect (it takes you more than one <img width="15" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg1.png" alt="$ k$"/> th of the way to victory).</p>
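<p>The conjecture can be checked numerically. The following is a minimal sketch, not the original computation: it assumes a &#8220;best of k&#8221; bet means a first-to-k-wins series between evenly matched teams with a final payoff of plus or minus $1, and the function name <code>first_bet</code> is our own. It fills in the net holdings by backward induction, reads off the first bet, and compares it with the conjectured size.</p>

```python
from fractions import Fraction
from math import pi, sqrt


def first_bet(k):
    """First bet in a first-to-k-wins series with a final payoff of +$1 or -$1.

    Assumes evenly matched teams (each game is a fair coin flip).
    """
    # row[b] is the net holding needed at the node
    # "we have a wins, they have b wins". Start at a = k: we have already
    # won, so every holding is +1. The extra entry at index k is the
    # "they won" holding of -1.
    row = [Fraction(1)] * k + [Fraction(-1)]
    for a in range(k - 1, 0, -1):
        new_row = [Fraction(-1)] * (k + 1)
        for b in range(k - 1, -1, -1):
            # Fair game: a node's holding is the average of its two successors.
            new_row[b] = (row[b] + new_row[b + 1]) / 2
        row = new_row
    # row now describes a = 1. We start with holdings of 0, so the first bet
    # must equal the holding required at "1 game to 0".
    return row[0]


for k in (4, 25, 100):
    bet = first_bet(k)
    # Last column: ratio of the exact bet to the conjectured 1/sqrt(pi*k).
    print(k, float(bet), float(bet) * sqrt(pi * k))
```

<p>Exact fractions are used so that the small bets for large k are not disturbed by rounding; the ratio in the last column should drift toward 1 as k grows.</p>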
<div align="center"><a name="fig:wsWeightedPaths" id="fig:wsWeightedPaths"></a><a name="71"></a></p>
<table>
<caption align="bottom"><strong>Figure 4:</strong> Weighted Paths</caption>
<tr>
<td>
<div align="center"><img width="400" height="510" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./WorldSeries4.png" alt="Image WorldSeries4"/></div>
</td>
</tr>
</table>
</div>
<p>What is going on? We can again apply an arbitrage or de Finetti style argument and say that, since the whole game was &#8220;fair&#8221; with expected pay-off zero, we can relate probabilities and payoffs. The net holdings at each node encode how much of an advantage you have at the node (or how much you should pay to take over from another gambler at this point). If we let <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg30.png" alt="$ p_1$"/> denote the probability of going on to win the World Series bet after winning the first bet then we must have:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
p_1 (\$1) + (1-p_1) (-\$1) = \text{bet}(k) .<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="210" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg31.png" alt="$\displaystyle p_1 (\$1) + (1-p_1) (-\$1) =$"/>&nbsp; &nbsp;bet<img width="34" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg32.png" alt="$\displaystyle (k) . $"/></div>
<p>Or <!-- MATH<br />
 $p_1 = (\text{bet}(k) + 1)/2$<br />
 --><br />
<img width="54" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg33.png" alt="$ p_1 = ($"/>bet<img width="88" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg34.png" alt="$ (k) + 1)/2$"/> . For the real World Series we had <!-- MATH<br />
 $\text{bet}(4)=5/16$<br />
 --><br />
bet<img width="91" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg35.png" alt="$ (4)=5/16$"/> so <!-- MATH<br />
 $p_1 = 21/32$<br />
 --><br />
<img width="93" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg36.png" alt="$ p_1 = 21/32$"/> . This means we can read off from the valuation tree that the probability of winning the World Series (for perfectly equally matched teams) rises from <img width="32" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg6.png" alt="$ 1/2$"/> to <img width="51" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg37.png" alt="$ 21/32$"/> after you win the first game.<a name="tex2html7" href="#foot77" id="tex2html7"><sup>3</sup></a> This can be confirmed from Figure&nbsp;<a href="#fig:wsfull">3</a>. It is easy to confirm that a <img width="51" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg37.png" alt="$ 21/32$"/> portion of all paths from the node where Team One has won the first game end with Team One winning the whole World Series (each path must be weighted by its probability, which is <!-- MATH<br />
 $2^{-path<br />
length}$<br />
 --><br />
<img width="90" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg38.png" alt="$ 2^{-path length}$"/> ). Instead of computing the bets we could have computed the probability of going on to win the World Series at each node<a name="tex2html8" href="#foot80" id="tex2html8"><sup>4</sup></a> (and then used the above equivalence principle to read off the required bets).</p>
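<p>The path-weighting claim can be checked by brute force. This sketch is our own illustration (the function name <code>win_probability</code> and the (wins, losses) node encoding are assumptions): it walks every path from a node and adds up 2<sup>-(path length)</sup> for the paths on which our team wins the series.</p>

```python
from fractions import Fraction


def win_probability(wins, losses, k=4):
    """Total weight of all winning paths from the node (wins, losses).

    Each game is a fair coin flip, so a path of length n has weight 2**-n.
    """
    total = Fraction(0)
    stack = [(wins, losses, Fraction(1))]
    while stack:
        w, l, weight = stack.pop()
        if w == k:        # our team has won the series: count this path
            total += weight
        elif l == k:      # the other team has won: path contributes nothing
            continue
        else:             # play one more game, halving the path weight
            stack.append((w + 1, l, weight / 2))
            stack.append((w, l + 1, weight / 2))
    return total


p1 = win_probability(1, 0)
print(p1)          # 21/32: chance of winning the series after winning game one
print(2 * p1 - 1)  # 5/16: the matching first bet, from p1 = (bet + 1) / 2
```

<p>Enumerating paths one at a time is fine for a seven-game series but grows exponentially with the length of the series, which is why counting paths per node is the better approach for long series.</p>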
<p>We can create a new diagram where we start at the node where our team has won the first game and we label all the non-ending nodes with the number of paths that reach the node. For example the two nodes immediately after start can be reached one way each and the next three nodes (&#8220;3 games to 0&#8221;, &#8220;2 games to 1&#8221; and &#8220;1 game to 2&#8221;) can be reached 1, 2 and 1 ways respectively. It is a clever trick to notice that the easiest way to count the number of paths to a node is to just add the number of ways found on the previous nodes that point to our target node. This clever way of counting paths uses weighted paths (inspired by something called Pascal&#8217;s Triangle). Figure&nbsp;<a href="#fig:wsWeightedPaths">4</a> shows a few columns of a weighted path diagram (though the ending nodes are re-written as the sum of the paths reaching them where every path is weighted by <!-- MATH<br />
 $2^{-\text{path length}}$<br />
 --><br />
<img width="93" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg39.png" alt="$ 2^{-\text{path length}}$"/> which is the probability of following such a path).</p>
<p>The entries of the weighted path diagram are identified by how many columns out from the start node they are and how many steps from one side of the row they are. Both identifiers start at zero so the starting node is denoted as <!-- MATH<br />
 ${0 \choose 0}$<br />
 --><br />
<img width="29" height="42" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg40.png" alt="$ {0 \choose 0}$"/> and the two nodes just after it are denoted <!-- MATH<br />
 ${1 \choose 0}$<br />
 --><br />
<img width="29" height="42" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg41.png" alt="$ {1 \choose 0}$"/> and <!-- MATH<br />
 ${1 \choose 1}$<br />
 --><br />
<img width="29" height="42" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg42.png" alt="$ {1 \choose 1}$"/> . The three nodes just after these are denoted <!-- MATH<br />
 ${2 \choose 0}$<br />
 --><br />
<img width="29" height="42" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg43.png" alt="$ {2 \choose 0}$"/> , <!-- MATH<br />
 ${2 \choose 1}$<br />
 --><br />
<img width="29" height="42" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg44.png" alt="$ {2 \choose 1}$"/> , <!-- MATH<br />
 ${2 \choose 2}$<br />
 --><br />
<img width="29" height="42" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg45.png" alt="$ {2 \choose 2}$"/> and are (as we said before) equal to 1,2 and 1 respectively. These entries are called &#8220;binomial coefficients&#8221; and the rules for computing them (for integers <img width="31" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg46.png" alt="$ a,b$"/> ) are as follows:</p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
{a \choose b} &#038; = &#038; 0 \;\text{if $a&lt;0$\  or $b&lt;0$\  or $b>a$} \\<br />
{a \choose 0} &#038; = &#038; 1 \;\text{if $a>=0$} \\<br />
{a \choose a} &#038; = &#038; 1 \;\text{if $a>=0$} \\<br />
{a \choose b} &#038; = &#038; {a-1 \choose b-1} + {a-1 \choose b} \;\text{otherwise.}<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="42" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg47.png" alt="$\displaystyle {a \choose b}$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg13.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg48.png" alt="$\displaystyle 0 \;$"/>if <img width="49" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg49.png" alt="$ a&lt;0$"/> or <img width="47" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg50.png" alt="$ b&lt;0$"/> or <img width="47" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg51.png" alt="$ b&gt;a$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="42" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg52.png" alt="$\displaystyle {a \choose 0}$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg13.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg53.png" alt="$\displaystyle 1 \;$"/>if <img width="63" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg54.png" alt="$ a&gt;=0$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="42" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg55.png" alt="$\displaystyle {a \choose a}$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg13.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg53.png" alt="$\displaystyle 1 \;$"/>if <img width="63" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg54.png" alt="$ a&gt;=0$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="42" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg47.png" alt="$\displaystyle {a \choose b}$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg13.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="174" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg56.png" alt="$\displaystyle {a-1 \choose b-1} + {a-1 \choose b} \;$"/>otherwise.</td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
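<p>The four rules translate directly into a short recursive function. This is a sketch for illustration only (the name <code>choose</code> is ours); memoization keeps the recursion from recomputing the same entry, which is the same idea as adding up the counts already written on the previous nodes.</p>

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def choose(a, b):
    """Binomial coefficient computed only from the four rules."""
    if a < 0 or b < 0 or b > a:
        return 0
    if b == 0 or b == a:
        return 1
    return choose(a - 1, b - 1) + choose(a - 1, b)


# Print the first few columns of the path-count diagram.
for a in range(5):
    print([choose(a, b) for b in range(a + 1)])

# Cross-check against the standard library for non-negative arguments.
print(all(choose(a, b) == comb(a, b) for a in range(12) for b in range(a + 1)))
```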
<p>From our diagram we see that the probability of winning the World Series bet is a diagonal sum across Pascal&#8217;s Triangle (weighted by powers of 2). To somebody trained in combinatorics it is obvious<a name="tex2html9" href="#foot101" id="tex2html9"><sup>5</sup></a> that a sum like this must itself be a single binomial coefficient. A quick trip to &#8220;The On-Line Encyclopedia of Integer Sequences&#8221; is enough to identify the solution (Encyclopedia sequence &#8220;A001700&#8221;) and we can get an exact form for the initial bet:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\text{bet}(k) =  { 2 k - 3 \choose k - 1} 2^{-(2 k - 3)} .<br />
\end{displaymath}<br />
 --></p>
<div align="center">&nbsp; &nbsp;bet<img width="204" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg57.png" alt="$\displaystyle (k) = { 2 k - 3 \choose k - 1} 2^{-(2 k - 3)} . $"/></div>
<p>A lot is known about binomial coefficients. In fact by a formula called &#8220;Stirling&#8217;s approximation&#8221; we know</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
{ 2 k - 3 \choose k - 1} 2^{-(2 k - 3)} \approx \frac{1}{\sqrt{\pi k}}<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="215" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg58.png" alt="$\displaystyle { 2 k - 3 \choose k - 1} 2^{-(2 k - 3)} \approx \frac{1}{\sqrt{\pi k}} $"/></div>
<p>as observed.</p>
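<p>As a sanity check, the closed form can be evaluated exactly and compared with the approximation. This is a small sketch under stated assumptions: the exponent is written in terms of k (which reproduces bet(4) = 5/16), and the function name <code>closed_form_bet</code> is our own.</p>

```python
from fractions import Fraction
from math import comb, pi, sqrt


def closed_form_bet(k):
    """C(2k-3, k-1) * 2**-(2k-3), evaluated exactly (intended for k >= 2)."""
    return Fraction(comb(2 * k - 3, k - 1), 2 ** (2 * k - 3))


for k in (4, 10, 100, 1000):
    exact = float(closed_form_bet(k))
    approx = 1 / sqrt(pi * k)
    # Last column: ratio of exact value to the Stirling-style approximation.
    print(k, exact, approx, exact / approx)
```

<p>The ratio in the last column approaches 1 as k grows, so the approximation is only rough for a real seven-game series (k = 4) and becomes good for long series.</p>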
<h1><a name="SECTION00050000000000000000" id="SECTION00050000000000000000">Relations</a></h1>
<p>de Finetti used this style of reasoning to provide a foundation for the basic theory of probability. Probability theory has always been somewhat problematic for mathematicians in that it has &#8220;content&#8221; or &#8220;an interpretation&#8221; whereas the power of modern mathematics comes from a more axiomatic or content-free way of thinking. The issue is if you are defining the meaning or interpretation of something like probability how do you check or demonstrate that you have the correct meaning without referring to some other pre-existing interpretation? A foundational or first interpretation has trouble looking for prior definitions to show equivalence to.[<a href="#Shafer:2002p1513">6</a>]</p>
<p>The arbitrage-free arguments and the binomial arguments in particular are the basis of much of mathematical finance, including the Black-Scholes-Merton Option Pricing Model[<a href="#Black:1973p1502">2</a>] (work recognized by the 1997 Nobel Memorial Prize in Economic Sciences) and the Binomial Option Pricing Model.[<a href="#Cox:1979p1505">5</a>]</p>
<p><img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg28.png" alt="$ \pi$"/> (the ratio of the circumference of a circle to its diameter) is one of the most famous constants in mathematics. Pascal&#8217;s Triangle is one of the oldest and most studied diagrams in mathematics with roots all the way back into ancient China.[<a href="#OstermanCoulter:2003p1034">4</a>] It is actually remarkable how much Zhu Shijie&#8217;s 1303 diagram: <img width="200" height="312" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Yanghui_triangle.png" alt="Image Yanghui_triangle"/> looks like our modern version of Pascal&#8217;s Triangle (though they are separated by about 350 years, source Wikipedia): <img width="200" height="102" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Triangle.png" alt="Image Triangle"/>. The two diagrams differ only in the notation used to write numbers: both start by filling in two diagonals of ones, and all other numbers are the sums of the two numbers nearest and above them.</p>
<p>The arguments that replace paths with counts are a particular example of a technique called &#8220;Dynamic Programming&#8221; invented by Richard Bellman for mathematical optimization and now one of the core concepts of algorithm design.[<a href="#dynamicProgramming">1</a>]</p>
<p>The idea of using a set of unknown futures that each have a known value is the key idea in solving a number of hard problems in probability and in optimization in the face of uncertainty. One of the most famous of these problems is the &#8220;two armed bandit&#8221; where one must decide how to split one&#8217;s bets between two slot machines that are thought to pay off at different rates.[<a href="#Chernoff:1959p1444">3</a>]</p>
<p>For the two armed bandit problem the concern is how long to experiment with both machines when one machine seems to be paying more. The correct solution depends on seeing that how certain you need to be about the difference in machine values (which in turn drives how long you experiment on both machines) is a function of how long you intend to use the information. If you intend to play for a long time you want a long initial research phase to produce a very high confidence ranking of the machines; if you do not intend to play for long you want to switch to the machine you suspect is better sooner and on less evidence. Of course &#8220;slot machines&#8221; is just a toy problem standing in for uncertain investments, research spending or even spending on different online advertising phrases.</p>
<h1><a name="SECTION00060000000000000000" id="SECTION00060000000000000000">Conclusions</a></h1>
<p>The finance &#8220;no arbitrage&#8221; principle is actually a very powerful mathematical tool. It is equivalent to but somewhat more graceful than introducing probabilities when solving some combinatorial problems. In this setting it is equivalent to de Finetti&#8217;s principle and converting between probabilities and net holdings is very easy.</p>
<h2><a name="SECTION00070000000000000000" id="SECTION00070000000000000000">Bibliography</a></h2>
<dl compact>
<dt><a name="dynamicProgramming" id="dynamicProgramming">1</a></dt>
<dd>B<small>ELLMAN,</small> R.<br />
<em>Dynamic Programming</em>.<br />
Dover Publications, 2003.</dd>
<dt><a name="Black:1973p1502" id="Black:1973p1502">2</a></dt>
<dd>B<small>LACK,</small> F., <small>AND</small> S<small>CHOLES,</small> M.<br />
The pricing of options and corporate liabilities.<br />
<em>The Journal of Political Economy 81</em>, 3 (Jun 1973), 637-654.</dd>
<dt><a name="Chernoff:1959p1444" id="Chernoff:1959p1444">3</a></dt>
<dd>C<small>HERNOFF,</small> H.<br />
Sequential design of experiments.<br />
<em>Ann. Math. Statist. 30</em>, 3 (Feb 1959), 755-770.</dd>
<dt><a name="OstermanCoulter:2003p1034" id="OstermanCoulter:2003p1034">4</a></dt>
<dd>C<small>OULTER,</small> L.&nbsp;O.<br />
What is mathematics? toward a global view.<br />
17.</dd>
<dt><a name="Cox:1979p1505" id="Cox:1979p1505">5</a></dt>
<dd>C<small>OX,</small> J.&nbsp;C., R<small>OSS,</small> S.&nbsp;A., <small>AND</small> R<small>UBINSTEIN,</small> M.<br />
Option pricing: A simplified approach.<br />
<em>Journal of Financial Economics</em> (Sep 1979), 39.</dd>
<dt><a name="Shafer:2002p1513" id="Shafer:2002p1513">6</a></dt>
<dd>S<small>HAFER,</small> G., G<small>ILLETT,</small> P.&nbsp;R., <small>AND</small> S<small>CHERL,</small> R.&nbsp;B.<br />
A new understanding of subjective probability and its generalization to lower and upper prevision.<br />
<em>Game-Theoretic Probability Project</em> (Oct 2002), 62.</dd>
</dl>
<p></p>
<hr />
<h4>Footnotes</h4>
<dl>
<dt><a name="foot16" id="foot16">&#8230; Mount</a><a href="#tex2html1"><sup>1</sup></a></dt>
<dd>http://www.win-vector.com/</dd>
<dt><a name="foot21" id="foot21">&#8230; arguments&#8221;</a><a href="#tex2html2"><sup>2</sup></a></dt>
<dd>More pedantically we are using the principle of &#8220;no arbitrage&#8221; or &#8220;arbitrage free&#8221; argument, but the name is traditional.</dd>
<dt><a name="foot77" id="foot77">&#8230; game.</a><a href="#tex2html7"><sup>3</sup></a></dt>
<dd>Again, this is for the unrealistic situation of perfectly matched teams. For teams that have uneven probability the series strongly amplifies the better team&#8217;s chance of winning (which is one of the series&#8217; intents). Also a bettor could update his subjective probability based on the first outcome, which also changes things.</dd>
<dt><a name="foot80" id="foot80">&#8230; node</a><a href="#tex2html8"><sup>4</sup></a></dt>
<dd>This calculation is in essence summing end outcomes across all possible paths weighted by how likely each path is. There are many possible paths, but the calculation can be performed quite efficiently.</dd>
<dt><a name="foot101" id="foot101">&#8230; obvious</a><a href="#tex2html9"><sup>5</sup></a></dt>
<dd>&#8220;Obvious&#8221; is actually a special term in mathematics. To illustrate what it means we repeat a story. A mathematician was giving a lecture and stated that the point just shown was obvious. A student asked if it was really obvious. The mathematician stopped the lecture and paused to think. The mathematician thought some more, and eventually walked out of the room. Forty minutes later the mathematician returned to the lecture hall and informed the student that the last point was indeed obvious.</dd>
</dl>
<p>Related posts:<ol>
<li><a href='http://www.win-vector.com/blog/2007/06/new-paper/' rel='bookmark' title='New Paper'>New Paper</a></li>
<li><a href='http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/' rel='bookmark' title='Paper on stock trading'>Paper on stock trading</a></li>
<li><a href='http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/' rel='bookmark' title='A Demonstration of Data Mining'>A Demonstration of Data Mining</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2008/05/betting-best-of-series/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper on stock trading</title>
		<link>http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=paper-on-stock-trading</link>
		<comments>http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/#comments</comments>
		<pubDate>Thu, 04 Oct 2007 02:03:33 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Quantitative Finance]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Dynamic Programming]]></category>
		<category><![CDATA[Stock Trading]]></category>
		<category><![CDATA[Technical Papers]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/2007/10/03/paper-on-stock-trading/</guid>
		<description><![CDATA[author: John Mount I have finally written up and released a paper in PDF: Automatic Generation and Testing of Trades describing a lot of the statistics and optimization methods used when I was technical trading on a Banc of America Securities proprietary program trading desk.  It was a very exciting time. I have also included [...]
Related posts:<ol>
<li><a href='http://www.win-vector.com/blog/2007/06/new-paper/' rel='bookmark' title='New Paper'>New Paper</a></li>
<li><a href='http://www.win-vector.com/blog/2008/05/betting-best-of-series/' rel='bookmark' title='Betting Best-Of Series'>Betting Best-Of Series</a></li>
<li><a href='http://www.win-vector.com/blog/2009/03/what-does-the-market-think/' rel='bookmark' title='What does the Market Think?'>What does the Market Think?</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>author: John Mount</p>
<p>I have finally written up and released a paper in  PDF: <a href="http://www.win-vector.com/SelectedPapers/files/AutomaticGenerationAndTestingOfTrades.pdf">Automatic Generation and Testing of Trades</a> describing a lot of the statistics and optimization methods used when I was technical trading on a Banc of America Securities proprietary program trading desk.  It was a very exciting time. </p>
<p><span id="more-5"></span><br />
I have also included a less legible HTML version:</p>
<h1 align="center">Automatic Generation and Testing of <em>Un-Rolls</em> for Profitable Technical Trades</h1>
<p align="center"><strong>John Mount<a name="tex2html1" href="#foot10" id="tex2html1"><sup>1</sup></a></strong></p>
<p></p>
<p align="center"><b>Date:</b> September 9, 2007</p>
<hr />
<h1><a name="SECTION00010000000000000000" id="SECTION00010000000000000000">Introduction</a></h1>
<p>In this paper we discuss some of the basic steps in developing successful technical trading strategies. The method involves identifying an inefficiency or irregularity in the market and then using rigorous statistical methods to track and exploit this single feature of the market. We show how to automatically generate and test optimal <em>un-rolls</em> or trades that undo (at a profit) automatically triggered technical trades. That is to say, if the first half of a technical trade is specified we show how to find the other half.</p>
<p>Our technique is to use standard tools, such as kernel methods[<a href="#nonparametricStatistics">8</a>] and Markov chains[<a href="#markovChains">4</a>], to model both the efficient and the inefficient portions of the US stock markets.[<a href="#investments">6</a>]</p>
<p>The author traded profitably using some of these techniques while part of a program trading desk at Banc of America Securities.</p>
<h1><a name="SECTION00020000000000000000" id="SECTION00020000000000000000">Technical Trading</a></h1>
<p>Technical trading is a popular universe of security-trading strategies that trade using only the so-called <em>technical data</em> which are price graphs, volumes, bid/ask books and other data commonly available in market feeds.<a name="tex2html2" href="#foot21" id="tex2html2"><sup>2</sup></a> Input sources can also include external triggers based on news, RSS feeds, on-line information and corporate announcements.<a name="tex2html3" href="#foot22" id="tex2html3"><sup>3</sup></a> These strategies are very attractive in that they are quantifiable, easy to implement and easy to back-test on historic data. A major weakness of technical trading strategies is that they ignore deeper knowledge or analysis of the companies that are behind the securities being traded. Systems of technical trading are used both by large sophisticated hedge funds and by a varying population of day-traders.</p>
<p>Typical technical variables include price, time, volume and moving averages. It is important to know that many of these variables are really just analogies and not essential features of the market. For example: none of the variables current price, time, velocity, acceleration or inertia are real market quantities. What is traditionally called <em>current price</em> is actually the price of the last trade, which is in the past and may or may not ever be seen again. The fundamental variables of state of US stock markets are bid (best purchase price and quantity currently offered), ask (best sale price and quantity currently offered), and last trade (price and quantity). Each change of these variables is called a <em>tick</em> and can happen at any time. More detailed views include detailed bid and ask books from multiple market participants and estimates of inventory imbalance of various market makers and specialists.</p>
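<p>As a rough illustration of these state variables, one tick could be represented as follows. This is a hypothetical sketch: the class name <code>Tick</code>, its field names, the helper <code>spread</code> and the sample numbers are our own and do not correspond to any particular market feed format.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    """Hypothetical snapshot of the three fundamental state variables."""
    time: float        # seconds since market open (illustrative unit)
    bid_price: float   # best purchase price currently offered
    bid_size: int      # quantity offered at the bid
    ask_price: float   # best sale price currently offered
    ask_size: int      # quantity offered at the ask
    last_price: float  # price of the most recent trade (already in the past)
    last_size: int     # quantity of the most recent trade

    def spread(self):
        """The gap between the best ask and the best bid."""
        return self.ask_price - self.bid_price


# Made-up numbers, purely for illustration.
tick = Tick(time=12.5, bid_price=24.99, bid_size=300,
            ask_price=25.01, ask_size=500, last_price=25.00, last_size=100)
print(round(tick.spread(), 2))
```

<p>Note that nothing in this record is a &#8220;current price&#8221;: the last trade is history, and the bid and ask are only offers.</p>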
<p>In addition to working with the proper variables a sound strategy must also have at least two important components that we call foundation and empirical correctness. Without these components there is a large danger of self-delusion and an unreliable strategy.</p>
<p>By <em>foundation</em> we mean that there are <em>a priori</em> reasons to believe that some variation of the strategy should be profitable. By ignoring the nature of the companies underlying the securities being traded technical trading starts on shaky ground. In fact it is tempting to appeal to an <em>efficient market</em> hypothesis and claim that no technical trading strategy should be profitable. In some sense this is true: trades made in true ignorance expose a trader to significant risk, trading costs and pointless payment of the so-called bid-ask gap. Founded technical trading strategies are based on violations of the efficient market hypothesis: identifying situations where the market is in fact not efficient and trading into these situations. If there is no reason to suspect a market inefficiency there really is no reason to perform a technical trade. Testing numerous un-founded trading strategies is more likely to discover irrelevant anomalies in past data or discover flaws in one&#8217;s statistical procedures than it is likely to discover new valuable trading rules.[<a href="#Ioannids:2005aa">3</a>]</p>
<p>Possible market irregularities include (but are not limited to):</p>
<ul>
<li>Market Open</li>
<li>External News</li>
<li>Earnings Reports</li>
<li>M&amp;A news</li>
<li>Unusual Volume</li>
<li>Inferred Market Maker / Specialist state</li>
<li>Detailed Bid/Ask book.</li>
</ul>
<p>By <em>empirical correctness</em> we mean that the strategy can be validated and proven on historic market data. A technical strategy can have as much mathematical pedigree as you like, but it does not make sense if it cannot be mechanically implemented and proven on historic data. Many technical features are popular due to their familiarity or the quality of graphs they produce, but the true measure is how well strategies generate specific executable actions and the quantified outcomes of those actions.</p>
<p>Given an irregularity it remains to develop the trading strategy. Typically this involves an initial trade (a buy or a sell) triggered by evidence of the irregularity/inefficiency followed somewhat later by a reversal or un-rolling of the trade (selling back against an initial buy or buying back against an initial sell). If markets were perfectly efficient and instantaneous in incorporating external events this should not work, so it is important to test that there really is a repeatable market inefficiency.</p>
<p>Possible initial trading strategies could include:</p>
<ul>
<li>Selling stock into an unusual price spike (a <em>contrarian</em> strategy).</li>
<li>Buying stock immediately on news (a <em>superior connection</em> strategy).</li>
<li>Selling stock into a perceived specialist imbalance (a <em>superior knowledge</em> strategy).</li>
</ul>
<p>It would be naive to expect that a strategy that starts on a trigger and then reverses its trade blindly (say some fixed time after the trigger) is fully efficient. We must assume that other players in the market have seen effects of the trigger we traded and that their actions introduce biases and uncertainty into the market. Modeling these effects will allow us to produce a systematic <em>unrolling</em> strategy that can complete any <em>entry strategy</em> into a complete round-trip system. This systematic unrolling strategy is the subject of this writeup.</p>
<h1><a name="SECTION00030000000000000000" id="SECTION00030000000000000000">First Model</a></h1>
<h2><a name="SECTION00031000000000000000" id="SECTION00031000000000000000">The Efficient Market Hypothesis</a></h2>
<p>The efficient market hypothesis is a useful tool, even when you are attempting to find inefficient market situations. It represents the baseline you feel you have found a useful deviation from. The efficient market hypothesis has many variants but the essential content is that the market is full of <em>informed players</em> so any information is <em>already factored in to the price</em>. For example if there is publicly available information that gives a reasonable expectation that a stock should rise in the future then informed investors would purchase the stock early to be in a position to benefit from this increase. These purchases actually cause their own price-increase (by the simple laws of supply and demand) and have the effect of reducing the value of the information- as they move the price increase back in time (from the expected future change in value to the time of the anticipatory buying). This is what is meant by the phrase &#8220;already factored in.&#8221;</p>
<div align="center"><a name="fig:actual" id="fig:actual"></a><a name="47"></a></p>
<table>
<caption align="bottom"><strong>Figure 1:</strong> Dell 10-13-2006 Tick Data.</caption>
<tr>
<td>
<div align="center"><img width="500" height="402" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./morning.png" alt="Image morning"/></div>
</td>
</tr>
</table>
</div>
<p>There is a mathematical concept that captures the idea of <em>already factored in</em>: Martingales. The Martingale condition says that the expected future value is the current value. For example betting a dollar on the flip of a fair coin is a Martingale of value <img width="23" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg1.png" alt="$ \$0$"/> (the odds of winning and losing a dollar balance out). The future value may be higher or lower, but when the Martingale condition is met the average of all these values, weighted by their likelihood of occurrence, is equal to the current value. The <em>already factored in</em> example mentioned above shows how the many players in the market tend to establish a near-Martingale by trading in such a way as to move the current price toward the expected value of the future price.</p>
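<p>As a minimal sketch of the Martingale condition, the fair coin bet above can be checked by direct arithmetic. The names <code>outcomes</code> and <code>expected_change</code> are illustrative choices and not part of the original writeup.</p>

```python
from fractions import Fraction

# Outcomes of a one-dollar bet on a fair coin: (change in wealth, probability).
outcomes = [(Fraction(1), Fraction(1, 2)), (Fraction(-1), Fraction(1, 2))]


def expected_change(outcomes):
    """Probability-weighted average of the possible changes in value."""
    return sum(change * prob for change, prob in outcomes)


fair_bet = expected_change(outcomes)
print(fair_bet)  # prints 0: the expected future value equals the current value
```

<p>Any bet whose weighted average is not zero fails the condition, which is the kind of asymmetry a trading strategy has to find.</p>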
<div align="center"><a name="fig:random" id="fig:random"></a><a name="198"></a></p>
<table>
<caption align="bottom"><strong>Figure 2:</strong> Graph of a <em>market-like</em> random walk.</caption>
<tr>
<td>
<div align="center"><img width="500" height="500" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./random.png" alt="Image random"/></div>
</td>
</tr>
</table>
</div>
<p>If market prices were the sum of many individual traders each with bounded-budgets who traded independently then we could apply the <em>central limit theorem</em> or <em>law of large numbers</em> and say that the market is indeed a random walk like the famous Brownian motion from physics. In fact on first inspection the market price histories (as in Figure&nbsp;<a href="#fig:actual">1</a>) indeed look very similar to graphs generated by such a random process (as in Figure&nbsp;<a href="#fig:random">2</a>).</p>
<p>As we have said: it is no coincidence that the market looks nearly like a Brownian motion. Informed trading effects tend to impart Martingale-like tendencies (once the overall growth in the value of holding wealth is factored out). Also, if the variance of the market were much larger than that of a similar Brownian motion this would itself attract <em>channel traders</em> who would benefit by trading in and out of the excess wiggling. The point is that an efficient market is usually pretty well described by random processes that have the Martingale property (like Brownian motions or Markov chains), so these are appropriate modeling tools.</p>
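<p>A drift-free random walk of this kind is easy to simulate. The sketch below is only an illustration under assumed parameters (equal-odds steps of a fixed size, a hypothetical <code>step_size</code> and seed); it is not the process used to draw Figure 2.</p>

```python
import random


def random_walk(steps, step_size=0.01, seed=0):
    """Return a drift-free walk: each step is +step_size or -step_size with equal odds."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(steps):
        path.append(path[-1] + rng.choice((step_size, -step_size)))
    return path


def mean_final_value(walks, steps):
    """Average the end point of many independent walks (one seed per walk)."""
    finals = [random_walk(steps, seed=s)[-1] for s in range(walks)]
    return sum(finals) / len(finals)


# The Martingale property says this average should be close to the start value 0.
print(mean_final_value(2000, 100))
```

<p>Individual paths wander far from zero, yet the average end point over many paths stays near the starting value, which is the Martingale property in action.</p>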
<p>If the market process really were such a random walk then there would be little point in technical trading. The whole theory of Martingales was developed to precisely describe situations where bets based on collecting historic information cannot work. This is often called the <em>no gambling system</em> principle. It can actually be proven for systems like Martingales, unbiased Markov chains and drift-free Brownian motion, and it was even used by von Mises as a foundational concept to define randomness.[<a href="#vonMises">7</a>] However, traders have a large number of pervasive dependencies. Dependencies can be shared information, <em>herd mentality</em> or shared trading practices. There are also some traders with very large budgets, so the conditions commonly needed to apply the law of large numbers do not hold and it is not inevitable that the market is indeed a Brownian motion. In fact one can show that even though the market overall looks very much like a Brownian motion it has too many events that would be considered very rare in this model (crashes, run-ups, events correlated in time) to have plausibly been generated by such a model.</p>
<h2><a name="SECTION00032000000000000000" id="SECTION00032000000000000000">Exploiting Inefficiency</a></h2>
<p>A basic rule of thumb is: without a good reason to believe the contrary you are not too far off assuming the market is efficient. So we decided to model the morning market as being nearly memoryless. That is, we modeled it as if future prices depend only on the most recent price and not on the detailed history of prices. We will, however, condition the model on the bias introduced by the presence of the initial trade trigger.</p>
<p>The most basic memoryless model is the Markov chain. In this model the world has a finite number of situations called <em>states</em>. For example we could say that the stock price being near each of a number of price differences from the previous day&#8217;s close is a state. We could take our states to be: <img width="68" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg2.png" alt="$ +0.50\%$"/> , <img width="68" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg3.png" alt="$ +0.25\%$"/> , <img width="53" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg4.png" alt="$ 0.00\%$"/> , <img width="53" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg5.png" alt="$ 0.25\%$"/> , <img width="68" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg6.png" alt="$ -0.50\%$"/> . If our strategy involved an end of day sale followed by a next-day buy-back then knowing which state we are in allows us to assign a value to buying back the stock while in that state. This would be the negative of the relative change in stock price (price decreases work for us) times the value of the stock sold the day before (minus trading costs). 
If we modeled round-trip trading costs as <img width="56" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg7.png" alt="$ \$20.00$"/> and assume our triggered trade sold a total value of <img width="69" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg8.png" alt="$ \$46,000$"/> of Dell then we could map buying back in each possible state to a net dollar value of the round trip. For instance buying back in the state <img width="68" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg2.png" alt="$ +0.50\%$"/> would represent a net loss of <img width="42" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg9.png" alt="$ \$250$"/> . We actually want to make the states a bit more detailed by adding a notion of time. If we modeled time in 5-minute intervals and (for the sake of diagram clarity) assumed that we only move up or down one state-level, the Markov chain that modeled the first 15 minutes of the market could be represented in a diagram as in Figure&nbsp;<a href="#fig:chain1">3</a>.</p>
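<p>The mapping from price state to net dollar value can be sketched in a few lines. The dollar amounts come from the example above; the five price levels, the constant names and the helper <code>buy_back_value</code> are illustrative assumptions, and the position is taken to be short (sold the evening before).</p>

```python
POSITION_VALUE = 46000.00   # dollar value of Dell stock in the triggered trade
ROUND_TRIP_COST = 20.00     # modeled round-trip trading cost

# Price change relative to the previous day's close, in percent (assumed levels).
STATES_PERCENT = [0.50, 0.25, 0.00, -0.25, -0.50]


def buy_back_value(change_percent):
    """Net dollar value of closing the short position at this price change."""
    return round(-change_percent / 100.0 * POSITION_VALUE - ROUND_TRIP_COST, 2)


values = {pct: buy_back_value(pct) for pct in STATES_PERCENT}
for pct, value in values.items():
    print(pct, value)
```

<p>Under these assumptions the five states map to -$250, -$135, -$20, $95 and $210, the same stopping values used in the spreadsheet table later in this section.</p>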
<div align="center"><a name="fig:chain1" id="fig:chain1"></a><a name="74"></a></p>
<table>
<caption align="bottom"><strong>Figure 3:</strong> Markov Chain Model</caption>
<tr>
<td>
<div align="center"><img width="505" height="400" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Chain1.png" alt="Image Chain1"/></div>
</td>
</tr>
</table>
</div>
<p>Each circle represents a state and each arrow represents a transition from state to state. We would use historic market data to find, for every stock in this situation, the relative frequency with which each transition is taken. For instance we would measure in our historic data what fraction of the time a stock that is at the 5 minute mark and in the <img width="68" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg3.png" alt="$ +0.25\%$"/> state moves to the <img width="68" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg2.png" alt="$ +0.50\%$"/> state at the 10 minute mark. These learned state transition probabilities can be made to depend on factors from the previous day&#8217;s close (% increase, volume, market-capitalization). In the diagram we are going to assume all transitions are equally likely except for the arrows with square bases, which we each take to be twice as likely as each regular arrow leaving the same state. The success of our strategy depends on finding situations where our model predicts these sorts of advantageous asymmetric conditions. Without these asymmetries (greater net propensity for price decrease than for price increase) we would be in a gambling situation where no strategy could possibly have net-positive value.</p>
<p>The diagram also encodes another assumption of the problem: we have a deadline for buying back the stock. In this case the diagram indicates a forced buy-back at time +15 minutes if a buy-back has not been made before that time. In reality many more levels and many more time intervals are modeled. Also note we have made the top row (representing maximal loss) absorbing. This introduces a deliberate pessimistic flaw into the model (or equivalently adds a stop-loss condition to the strategy). We do not want the maximal loss states to have a reflecting barrier (like the maximal profit states do) as this would make the model overly optimistic. Instead we force the model to be pessimistic and choose enough levels so that the maximum loss bound is not often reached and therefore does not have a large effect on the model.</p>
<div align="center"><a name="fig:chain2" id="fig:chain2"></a><a name="81"></a></p>
<table>
<caption align="bottom"><strong>Figure 4:</strong> Valuing Interior States</caption>
<tr>
<td>
<div align="center"><img width="505" height="400" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Chain2.png" alt="Image Chain2"/></div>
</td>
</tr>
</table>
</div>
<p>What we want to know is the net value of being short (having sold) the stock the evening before. This is represented by the left-most circle, which does not yet have a known value. The value of this state depends both on the transition odds of the states and on the trading strategy used to buy back the stock. There is, for example, no value in reaching the price-drop states if our strategy doesn&#8217;t take advantage and buy back while in these states. So the value of the states depends both on the uncertain future behavior of the market and on the currently unspecified buy-back strategy. The neat thing about this sort of diagram and treatment is that the forced-liquidation states at the end make it possible to simultaneously find the optimal trading strategy and assign values to all of the states. For example in the next diagram we see that the value of allowing the middle state at +10 minutes <em>to ride</em> (i.e. waiting instead of buying the stock back at this time) is equal to the properly weighted average of the ending states it connects to, in this case: <!-- MATH<br />
 $\frac{1}{4}(-\$135) + \frac{1}{4}(-\$20) + \frac{1}{2} \$95 = \$8.75$<br />
 --><br />
<img width="302" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg10.png" alt="$ \frac{1}{4}(-\$135) + \frac{1}{4}(-\$20) + \frac{1}{2} \$95 = \$8.75$"/> . The value of buying back in this state is <img width="47" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg11.png" alt="$ -\$20$"/> so the optimal strategy is to take our chances in the next time interval (see Figure&nbsp;<a href="#fig:chain2">4</a>).</p>
<p>We can repeat this sort of argument for each state in the second to last column and determine the net-value of each state under the optimal trading strategy. States whose optimal strategy is to <em>stop</em> (perform the buy-back immediately) are indicated by not having any outgoing arrows (see Figure&nbsp;<a href="#fig:chain2b">5</a>).</p>
<div align="center"><a name="fig:chain2b" id="fig:chain2b"></a><a name="98"></a></p>
<table>
<caption align="bottom"><strong>Figure 5:</strong> Propagating the Valuation</caption>
<tr>
<td>
<div align="center"><img width="505" height="400" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Chain2b.png" alt="Image Chain2b"/></div>
</td>
</tr>
</table>
</div>
<p>The procedure moves from right to left using known states to fill in decisions and values for unknown states. In fact the calculation is so simple and orderly we can encode the entire filling-in procedure in a spreadsheet table:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\begin{array}{|l|lllr|}<br />
\hline<br />
 &#038; \text{column A} &#038; \text{column B} &#038; \text{column C} &#038; \text{column D} \\<br />
\hline<br />
\text{row 1} &#038; =D1 &#038; =D1 &#038; =D1 &#038; -\$250 \\<br />
\text{row 2} &#038; =\max(D2,(B1+B2+B3)/3) &#038; =\max(D2,(C1+C2+C3)/3) &#038; =\max(D2,(D1+D2+D3)/3) &#038; -\$135 \\<br />
\text{row 3} &#038; =\max(D3,(B2+B3+2*B4)/4) &#038; =\max(D3,(C2+C3+2*C4)/4) &#038; =\max(D3,(D2+D3+2*D4)/4) &#038; -\$20 \\<br />
\text{row 4} &#038; =\max(D4,(B3+B4+2*B5)/4) &#038; =\max(D4,(C3+C4+2*C5)/4) &#038; =\max(D4,(D3+D4+2*D5)/4) &#038; \$95 \\<br />
\text{row 5} &#038; =\max(D5,(B4+B5)/2) &#038; =\max(D5,(C4+C5)/2) &#038; =\max(D5,(D4+D5)/2) &#038; \$210 \\<br />
\hline<br />
\end{array}<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="1057" height="147" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg12.png" alt="\begin{displaymath} \begin{array}{\vert l\vert lllr\vert} \hline &amp; \text{column... ...(C4+C5)/2) &amp; =\max(D5,(D4+D5)/2) &amp; \$210 \ \hline \end{array}\end{displaymath}"/></div>
<p><font size="-2">.</font></p>
<p>This is in fact the same type of dynamic programming[<a href="#dynamicProgramming">1</a>] method used to value options under the <em>binomial model</em>.</p>
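<p>The fill-in procedure can also be sketched outside a spreadsheet. The code below follows the spreadsheet formulas (equal weights in row 2, double weight on the favorable move in rows 3 and 4, a reflecting bottom row and an absorbing top row). The names <code>STOP_VALUES</code>, <code>WEIGHTS</code> and <code>step_back</code> are illustrative, and because it follows the table rather than the exact layout of the figures it is not meant to reproduce every number shown in the diagrams.</p>

```python
# Value of stopping (buying back) in each price row, worst loss first.
STOP_VALUES = [-250.0, -135.0, -20.0, 95.0, 210.0]

# Relative transition weights per row as (row offset, weight), matching the
# spreadsheet formulas. Row index 0 is absorbing, so it has no entry.
WEIGHTS = {
    1: [(-1, 1), (0, 1), (1, 1)],
    2: [(-1, 1), (0, 1), (1, 2)],
    3: [(-1, 1), (0, 1), (1, 2)],
    4: [(-1, 1), (0, 1)],
}


def step_back(next_column):
    """Value one time column from the (already valued) column to its right."""
    column = [STOP_VALUES[0]]  # stop-loss row keeps its stopping value
    for row in range(1, len(STOP_VALUES)):
        total = sum(w for _, w in WEIGHTS[row])
        ride = sum(w * next_column[row + d] for d, w in WEIGHTS[row]) / total
        # Optimal choice: stop now or ride, whichever is worth more.
        column.append(max(STOP_VALUES[row], ride))
    return column


column_d = list(STOP_VALUES)     # forced buy-back at the deadline
column_c = step_back(column_d)
column_b = step_back(column_c)
column_a = step_back(column_b)
print(column_c[2])  # prints 8.75, the value of letting the middle state ride
```

<p>A row whose value equals its stopping value is a state where the optimal decision is to buy back immediately; a larger value means it is better to wait.</p>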
<p>The completed diagram is shown in Figure&nbsp;<a href="#fig:chain3">6</a>.</p>
<div align="center"><a name="fig:chain3" id="fig:chain3"></a><a name="120"></a></p>
<table>
<caption align="bottom"><strong>Figure 6:</strong> Complete Valuation</caption>
<tr>
<td>
<div align="center"><img width="505" height="400" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Chain3.png" alt="Image Chain3"/></div>
</td>
</tr>
</table>
</div>
<p>For our (made up) example the net value of the round-trip trade is an expected profit of <img width="47" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg13.png" alt="$ \$7.47$"/> .</p>
<p>What remains is to choose a set of conditions on which to base the model estimates. We then only trade situations that have an acceptable predicted risk and reward profile.</p>
<p>To build the state transition models we collect all the historic trade data and then segregate it into groups of data that match each possible trigger condition we wish to use to help bias our system. There is a trade-off: the more detailed the list of trigger conditions the more powerful biases we can detect (things are less smeared together) but we have less data available for each possible combination of conditions and lower reliability in modeling. To address this we advocate using non-parametric or kernel methods here to average data that nearly fits the conditions to get estimates that are both detailed and reliable.</p>
<p>For example our estimate is of the form:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
P(s_1 \rightarrow s_2) \approx<br />
\frac{<br />
\sum_{training-example} wt(training-example,s_1) P(s_1 \rightarrow s_2|training-example,s_1)<br />
}{<br />
\sum_{training-example} wt(training-example,s_1) P(s_1|training-example)<br />
}<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="766" height="68" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg14.png" alt="$\displaystyle P(s_1 \rightarrow s_2) \approx \frac{ \sum_{training-example} wt(... ...sum_{training-example} wt(training-example,s_1) P(s_1\vert training-example) } $"/></div>
<p>A usable <!-- MATH<br />
 $wt(training-example,s_1)$<br />
 --><br />
<img width="227" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg15.png" alt="$ wt(training-example,s_1)$"/> can be obtained from the law of conditional probability (<!-- MATH<br />
 $P(A, B) = P(A)P(B|A)$<br />
 --><br />
<img width="203" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg16.png" alt="$ P(A, B) = P(A)P(B\vert A)$"/> ), so we use <!-- MATH<br />
 $P(training-example,s_1) = P(s_1 | training-example)<br />
P(training-example)$<br />
 --><br />
<img width="649" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg17.png" alt="$ P(training-example,s_1) = P(s_1 \vert training-example) P(training-example)$"/> . Under empirical re-sampling each training example is treated as equally likely (more common situations are accounted for by the fact they yield more examples in the training set) so we can use <!-- MATH<br />
 $wt(training-example,s_1) = P(s_1 | training-example)$<br />
 --><br />
<img width="466" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg18.png" alt="$ wt(training-example,s_1) = P(s_1 \vert training-example)$"/> .</p>
<p>For <!-- MATH<br />
 $P(s_1 \rightarrow s_2 | training-example)$<br />
 --><br />
<img width="264" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg19.png" alt="$ P(s_1 \rightarrow s_2 \vert training-example)$"/> we can just estimate a frequency: when we are in a <img width="56" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg20.png" alt="$ state_A$"/> near <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg21.png" alt="$ s_1$"/> , how often do we see a next-state <img width="57" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg22.png" alt="$ state_B$"/> such that <!-- MATH<br />
 $state_B/state_A$<br />
 --><br />
<img width="118" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg23.png" alt="$ state_B/state_A$"/> is approximately <img width="97" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg24.png" alt="$ s-2/s-1$"/> .</p>
<p>For both of these estimates it pays to blur things a bit during the estimation procedure, replacing sums of the form:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
E_{condition(x)=true}[f(x)] = \frac{<br />
\sum_{condition(x)=true} f(x)<br />
}{<br />
\sum_{condition(x)=true} 1<br />
}<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="376" height="70" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg25.png" alt="$\displaystyle E_{condition(x)=true}[f(x)] = \frac{ \sum_{condition(x)=true} f(x) }{ \sum_{condition(x)=true} 1 } $"/></div>
<p>with softer forms like:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
E_{condition(x)=true}[f(x)] \approx \frac{<br />
\sum_{x} e^{-\lambda violation(x)}f(x)<br />
}{<br />
\sum_{x} e^{-\lambda violation(x)}<br />
}<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="379" height="69" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg26.png" alt="$\displaystyle E_{condition(x)=true}[f(x)] \approx \frac{ \sum_{x} e^{-\lambda violation(x)}f(x) }{ \sum_{x} e^{-\lambda violation(x)} } . $"/></div>
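<p>The difference between the hard and soft averages can be sketched as follows. The function names, the toy data and the choice of <code>violation</code> as an absolute distance from the target condition are all illustrative assumptions; the writeup does not fix a particular violation measure or value of lambda.</p>

```python
import math


def hard_average(xs, f, condition):
    """Average f(x) over only the examples that satisfy the condition exactly."""
    selected = [f(x) for x in xs if condition(x)]
    return sum(selected) / len(selected)


def soft_average(xs, f, violation, lam):
    """Average f(x) with weight exp(-lam * violation(x)) on every example."""
    weights = [math.exp(-lam * violation(x)) for x in xs]
    return sum(w * f(x) for w, x in zip(weights, xs)) / sum(weights)


# Toy examples: (previous-day % increase, 1.0 if the price then dropped else 0.0).
examples = [(1.0, 1.0), (1.0, 0.0), (1.2, 1.0), (3.0, 1.0)]


def outcome(x):
    return x[1]


def violation(x):
    return abs(x[0] - 1.0)   # distance from the condition "increase is 1.0%"


def condition(x):
    return violation(x) == 0.0


print(hard_average(examples, outcome, condition))
print(soft_average(examples, outcome, violation, 5.0))
```

<p>A very large lambda recovers the hard average, lambda of zero averages all examples equally, and intermediate values let near-matching examples contribute with reduced weight.</p>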
<h1><a name="SECTION00040000000000000000" id="SECTION00040000000000000000">A Second Model</a></h1>
<p>One thing one might want is a much more detailed model of time. One way to do this is just to add more time-states to the model. This can cause problems as we now have many more transition probabilities to estimate.<a name="tex2html10" href="#foot134" id="tex2html10"><sup>4</sup></a> Suppose we wanted to switch our model from being indexed by time to being indexed by tick. Bid, Ask and Trade ticks can happen at any time and at any rate, so even with a trading deadline there is uncertainty in how many more ticks there are before the deadline. We can work at the tick level (without introducing too many states) by introducing a new model that has cycles in the arrow diagram (see Figure&nbsp;<a href="#fig:chain4">7</a>).</p>
<div align="center"><a name="fig:chain4" id="fig:chain4"></a><a name="140"></a></p>
<table>
<caption align="bottom"><strong>Figure 7:</strong> Recurrent Model (With Cycles)</caption>
<tr>
<td>
<div align="center"><img width="505" height="400" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Chain4.png" alt="Image Chain4"/></div>
</td>
</tr>
</table>
</div>
<p>The short vertical arrows represent the odds of moving from price-state to price-state in the same time column. The left to right dotted arrows represent the odds of being the tick that moves to the next time column. We can now estimate the transition odds from a great quantity of per-tick data, giving us very reliable transition odds. We would like to fill in the values of all the states of this model (like we did in the earlier diagrams), but the fill-in procedure will not work in the presence of cycles. The states we need in order to fill in a given state do not yet have known values because they themselves depend on the state we are trying to value.</p>
<h2><a name="SECTION00041000000000000000" id="SECTION00041000000000000000">Linear Program Treatment</a></h2>
<p>The standard way to deal with unknown quantities that simultaneously depend on each other is to introduce variables and write down a set of simultaneous inequalities.</p>
<p>If we introduce the variables <img width="14" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg27.png" alt="$ v$"/> , <!-- MATH<br />
 $a_1 \cdots a_5$<br />
 --><br />
<img width="68" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg28.png" alt="$ a_1 \cdots a_5$"/> , <!-- MATH<br />
 $b_1 \cdots b_5$<br />
 --><br />
<img width="64" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg29.png" alt="$ b_1 \cdots b_5$"/> and <!-- MATH<br />
 $c_1 \cdots c_5$<br />
 --><br />
<img width="64" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg30.png" alt="$ c_1 \cdots c_5$"/> to represent all of the unknown values in our last diagram we can quickly write down many relations we know to be true for them.</p>
<p>For example for the set of variables <img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg31.png" alt="$ c_1$"/> through <img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg32.png" alt="$ c_5$"/> we know that each state is worth at least as much as the value of stopping in that state. This can be written as:</p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
c_1  &#038; \ge &#038;  -\$250  \\<br />
c_2  &#038; \ge &#038;  -\$135  \\<br />
c_3  &#038; \ge &#038;  -\$20 \\<br />
c_4  &#038; \ge &#038;  \$95 \\<br />
c_5  &#038; \ge &#038;  \$210 \\<br />
.<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg33.png" alt="$\displaystyle c_1$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="57" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg35.png" alt="$\displaystyle -\$250$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg36.png" alt="$\displaystyle c_2$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="57" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg37.png" alt="$\displaystyle -\$135$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg38.png" alt="$\displaystyle c_3$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="47" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg39.png" alt="$\displaystyle -\$20$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg40.png" alt="$\displaystyle c_4$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="32" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg41.png" alt="$\displaystyle \$95$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg42.png" alt="$\displaystyle c_5$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="42" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg43.png" alt="$\displaystyle \$210$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="10" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg44.png" alt="$\displaystyle .$"/></td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p>Each state (except deadline and stop-loss states) is also worth at least the expected value of continuing one more step, which can be written as:</p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
c_2  &#038; \ge &#038;  p(c_2  \rightarrow  c_2) c_2 + p(c_2  \rightarrow  c_1) c_1 + p(c_2  \rightarrow  c_3) c_3 + p(c_2 \;\text{escape}) (-\$135) \\<br />
c_3  &#038; \ge &#038;  p(c_3  \rightarrow  c_3) c_3 + p(c_3  \rightarrow  c_2) c_2 + p(c_3  \rightarrow  c_4) c_4 + p(c_3 \;\text{escape}) (-\$20) \\<br />
c_4  &#038; \ge &#038;  p(c_4  \rightarrow  c_4) c_4 + p(c_4  \rightarrow  c_3) c_3 + p(c_4  \rightarrow  c_5) c_5 + p(c_4 \;\text{escape}) \$95 \\<br />
c_5  &#038; \ge &#038;  p(c_5  \rightarrow  c_5) c_5 + p(c_5  \rightarrow  c_4) c_4 + p(c_5 \;\text{escape}) \$210 \\<br />
.<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg36.png" alt="$\displaystyle c_2$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="412" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg45.png" alt="$\displaystyle p(c_2 \rightarrow c_2) c_2 + p(c_2 \rightarrow c_1) c_1 + p(c_2 \rightarrow c_3) c_3 + p(c_2 \;$"/>escape<img width="78" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg46.png" alt="$\displaystyle ) (-\$135)$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg38.png" alt="$\displaystyle c_3$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="412" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg47.png" alt="$\displaystyle p(c_3 \rightarrow c_3) c_3 + p(c_3 \rightarrow c_2) c_2 + p(c_3 \rightarrow c_4) c_4 + p(c_3 \;$"/>escape<img width="69" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg48.png" alt="$\displaystyle ) (-\$20)$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg40.png" alt="$\displaystyle c_4$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="412" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg49.png" alt="$\displaystyle p(c_4 \rightarrow c_4) c_4 + p(c_4 \rightarrow c_3) c_3 + p(c_4 \rightarrow c_5) c_5 + p(c_4 \;$"/>escape<img width="40" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg50.png" alt="$\displaystyle ) \$95$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg42.png" alt="$\displaystyle c_5$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="289" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg51.png" alt="$\displaystyle p(c_5 \rightarrow c_5) c_5 + p(c_5 \rightarrow c_4) c_4 + p(c_5 \;$"/>escape<img width="49" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg52.png" alt="$\displaystyle ) \$210$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="10" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg44.png" alt="$\displaystyle .$"/></td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p>This can be re-written into matrix form where we have</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
A =<br />
\begin{bmatrix}<br />
 1 &#038;   &#038;   &#038;   &#038;   \\<br />
   &#038; 1 &#038;   &#038;   &#038;   \\<br />
   &#038;   &#038; 1 &#038;   &#038;   \\<br />
   &#038;   &#038;   &#038; 1 &#038;   \\<br />
   &#038;   &#038;   &#038;   &#038; 1 \\<br />
-P(c_2 \rightarrow c_1) &#038;  1-P(c_2 \rightarrow c_2) &#038; -P(c_2 \rightarrow c_3) &#038; &#038; \\<br />
   &#038; -P(c_3 \rightarrow c_2) &#038; 1-P(c_3 \rightarrow c_3) &#038; -P(c_3 \rightarrow c_4) &#038; \\<br />
   &#038;   &#038; -P(c_4 \rightarrow c_3) &#038; 1-P(c_4 \rightarrow c_4) &#038; -P(c_4 \rightarrow c_5) \\<br />
   &#038;   &#038;   &#038;  -P(c_5 \rightarrow c_4)  &#038; 1-P(c_5 \rightarrow c_5)<br />
\end{bmatrix},<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="734" height="226" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg53.png" alt="$\displaystyle A = \begin{bmatrix} 1 &amp; &amp; &amp; &amp; \ &amp; 1 &amp; &amp; &amp; \ &amp; &amp; 1 &amp; &amp; \ &amp; &amp;... ...5) \ &amp; &amp; &amp; -P(c_5 \rightarrow c_4) &amp; 1-P(c_5 \rightarrow c_5) \end{bmatrix}, $"/></div>
<p><!-- MATH<br />
 \begin{displaymath}<br />
b =<br />
\begin{bmatrix}<br />
-\$250 \\<br />
-\$135 \\<br />
-\$20 \\<br />
\$95 \\<br />
\$210 \\<br />
P(c_2 \;\text{escape}) (-\$135) \\<br />
P(c_3 \;\text{escape}) (-\$20) \\<br />
P(c_4 \;\text{escape}) \$95 \\<br />
P(c_5 \;\text{escape}) \$210<br />
\end{bmatrix}<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="232" height="226" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg54.png" alt="$\displaystyle b = \begin{bmatrix} -\$250 \ -\$135 \ -\$20 \ \$95 \ \$21... ... \ P(c_4 \;\text{escape}) \$95 \ P(c_5 \;\text{escape}) \$210 \end{bmatrix}$"/></div>
<p>and our vector of unknowns is</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
x =<br />
\begin{bmatrix}<br />
c_1 \\<br />
c_2 \\<br />
c_3 \\<br />
c_4 \\<br />
c_5<br />
\end{bmatrix}.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="90" height="135" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg55.png" alt="$\displaystyle x = \begin{bmatrix} c_1 \ c_2 \ c_3 \ c_4 \ c_5 \end{bmatrix}. $"/></div>
<p>In matrix form we say <img width="62" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg56.png" alt="$ A x \ge b$"/> . We are assuming we have estimates for all of the entries of <img width="18" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg57.png" alt="$ A$"/> and <img width="12" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg58.png" alt="$ b$"/> &#8211; so the only unknowns are the entries of <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg59.png" alt="$ x$"/> . If these were equalities (instead of inequalities) we would call this a set of simultaneous equations and we could use linear algebra to solve for the unknown values. Because they are inequalities we will have to instead solve what is known as a linear program.[<a href="#linProg">5</a>] It turns out the optimal values for <!-- MATH<br />
 $c_1, \cdots c_5$<br />
 --><br />
<img width="69" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg60.png" alt="$ c_1, \cdots c_5$"/> are given by solving:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\min 1\cdot x \;\text{s.t.}\;\\A x \ge b .<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="78" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg61.png" alt="$\displaystyle \min 1\cdot x \;$"/>s.t.<img width="68" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg62.png" alt="$\displaystyle \;\\ A x \ge b . $"/></div>
<p>This has an admittedly strange form (the objective condition <!-- MATH<br />
 $\min 1\cdot x$<br />
 --><br />
<img width="72" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg63.png" alt="$ \min 1\cdot x$"/> seems very arbitrary and one would at first think the likely form is <!-- MATH<br />
 $\max p \cdot x$<br />
 --><br />
<img width="76" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg64.png" alt="$ \max p \cdot x$"/> where <img width="14" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg65.png" alt="$ p$"/> is the vector of probabilities of getting into each <img width="12" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg66.png" alt="$ c$"/> -state). There is also the issue that we merely wrote down inequalities that we knew would be true for the optimal solution to the stopping problem, but we have not ruled out further conditions that we did not think of (i.e. these conditions are necessary, but we have not yet established that they are sufficient).</p>
<p>We show (in the appendix) that this is in fact the right procedure for solving for all of the <img width="12" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg66.png" alt="$ c$"/> -values. Each of these linear programs can be quickly solved using standard software. We can also see that the same type of procedure can then be applied to the <img width="12" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg58.png" alt="$ b$"/> -values (which depend only on <img width="12" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg66.png" alt="$ c$"/> -values, which are by this point known). In fact we can substitute back (using linear programs instead of filling-in) until we know <img width="14" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg27.png" alt="$ v$"/> the expected value (under the model) of the entire round-trip trade.</p>
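<p>The matrix form above can be sketched in a few lines of Python. This is only an illustration: the stopping values are the ones in the text, but the transition and escape probabilities in <code>MOVES</code> are made-up numbers, and the helper names <code>build_system</code> and <code>is_feasible</code> are introduced here. The sketch builds the nine rows of A and b and checks a candidate x against them; it does not solve the linear program itself.</p>

```python
# Stopping values for states c_1..c_5, taken from the text.
STOP = [-250.0, -135.0, -20.0, 95.0, 210.0]

# Hypothetical transition estimates for c_2..c_5 (not from the text):
# each entry is (P(down), P(stay), P(up), P(escape)) and sums to 1.
# c_5 has no higher state, so its "up" probability is zero.
MOVES = {
    1: (0.3, 0.2, 0.4, 0.1),
    2: (0.3, 0.2, 0.4, 0.1),
    3: (0.3, 0.2, 0.4, 0.1),
    4: (0.5, 0.3, 0.0, 0.2),
}


def build_system(stop, moves):
    """Return (A, b) for the constraints A x >= b described in the text."""
    n = len(stop)
    A, b = [], []
    # Identity rows: each value is at least its immediate stopping value.
    for i in range(n):
        A.append([1.0 if j == i else 0.0 for j in range(n)])
        b.append(stop[i])
    # Continuation rows: each value is at least the expected value of waiting.
    for i in sorted(moves):
        down, stay, up, escape = moves[i]
        row = [0.0] * n
        row[i - 1] = -down
        row[i] = 1.0 - stay
        if i + 1 < n:
            row[i + 1] = -up
        A.append(row)
        b.append(escape * stop[i])
    return A, b


def is_feasible(A, b, x, tol=1e-9):
    """Check A x >= b row by row, allowing a small floating-point slack."""
    return all(
        sum(a * v for a, v in zip(row, x)) >= rhs - tol
        for row, rhs in zip(A, b)
    )


A, b = build_system(STOP, MOVES)
all_best = [max(STOP)] * len(STOP)  # value every state at the best stop value
print(is_feasible(A, b, STOP))      # the stopping values on their own
print(is_feasible(A, b, all_best))  # a crude upper bound on the values
print(sum(STOP), sum(all_best))     # objective 1.x for each candidate
```

<p>With these made-up probabilities the raw stopping values violate a continuation row (waiting in the second state is worth more than stopping there), while the crude upper bound satisfies every row but has a much larger objective. The linear program looks for the feasible point with the smallest objective between these two extremes.</p>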
<h2><a name="SECTION00042000000000000000" id="SECTION00042000000000000000">More on the Transition Probability Estimate</a></h2>
<p>We can augment our state to carry more information than just the current ask-price relative to our previous night&#8217;s sale.</p>
<p>If we are in <img width="79" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg67.png" alt="$ stage-b$"/> of our Markov model we can modify <!-- MATH<br />
 $wt(training-example,s_1)$<br />
 --><br />
<img width="227" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg15.png" alt="$ wt(training-example,s_1)$"/> to be: <!-- MATH<br />
 $P(s_1 | training-example)<br />
P(training-example | today's \; stage-a \; move \; summary)$<br />
 --><br />
<img width="683" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg68.png" alt="$ P(s_1 \vert training-example) P(training-example \vert today's \; stage-a \; move \; summary)$"/> (to do this we build an estimated transition matrix for <img width="81" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg69.png" alt="$ stage-a$"/> from only the trajectory of today&#8217;s stock and then evaluate how likely the trajectory of the training example from the past is under this model; much smoothing/blurring is required to make this calculation usable). Even better: we can group training data and use Bayes&#8217; law: <!-- MATH<br />
 $P(training-group | today's \; stage-a \; move \; summary) = P(today's \; stage-a \; move \; summary | training-group) P(training-group) /<br />
P(today's \; stage-a \; move \; summary)$<br />
 --><br />
<img width="1399" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg70.png" alt="$ P(training-group \vert today's \; stage-a \; move \; summary) = P(today's ... ... training-group) P(training-group) / P(today's \; stage-a \; move \; summary)$"/></p>
<p>This allows us to group the training examples (on a few criteria, like less than a month old or not, trading volume, volatility &#8230;) and use a group of examples to build a model to evaluate today&#8217;s moves against (that is, aggregated data forms a model against which we check today&#8217;s single trajectory). As is traditional in Bayes estimates we ignore the denominator as it does not vary as a function of training group.</p>
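<p>The grouped Bayes weighting can be sketched as follows. The group names, priors and likelihoods are hypothetical numbers chosen for illustration, and <code>group_posteriors</code> is a name introduced here. The shared denominator is not estimated separately; dividing by the sum over groups recovers it, assuming the groups listed are the only candidates.</p>

```python
def group_posteriors(likelihoods, priors):
    """Apply Bayes' law across training groups.

    likelihoods[g] plays the role of P(today's summary | group g) and
    priors[g] the role of P(group g). The shared denominator is recovered
    by normalizing, so it never has to be estimated on its own.
    """
    unnormalized = {g: likelihoods[g] * priors[g] for g in priors}
    total = sum(unnormalized.values())
    if total <= 0.0:
        raise ValueError("no group explains today's summary")
    return {g: w / total for g, w in unnormalized.items()}


# Hypothetical groups and numbers, for illustration only.
priors = {"recent": 0.25, "older": 0.75}
likelihoods = {"recent": 0.5, "older": 0.25}

posteriors = group_posteriors(likelihoods, priors)
best_group = max(posteriors, key=posteriors.get)
print(posteriors, best_group)
```

<p>Because the denominator is the same for every group, ranking groups by the unnormalized products gives the same order as ranking by the normalized weights.</p>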
<h1><a name="SECTION00050000000000000000" id="SECTION00050000000000000000">Conclusion</a></h1>
<p>We have demonstrated some of the methods of using standard statistical and optimization techniques to automatically generate and back-test <em>un-roll</em> trades that turn properly conditioned technical trades into profitable round-trip trades. What we have presented is the technical machinery for building the <em>second half</em> of a profitable trade pair where the first half is some technical signal such as price or a market external trigger.</p>
<h2><a name="SECTION00060000000000000000" id="SECTION00060000000000000000">Bibliography</a></h2>
<dl compact>
<dt><a name="dynamicProgramming" id="dynamicProgramming">1</a></dt>
<dd>B<small>ELLMAN,</small> R.<br />
<em>Dynamic Programming</em>.<br />
Dover Publications, 2003.</dd>
<dt><a name="stopping" id="stopping">2</a></dt>
<dd>B<small>REIMAN,</small> L.<br />
<em>Stopping Rule Problems</em>.<br />
John Wiley &amp; Sons, 1964, ch.&nbsp;Applied Combinatorial Mathematics.</dd>
<dt><a name="Ioannids:2005aa" id="Ioannids:2005aa">3</a></dt>
<dd>I<small>OANNIDIS,</small> J. P.&nbsp;A.<br />
Why most published research findings are false.<br />
<em>PLOS Medicine 2</em>, 8 (Aug 2005), 0697-0701.</dd>
<dt><a name="markovChains" id="markovChains">4</a></dt>
<dd>K<small>EMENY,</small> J.&nbsp;G., <small>AND</small> S<small>NELL,</small> J.&nbsp;L.<br />
<em>Finite Markov Chains</em>.<br />
Springer, 1960.</dd>
<dt><a name="linProg" id="linProg">5</a></dt>
<dd>S<small>CHRIJVER,</small> A.<br />
<em>Theory of Linear and Integer Programming</em>.<br />
John Wiley &amp; Sons, 1986.</dd>
<dt><a name="investments" id="investments">6</a></dt>
<dd>S<small>HARPE,</small> W., A<small>LEXANDER,</small> G.&nbsp;J., <small>AND</small> B<small>AILEY,</small> J.&nbsp;V.<br />
<em>Investments</em>, 6&nbsp;ed.<br />
Prentice Hall, 1998.</dd>
<dt><a name="vonMises" id="vonMises">7</a></dt>
<dd><small>VON</small> M<small>ISES,</small> R.<br />
<em>Probability, Statistics and Truth</em>.<br />
Dover Publications, 1981.</dd>
<dt><a name="nonparametricStatistics" id="nonparametricStatistics">8</a></dt>
<dd>W<small>ASSERMAN,</small> L.<br />
<em>All of Nonparametric Statistics</em>.<br />
Springer, 2006.</dd>
</dl>
<div align="center"><b>APPENDIX</b></div>
<h1><a name="SECTION00070000000000000000" id="SECTION00070000000000000000">Why the Linear Program Solution is Correct</a></h1>
<p>How do we know the linear program solves the original problem?</p>
<ul>
<li>Because there are a lot of formulas?</li>
<li>Linear program looks kind-of right?</li>
<li>Works on a few examples?</li>
</ul>
<p>To actually prove correctness we need to derive some representation of the optimal solution and compare against it. All of the inequalities we wrote must be true for the optimal solution, but we have no prior guarantee that these are the only conditions. There could be additional conditions that we forgot to model.</p>
<p>Breiman[<a href="#stopping">2</a>] presented a clever argument technique that exploits the particularly nice structure of solutions of this problem. He noticed that solutions have both a lattice-like structure (you can combine solutions by taking minimums) and an operator structure (applying the probability transition matrix and stopping rules to a solution yields a solution). It turns out this is too much well-behaved structure for any non-trivial solution set to have, and it lets us show that optimal solutions are essentially unique, which in turn lets us show the linear program solution solves the actual trading problem.</p>
<div><a name="thm:stopping" id="thm:stopping"><b>Theorem 1</b></a> &nbsp; <i>Assume that every state in the Markov chain has a path to a forced stopping state. Let <img width="18" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg71.png" alt="$ T$"/> be a maximal optimal set of stopping nodes and define the vector <img width="11" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg72.png" alt="$ t$"/> such that <!-- MATH<br />
 $t_i = E[stopping\; value\; under\; T\; rules \;|\; started \; at \; i]$<br />
 --><br />
<img width="418" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg73.png" alt="$ t_i = E[stopping\; value\; under\; T\; rules \;\vert\; started \; at \; i]$"/> . Let <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg59.png" alt="$ x$"/> be an optimal feasible solution to the linear program:</i></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
\min 1 \cdot x &#038; &#038; \\<br />
x &#038; \ge &#038; stop\\<br />
(I-P)x &#038; \ge &#038; 0<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="72" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg74.png" alt="$\displaystyle \min 1 \cdot x$"/></td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="15" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg75.png" alt="$\displaystyle x$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="38" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg76.png" alt="$\displaystyle stop$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="77" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg77.png" alt="$\displaystyle (I-P)x$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap>0</td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/><br />
<i>where <img width="14" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg78.png" alt="$ I$"/> is the identity matrix, <img width="19" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg79.png" alt="$ P$"/> is the matrix of transition probabilities of the Markov chain and <img width="38" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg80.png" alt="$ stop$"/> is the vector of stopping values.</i></p>
<p><i>Then <img width="47" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg81.png" alt="$ x = t$"/> .</i></p>
</div>
<p>The theorem says if <img width="11" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg72.png" alt="$ t$"/> is an optimal solution for the original valuation problem (that we may or may not know how to calculate) and <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg59.png" alt="$ x$"/> is an optimal feasible solution to the linear program (which is now written in a slightly different but equivalent form) then <img width="47" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg81.png" alt="$ x = t$"/> . So, as hoped, solving the linear program is equivalent to solving the original stopping problem. The extra condition that every state can eventually reach a forced stopping state is true in our formulation due to the trading deadline.</p>
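<p>One way to see why the two forms of the linear program agree is sketched below. It is offered as an illustration under an assumption, not as the argument used in the proof: each escape from a state i is modeled as a transition to an extra absorbing state e_i (a name introduced here) whose stopping value equals stop_i. Minimizing the objective then pins the value of e_i at stop_i, and the row of (I-P)x for state i turns into the corresponding row of the earlier A x &#8805; b form.</p>

```latex
\begin{gather*}
x_{e_i} \ge stop_i
\quad\text{and minimizing } 1 \cdot x \text{ gives}\quad x_{e_i} = stop_i , \\
x_i - \sum_j P(i \rightarrow j)\, x_j - P(i \;\text{escape})\, x_{e_i} \ge 0 , \\
\text{so}\quad
x_i - \sum_j P(i \rightarrow j)\, x_j \ge P(i \;\text{escape})\, stop_i .
\end{gather*}
```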
<p>The proof gets a little involved but the essential ideas are as follows:</p>
<ul>
<li>Check that an optimal stopping solution would obey the linear program inequalities (so they are necessary; we still need to show they are sufficient).</li>
<li>Show that the linear program solution, even if it did differ from the optimal stopping solution, can not be less than the optimal stopping solution in any coordinate (this is the lattice minimum step).</li>
<li>Use the fact that every state has a path to a forced stopping state to show that the linear programming solution can not hide any excess value above the best possible stopping value away from the rest of the system (this is the operator step).</li>
</ul>
<div><i>Proof</i>. [Proof of Theorem&nbsp;<a href="#thm:stopping">1</a>] The theory of linear programming duality says that there is a <em>dual problem</em> to our linear program and this dual is: <!-- MATH<br />
 $\max u \cdot stop$<br />
 --><br />
<img width="101" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg83.png" alt="$ \max u \cdot stop$"/> where</p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
u, v  &#038; \ge &#038;  0 \\<br />
(u v) A &#038; = &#038; c<br />
.<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="33" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg84.png" alt="$\displaystyle u, v$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap>0</td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="53" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg85.png" alt="$\displaystyle (u v) A$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg86.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="18" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg87.png" alt="$\displaystyle c .$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/><br />
The point of the dual is it is known that for all <img width="52" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg88.png" alt="$ x, u, v$"/> feasible we have <!-- MATH<br />
 $u \cdot stop \le c \cdot x$<br />
 --><br />
<img width="121" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg89.png" alt="$ u \cdot stop \le c \cdot x$"/> . And for optimal <img width="52" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg88.png" alt="$ x, u, v$"/> we have <!-- MATH<br />
 $u \cdot stop = c \cdot x$<br />
 --><br />
<img width="120" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg90.png" alt="$ u \cdot stop = c \cdot x$"/> .</p>
<p>Take <img width="52" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg88.png" alt="$ x, u, v$"/> as an optimal solution to the linear program and the dual.</p>
<p>One can check <img width="11" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg72.png" alt="$ t$"/> itself must obey all of the conditions of the linear program so duality theory tells us <!-- MATH<br />
 $u \cdot stop \le c \cdot t$<br />
 --><br />
<img width="117" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg91.png" alt="$ u \cdot stop \le c \cdot t$"/> .</p>
<p>Define a vector <img width="14" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg92.png" alt="$ z$"/> such that <!-- MATH<br />
 $z_i = \min(x_i, t_i)$<br />
 --><br />
<img width="126" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg93.png" alt="$ z_i = \min(x_i, t_i)$"/> . <img width="14" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg92.png" alt="$ z$"/> also obeys the primal linear program inequalities, so we know <!-- MATH<br />
 $u \cdot stop \le c \cdot z$<br />
 --><br />
<img width="120" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg94.png" alt="$ u \cdot stop \le c \cdot z$"/> . Now <!-- MATH<br />
 $u \cdot stop = c \cdot t$<br />
 --><br />
<img width="116" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg95.png" alt="$ u \cdot stop = c \cdot t$"/> so we have <!-- MATH<br />
 $c \cdot t \le c \cdot z$<br />
 --><br />
<img width="90" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg96.png" alt="$ c \cdot t \le c \cdot z$"/> . Each entry of <img width="12" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg66.png" alt="$ c$"/> is <img width="14" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg97.png" alt="$ 1$"/> and <!-- MATH<br />
 $z_i \le t_i$<br />
 --><br />
<img width="56" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg98.png" alt="$ z_i \le t_i$"/> for all <img width="11" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg99.png" alt="$ i$"/> which can only mean that <img width="46" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg100.png" alt="$ z = t$"/> . This means entry by entry we have <!-- MATH<br />
 $x_i \ge t_i$<br />
 --><br />
<img width="58" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg101.png" alt="$ x_i \ge t_i$"/> .</p>
<p>Now define the vector function <img width="34" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg102.png" alt="$ F()$"/> such that <!-- MATH<br />
 $F(w)_i = \max(stop_i, \sum_j P(i \rightarrow j) w_j)$<br />
 --><br />
<img width="300" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg103.png" alt="$ F(w)_i = \max(stop_i, \sum_j P(i \rightarrow j) w_j)$"/> . For the true solution <img width="11" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg72.png" alt="$ t$"/> we have <img width="72" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg104.png" alt="$ F(t) = t$"/> . The linear program solution <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg59.png" alt="$ x$"/> also has <img width="80" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg105.png" alt="$ F(x) = x$"/> . Now if we suppose <img width="47" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg106.png" alt="$ x \neq t$"/> then there exists an <img width="11" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg99.png" alt="$ i$"/> such that <img width="56" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg107.png" alt="$ x_i - t_i$"/> is maximal and state <img width="11" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg99.png" alt="$ i$"/> points to at least one state <img width="13" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg108.png" alt="$ j$"/> such that <!-- MATH<br />
 $x_i - t_i > x_j - t_j$<br />
 --><br />
<img width="136" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg109.png" alt="$ x_i - t_i &gt; x_j - t_j$"/> . This must be true because none of these maximal difference states can be forced stopping states. So some maximal difference state must have a transition to a non maximal difference, otherwise this would violate the fact that all states have eventual paths to forced stopping states (where <!-- MATH<br />
 $x_k - t_k = 0$<br />
 --><br />
<img width="96" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg110.png" alt="$ x_k - t_k = 0$"/> ). For this particular <img width="11" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg99.png" alt="$ i$"/> we claimed <!-- MATH<br />
 $x_i > t_i \ge stop_i$<br />
 --><br />
<img width="123" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg111.png" alt="$ x_i &gt; t_i \ge stop_i$"/> so we have:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
(F(x) - F(t))_i =<br />
\left\{<br />
\begin{array}{l l}<br />
   \sum_j P(i \rightarrow j) x_j - stop_i &#038; \quad \text{if $\sum_j P(i \rightarrow j) t_j < stop_i$} \\<br />
   \sum_j P(i \rightarrow j) (x_j - t_j) &#038; \quad \text{otherwise}<br />
\\\end{array} \right.<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="592" height="55" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg112.png" alt="\begin{displaymath} (F(x) - F(t))_i = \left\{ \begin{array}{l l} \sum_j P(i \ri... ... (x_j - t_j) &amp; \quad \text{otherwise} \\ \end{array} \right. . \end{displaymath}"/></div>
<p>So either way we have for this particular <img width="11" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg99.png" alt="$ i$"/> : <!-- MATH<br />
 $(F(x) - F(t))_i \le \sum_j P(i \rightarrow j) (x_j - t_j)$<br />
 --><br />
<img width="323" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg113.png" alt="$ (F(x) - F(t))_i \le \sum_j P(i \rightarrow j) (x_j - t_j)$"/> . But we must have <!-- MATH<br />
 $\sum_j P(i \rightarrow j) (x_j - t_j) < x_i - t_i$<br />
 --><br />
<img width="255" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg114.png" alt="$ \sum_j P(i \rightarrow j) (x_j - t_j) &lt; x_i - t_i$"/> because <!-- MATH<br />
 $\sum_j P(i \rightarrow j) = 1$<br />
 --><br />
<img width="143" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg115.png" alt="$ \sum_j P(i \rightarrow j) = 1$"/> , <!-- MATH<br />
 $x_j - t_j \le x_i - t_i$<br />
 --><br />
<img width="136" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg116.png" alt="$ x_j - t_j \le x_i - t_i$"/> for all <img width="13" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg108.png" alt="$ j$"/> and <!-- MATH<br />
 $x_j - t_j < x_i - t_i$<br />
 --><br />
<img width="136" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg117.png" alt="$ x_j - t_j &lt; x_i - t_i$"/> for at least one <img width="13" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg108.png" alt="$ j$"/> . So <!-- MATH<br />
 $(F(x) - F(t))_i < x_i - t_i$<br />
 --><br />
<img width="200" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg118.png" alt="$ (F(x) - F(t))_i &lt; x_i - t_i$"/> and we see <img width="34" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg102.png" alt="$ F()$"/> is essentially a contraction on the segment between <img width="11" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg72.png" alt="$ t$"/> and <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg59.png" alt="$ x$"/> . Since a contraction on a bounded interval can not have two distinct fixed points our supposition that <img width="47" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg106.png" alt="$ x \neq t$"/> is untenable and we know <img width="47" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg81.png" alt="$ x = t$"/> . <img width="19" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg82.png" alt="$ \qedsymbol$"/></div>
<p>We are done- we have shown there is essentially only one optimal solution to the stopping problem (the only possible variation is rules that differ in what they do for states-<img width="11" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg99.png" alt="$ i$"/> such that <!-- MATH<br />
 $\sum_j P(i \rightarrow j) t_j = stop_i$<br />
 --><br />
<img width="187" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg119.png" alt="$ \sum_j P(i \rightarrow j) t_j = stop_i$"/> ). We also should by now have some insight as to why we used a linear program like <!-- MATH<br />
 $\min 1\cdot x \;\text{s.t.}\; A x \ge b$<br />
 --><br />
<img width="78" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg120.png" alt="$ \min 1\cdot x \;$"/>s.t.<img width="68" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg121.png" alt="$ \; A x \ge b$"/> : the linear program is solving for the minimum value at each state that does not dip below the expected value of neighboring states.</p>
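<p>The operator F() from the proof can also be tried numerically. The sketch below is value iteration, which is a different technique from the linear program: it applies F() repeatedly, starting from the stopping values, until the vector stops changing. The four-state chain, its probabilities, and the function names are hypothetical; the two forced stopping states are modeled as absorbing rows, which is an assumption of this sketch.</p>

```python
def apply_F(P, stop, w):
    """One application of F(w)_i = max(stop_i, sum_j P[i][j] * w_j)."""
    n = len(w)
    return [
        max(stop[i], sum(P[i][j] * w[j] for j in range(n)))
        for i in range(n)
    ]


def iterate_to_fixed_point(P, stop, tol=1e-12, max_iter=100000):
    """Apply F repeatedly, starting from the stopping values."""
    w = list(stop)
    for _ in range(max_iter):
        nxt = apply_F(P, stop, w)
        if max(abs(a - b) for a, b in zip(nxt, w)) <= tol:
            return nxt
        w = nxt
    raise RuntimeError("iteration did not converge")


# Hypothetical chain: states 0 and 3 are forced stopping states
# (modeled as absorbing); states 1 and 2 may stop or wait.
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.3, 0.2, 0.5, 0.0],
    [0.0, 0.3, 0.2, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
STOP = [-250.0, -135.0, -20.0, 210.0]

x = iterate_to_fixed_point(P, STOP)
n = len(x)

# Slack in the two constraint families of the theorem's linear program.
stop_slack = [x[i] - STOP[i] for i in range(n)]
flow_slack = [x[i] - sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]

print(x)
print(min(stop_slack) >= -1e-9, min(flow_slack) >= -1e-9)
```

<p>On this small example the limit satisfies both x &#8805; stop and (I-P)x &#8805; 0 up to rounding, and it keeps the forced stopping states at their stopping values, which matches the picture of the linear program finding the smallest vector that never dips below the expected value of its neighbors.</p>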
<p></p>
<hr />
<h4>Footnotes</h4>
<dl>
<dt><a name="foot10" id="foot10">&#8230; Mount</a><a href="#tex2html1"><sup>1</sup></a></dt>
<dd>http://www.mzlabs.com/</dd>
<dt><a name="foot21" id="foot21">&#8230; feeds.</a><a href="#tex2html2"><sup>2</sup></a></dt>
<dd>To emphasize: by technical trades we mean trades based on market data (as opposed to fundamental analysis); we do not include popular-culture uses of the term such as candlesticks, Elliott waves and so on.</dd>
<dt><a name="foot22" id="foot22">&#8230; announcements.</a><a href="#tex2html3"><sup>3</sup></a></dt>
<dd>We are assuming that these triggers can be made automatic by using a labeled information service or natural language processing techniques.</dd>
<dt><a name="foot134" id="foot134">&#8230; estimate.</a><a href="#tex2html10"><sup>4</sup></a></dt>
<dd>The explosion of states can be managed by adding some regularity conditions on how transition probability estimates are allowed to change over time. This serves to reduce the complexity or rank of the estimation problem and improves the generalization ability of the model.</dd>
</dl>
<p>Related posts:<ol>
<li><a href='http://www.win-vector.com/blog/2007/06/new-paper/' rel='bookmark' title='New Paper'>New Paper</a></li>
<li><a href='http://www.win-vector.com/blog/2008/05/betting-best-of-series/' rel='bookmark' title='Betting Best-Of Series'>Betting Best-Of Series</a></li>
<li><a href='http://www.win-vector.com/blog/2009/03/what-does-the-market-think/' rel='bookmark' title='What does the Market Think?'>What does the Market Think?</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Paper</title>
		<link>http://www.win-vector.com/blog/2007/06/new-paper/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=new-paper</link>
		<comments>http://www.win-vector.com/blog/2007/06/new-paper/#comments</comments>
		<pubDate>Tue, 19 Jun 2007 02:13:58 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Expository Writing]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Quantitative Finance]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[AdSense]]></category>
		<category><![CDATA[AdSense Channel]]></category>
		<category><![CDATA[Channel Ids]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Technical Papers]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/2007/06/18/new-paper/</guid>
		<description><![CDATA[author: John Mount Nina and I just finished up our analysis of some of the statistical difficulties encountered by users of Google AdSense. It came out a bit long- but we found the right statistical reference to prove that there are real barriers to understanding in this market. The paper is most legible in PDF, [...]
Related posts:<ol>
<li><a href='http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/' rel='bookmark' title='Paper on stock trading'>Paper on stock trading</a></li>
<li><a href='http://www.win-vector.com/blog/2008/05/betting-best-of-series/' rel='bookmark' title='Betting Best-Of Series'>Betting Best-Of Series</a></li>
<li><a href='http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/' rel='bookmark' title='A Demonstration of Data Mining'>A Demonstration of Data Mining</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>author: John Mount</p>
<p>Nina and I just finished up our analysis of some of the statistical difficulties encountered by users of Google AdSense. It came out a bit long- but we found the right statistical reference to prove that there are real barriers to understanding in this market. The paper is most legible in <a href="http://www.win-vector.com/SelectedPapers/files/ComparingApplesAndOrangesProblemsWithAdsense.pdf">PDF</a>, but we also include an HTML version so the blog entry can be skimmed.</p>
<p><span id="more-6"></span></p>
<h1 align="center">Comparing Apples and Oranges: Two Examples of the Limits of Statistical Inference, With an Application to Google Advertising Markets</h1>
<p align="center"><strong>John Mount<a name="tex2html1" href="#foot8"><sup>1</sup></a>, Nina Zumel<a name="tex2html2" href="#foot9"><sup>2</sup></a></strong></p>
<p></p>
<p align="center"><b>Date:</b> July 6, 2007</p>
<hr />
<h1><a name="SECTION00010000000000000000">Overview</a></h1>
<p>Bad experimental situations are often a source of great statistical puzzles. We are going to describe an example of this sort of situation using what one author observed while watching a few different companies using the Google AdSense and AdWords products.</p>
<p>The points we argue will be obvious to statisticians &#8211; in fact, they are actually elementary exercises. We will show that the measurements allowed in the Google AdSense markets are insufficient to allow accurate tracking of a large number of different revenue sources.</p>
<p>Our goal is to explain a well-known limit on inference to a larger non-specialist audience. This is a bit of a challenge as most mathematical papers can only be read by people who could have written the paper themselves. By &#8220;non-specialist audience&#8221; we mean analytically minded people who may not have seen this sort of math before, or those who have seen the theory but are interested in seeing a complete application. We will include in this writeup the notes, intents, side-thoughts and calculations that mathematicians produce to understand even their own work but, as Gian-Carlo Rota wrote, we are compelled to delete for fear our presentation and understanding won&#8217;t appear as deep as everyone else&#8217;s.[<a href="#rota:1997a">4</a>]</p>
<p>The counter-intuitive points that we wish to emphasize are:</p>
<ul>
<li>The difficulty of estimating the variance of individuals from a small number of aggregated measurements.</li>
<li>The difficulty of estimating the averages of many groups from a small number of aggregated measurements.</li>
</ul>
<p>These points will be motivated as they apply in the Google markets and we will try to examine their consequences in a simplified setting.</p>
<p></p>
<h2><a name="SECTION00020000000000000000">Contents</a></h2>
<p><!--Table of Contents--></p>
<ul>
<li><a name="tex2html37" href="#SECTION00010000000000000000">Overview</a></li>
<li><a name="tex2html38" href="#SECTION00030000000000000000">The Google Markets</a>
<ul>
<li><a name="tex2html39" href="#SECTION00031000000000000000">Introduction</a></li>
<li><a name="tex2html40" href="#SECTION00032000000000000000">Information Limits</a></li>
<li><a name="tex2html41" href="#SECTION00033000000000000000">Channel Identifiers</a></li>
</ul>
<p></li>
<li><a name="tex2html42" href="#SECTION00040000000000000000">The Statistics</a>
<ul>
<li><a name="tex2html43" href="#SECTION00041000000000000000">The Variance is Not Measurable</a></li>
<li><a name="tex2html44" href="#SECTION00042000000000000000">Trying to Undo a Mixture</a></li>
</ul>
<p></li>
<li><a name="tex2html45" href="#SECTION00050000000000000000">Other Solution Methods</a></li>
<li><a name="tex2html46" href="#SECTION00060000000000000000">Conclusion</a></li>
<li><a name="tex2html47" href="#SECTION00070000000000000000">Bibliography</a></li>
<li><a name="tex2html48" href="#SECTION00080000000000000000">Appendix</a>
<ul>
<li><a name="tex2html49" href="#SECTION00081000000000000000">Derivation That a Single Mean is Easy to Estimate</a></li>
<li><a name="tex2html50" href="#SECTION00082000000000000000">Fisher Information and the Cramer-Rao Inequality</a></li>
</ul>
</li>
</ul>
<p><!--End of Table of Contents--></p>
<h1><a name="SECTION00030000000000000000">The Google Markets</a></h1>
<h2><a name="SECTION00031000000000000000">Introduction</a></h2>
<p>Google both buys and sells a large number of textual advertisements through programs called Google AdSense and Google AdWords.[<a href="#goog1">2</a>] What is actually purchased and sold is &#8220;clicks.&#8221; Web sites that agree to display Google AdSense are paid when users click on these ads, and advertisers who place advertisements into Google AdWords pay Google when their advertisements are clicked on. The key item in these markets is the &#8220;search term&#8221; that the advertiser chooses to bid on advertising clicks for. &#8220;Search terms&#8221; are short phrases for which an advertiser is willing to pay, in order to get a visit from a web surfer who has performed a search on that phrase. For instance a company like Panasonic might consider clicks on the search term &#8220;rugged laptop&#8221; (and the attention of the underlying web surfer) to be worth $2 to them.</p>
<p>Because Google both buys and sells advertisements they are essentially making a market. There are some unique aspects to this market in that it is not the advertisements or even page-views that are being traded, but clicks. Both Google and its affiliates serve the advertisements for free and then exchange payment only when a web surfer clicks on an advertisement. A website can &#8220;resell&#8221; advertisements by simultaneously placing ads through AdWords, and serving ads through AdSense. When a user clicks into the website via an advertisement, this costs the web site money; if, however, the user is then shown a number of other advertisements, he or she may then click out on one of them of their own free will, recouping money or perhaps even making a profit for the site. There is significant uncertainty in attempting resale and arbitrage in these advertisement markets, as the user who must be behind all the clicks can just &#8220;evaporate&#8221; during an attempted resale. Direct reselling of clicks (such as redirecting a web surfer from one advertisement to another) would require a method called &#8220;automatic redirection&#8221; to move the surfer from one advertisement to a replacement advertisement. Automatic redirection is not allowed by Google&#8217;s terms of service.</p>
<p>An interesting issue is that each click on a given search term is a unique event with a unique cost. One click for &#8220;rugged laptop&#8221; may cost $1 and another may cost $0.50. The differing costs are determined by the advertiser&#8217;s bid, available placements for the key phrase, what other advertisers are bidding in the market, how many web surfers are available, and Google&#8217;s sorting of bids. The sorting of bids by Google depends on the rank of the advertiser&#8217;s bid times an adjustment factor managed by Google. The hopeful assumption is that all of the potential viewers and clickers for the same search term are essentially exchangeable, in that they all have a similar (unknown) cost and similar probabilities of later actions, such as buying something from a web site. The concept of exchangeability is what allows information collected on one set of unique events to inform predictions about new unique events (drawn from the same exchangeable population).</p>
<p>Whatever the details are, these large advertisement markets have given Google an income of $12 billion, $3.5 billion in profit and 70% year to year growth in 2006.[<a href="#googval">5</a>] This scale of profit is due in part to the dominant position of Google in forming markets for on-line advertising.</p>
<p>The reasons for Google&#8217;s market domination are various and include the superior quality of the Google matching and bidding service, missteps by competitors and the network effects found in a good market &#8211; the situation whereby sellers attract buyers and buyers attract sellers. The costs of switching markets (implementation, information handling and staffing multiple relationships) are also significant factors.</p>
<p>In our opinion, Google&#8217;s profit margins are also helped by the limits on information available to most of the other market participants. In the next section, we will discuss some of the information limits or barriers to transparency in the Google market.</p>
<h2><a name="SECTION00032000000000000000">Information Limits</a></h2>
<p>Google deals are typically set up as revenue sharing arrangements in which Google agrees to pay a negotiated portion of the revenues received by Google to the AdSense hosting web site. As noted above, advertisement click-through values vary from as little as $0.05 to over $40.0 per click. It is obvious that web site operators who receive a commission to serve advertisements on behalf of the Google AdSense program need detailed information about which advertisements are paying at what rate. This is necessary both to verify that Google is sharing the correct amount on valuable advertisements and to adjust and optimize the web site hosting the advertisements.</p>
<p>However, Google does not provide AdSense participants with a complete breakdown of revenues paid. There are a number of possible legitimate reasons for this. First, there is a concern that allowing web sites complete detailed reconciliation data would allow them to over-optimize or perform so-called &#8220;keyword arbitrage&#8221; where sites buy precisely the keywords they can profitably serve advertisements on instead of buying keywords for which the site actually has useful information or services. In addition, the quantity of data is very large, so there are some technical challenges in providing a detailed timely reconciliation. There can also be reasons favorable to Google.</p>
<h2><a name="SECTION00033000000000000000">Channel Identifiers</a></h2>
<p>Google&#8217;s current solution to the conflicting informational needs defines the nature of the market and is in itself quite interesting. Google allows the AdSense customer a number of measurements called &#8220;channels.&#8221; The channels come with identifiers and the AdSense customer is allowed to attach a number of identifiers to every advertisement clicked-out on. Google in turn reports not the detailed revenue for every click-out but instead just the sum of revenue received on clicks-out containing each channel identifier.</p>
<p>For example: if a web site operator wanted to know the revenue from a particular search term (say &#8220;head cold&#8221;) they could attach a single channel identifier to all click-outs associated with &#8220;head cold&#8221; and to no other search term. Under this scheme, Google would then be reporting the revenue for the search term as a channel summary. This simple scheme uses up an entire channel-id for a single search term. This would not be a problem except that an AdSense partner is typically limited (by Google) to a few hundred channel identifiers and is often attempting to track tens of thousands of search terms (and other conditions such as traffic source and time of day). It is obvious to any statistician that this limited number of channels is not sufficient to eliminate many degrees of uncertainty in the revenue attribution problem.</p>
<p>Google does allow each click-out to have multiple channel identifiers attached to it. At first this seems promising &#8211; for instance one can easily come up with schemes where 30 channel ids would be sufficient to give over a billion unique search terms each a unique <em>pattern</em> of channel identifiers. However, Google does not report revenue for each pattern of channel identifiers; in this case they would only report the total for each of the 30 channels. Each channel total would be the sum of all revenue given for all clicks-out that included the given channel-id. Under this scheme we would have a lot of double counting in that any click-out with multiple channel identifiers attached is necessarily simultaneously contributing to multiple totals. Anyone familiar with statistics or linear algebra will quickly recognize that 30 channels can really only reliably measure about 30 facts about an ad campaign. <em>There is provably no super clever scheme capable of decoding these confounded measurements into a larger number of reliable outcomes</em>.</p>
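<p>The double-counting argument above can be illustrated with a tiny sketch. The search term &#8220;apple pie&#8221;, the channel names <code>ch1</code> and <code>ch2</code>, and all dollar amounts below are hypothetical; the only assumption taken from the text is that the report contains one revenue total per channel identifier. Two different per-term revenue splits produce exactly the same report, so the report alone cannot tell them apart.</p>

```python
# Hypothetical setup: 3 search terms tagged with only 2 channel identifiers.
# tags[term] is the set of channel ids attached to every click-out for that term.
tags = {
    "head cold": {"ch1"},
    "rugged laptop": {"ch1", "ch2"},
    "apple pie": {"ch2"},
}


def channel_totals(revenue_by_term):
    """Sum revenue per channel id: the only figures assumed to be reported."""
    totals = {"ch1": 0.0, "ch2": 0.0}
    for term, revenue in revenue_by_term.items():
        for channel in tags[term]:
            totals[channel] += revenue
    return totals


# Two different (made-up) true revenue splits across the three terms.
scenario_a = {"head cold": 1.0, "rugged laptop": 2.0, "apple pie": 3.0}
scenario_b = {"head cold": 2.0, "rugged laptop": 1.0, "apple pie": 4.0}

totals_a = channel_totals(scenario_a)
totals_b = channel_totals(scenario_b)

# Both scenarios yield the same per-channel report.
print(totals_a)
print(totals_b)
```

<p>With three unknowns and two reported sums there is always a direction in which the per-term revenues can move without changing any channel total; the same counting argument applies when tens of thousands of terms share a few hundred channels.</p>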
<p>Let us go back to the points that we promised to discuss at the beginning of this paper:</p>
<ul>
<li>The difficulty of estimating the variance of individuals from a small number of aggregated measurements.
<p>In terms of Google AdSense, this means that we can tell the average (mean) value of a click in a given channel, but we cannot tell how widely the click values in the channel vary from this average value.</p>
</li>
<li>The difficulty of estimating the averages of many groups from a small number of aggregated measurements.
<p>This means that if we assign multiple search terms into each of our available channels, we cannot separate out the values of each individual search term using only the aggregate channel measurements.</p>
</li>
</ul>
<p>It is an interesting exercise to touch on the theory of why these facts are true.</p>
<h1><a name="SECTION00040000000000000000">The Statistics</a></h1>
<p>One thing the last section should have made obvious is that even describing the problem is detailed and tedious. It may be better to work in analogy to avoid real-world details and non-essential complications. Let&#8217;s replace advertisement clicks-out with fruit, and channels with weighings of baskets.</p>
<p>Suppose we are dealing with apples and our business depends on knowing the typical weight of each fruit. We assume that all apples are exchangeable: they may each have a different weight (and value) but they all are coming from a single source. We further assume that we have a limited number of times that we are allowed to place our apples into a basket and weigh them on a scale.</p>
<h2><a name="SECTION00041000000000000000">The Variance is Not Measurable</a></h2>
<h3><a name="SECTION00041100000000000000"></a><a name="sec:themean"></a><br />
The Mean</h3>
<p>The first example, the happy one, is when we have a single basket filled with many different items of one type of fruit. For instance suppose we had a single basket with 5 apples in it and we were told the basket contents have a total weight of 1.3 pounds. The fact that we were given only a single measurement for the entire basket (instead of being allowed to weigh each apple independently) does not interfere in any way with accurately deducing that the average (or mean) weight of this type of apple is a little more than 1/4 pound. If we had <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> apples in the basket, and we called the total weight of the contents of the basket <img width="18" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg2.png" alt="$ T$"/> , we could estimate the average or mean weight of individual apples as being <img width="38" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg3.png" alt="$ T/n$"/> . If we use <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg4.png" alt="$ a_w$"/> to denote the (unknown universal) average weight of individual apples we would denote our estimate of this average as <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg5.png" alt="$ \hat{a}_w$"/> and we have just said that our estimate is <!-- MATH<br />
 $\hat{a}_w = T/n$<br />
 --><br />
<img width="84" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg6.png" alt="$ \hat{a}_w = T/n$"/> .</p>
<p>However, we are missing the opportunity to learn at least one important thing: how much does the weight of these apples vary? This could be an important fact needed to run our business (apples below a given weight may be unsellable, or other weight considerations may apply). We may need to know how inaccurate it is to use the mean or average weight of the apples in place of individual weights.</p>
<p>If we were allowed 5 basket weighings we could put one apple in each basket and directly see how much the typical variation in weight is for the type of apples we have. Let&#8217;s call this <em>Experiment-A</em>. Suppose in this case we find the 5 apples to weigh <!-- MATH<br />
 $0.25lb, 0.3lb, 0.27lb, 0.23lb, 0.25lb$<br />
 --><br />
<img width="264" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg7.png" alt="$ 0.25lb, 0.3lb, 0.27lb, 0.23lb, 0.25lb$"/> respectively. This detailed set of measurements helps inform us on how this type of apple varies in weight.</p>
<p>One of the simplest methods to summarize information about variation is a statistical notion called &#8220;variance.&#8221; Variance is defined as the expected squared distance of a random individual from the population average. Variance is written as <!-- MATH<br />
 $E[(x - a_w)^2]$<br />
 --><br />
<img width="106" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg8.png" alt="$ E[(x - a_w)^2]$"/> where <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg9.png" alt="$ x$"/> is a &#8220;random variable&#8221; denoting the weight of a single apple drawn uniformly and independently at random (from the unknown larger population) and the <img width="30" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg10.png" alt="$ E[]$"/> notation denotes &#8220;expectation.&#8221; <!-- MATH<br />
 $E[(x - a_w)^2]$<br />
 --><br />
<img width="106" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg8.png" alt="$ E[(x - a_w)^2]$"/> is the value that somebody who knew the value of <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg4.png" alt="$ a_w$"/> would say is the average value of <!-- MATH<br />
 $(x - a_w)^2$<br />
 --><br />
<img width="81" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg11.png" alt="$ (x - a_w)^2$"/> over very many repetitions of drawing a single apple and recording its individual weight as <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg9.png" alt="$ x$"/> . For example if all apples had the exact same weight the variance would be zero.</p>
<p>For the basket above, <!-- MATH<br />
 $E[(x - \hat{a}_w)^2]$<br />
 --><br />
<img width="106" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg12.png" alt="$ E[(x - \hat{a}_w)^2]$"/> is calculated as:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\frac{(0.25-0.26)^2 + (0.3-0.26)^2 + (0.27-0.26)^2 + (0.23-0.26)^2 + (0.25-0.26)^2}{5}<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="648" height="65" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg13.png" alt="$\displaystyle \frac{(0.25-0.26)^2 + (0.3-0.26)^2 + (0.27-0.26)^2 + (0.23-0.26)^2 + (0.25-0.26)^2}{5} $"/></div>
<p>(the <img width="38" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg14.png" alt="$ 0.26$"/> is itself the average of the 5 apples&#8217; weights). The interpretation is that for a similar apple with unknown weight <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg9.png" alt="$ x$"/> we would expect <!-- MATH<br />
 $(x-0.26)^2 \approx 4 * 0.00056$<br />
 --><br />
<img width="208" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg15.png" alt="$ (x-0.26)^2 \approx 4 * 0.00056$"/> as a rough limit, that is, for <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg9.png" alt="$ x$"/> to not be too far outside the interval <img width="47" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg16.png" alt="$ 0.212$"/> to <img width="47" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg17.png" alt="$ 0.307$"/> (applying the common rule of thumb of &#8220;2 standard deviations,&#8221; which in squared terms is 4 variances). As we see, all of the original 5 apples fell in this interval.</p>
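<p>For readers who want to check the arithmetic, here is a minimal sketch of the same calculation using the five Experiment-A weights. It divides by the sample size 5, matching the formula above (not the n-1 correction some texts use); the variable names are our own.</p>

```python
import math

# The five individual apple weights (in pounds) from Experiment-A.
weights = [0.25, 0.30, 0.27, 0.23, 0.25]

# Sample mean: total weight divided by the number of apples.
mean = sum(weights) / len(weights)

# Average squared distance from the sample mean (dividing by 5, as in the text).
variance = sum((w - mean) ** 2 for w in weights) / len(weights)

# "2 standard deviations" rule of thumb around the mean.
std_dev = math.sqrt(variance)
low = mean - 2 * std_dev
high = mean + 2 * std_dev

print("mean:", round(mean, 4))
print("variance:", round(variance, 6))
print("interval:", low, "to", high)
```

<p>The printed interval should agree with the one quoted above up to rounding in the last digit.</p>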
<p>Now the 5 apple weights we know are not actually all the possible apples in the world, they are merely the apples in our sample. There are some subtleties about using the variance found in a sample to estimate the variance of the total population, but for this discussion we will use the naive assumption that they are nearly the same. If we use the symbol <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg18.png" alt="$ v_a$"/> to denote the (unknown) true variance of individual apple weights (so <!-- MATH<br />
 $v_a = E[(x - a_w)^2]$<br />
 --><br />
<img width="149" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg19.png" alt="$ v_a = E[(x - a_w)^2]$"/> ) we can use it to express the fact <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg5.png" alt="$ \hat{a}_w$"/> is actually an excellent estimate of <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg4.png" alt="$ a_w$"/> .</p>
<p>Specifically: if we were to repeat the experiment of taking a basket of randomly selected apples (<img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> apples in the basket) over and over again, estimating the mean apple weight <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg5.png" alt="$ \hat{a}_w$"/> each time, then <!-- MATH<br />
 $E[(\hat{a}_w - a_w)^2]$<br />
 --><br />
<img width="116" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg20.png" alt="$ E[(\hat{a}_w - a_w)^2]$"/> &#8211; the expected square error between our estimate of the average apple weight and the true average apple weight &#8211; will go to zero as the sample-size <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> is increased. In fact, we can show <!-- MATH<br />
 $E[(\hat{a}_w - a_w)^2] =  v_a/n$<br />
 --><br />
<img width="179" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg21.png" alt="$ E[(\hat{a}_w - a_w)^2] = v_a/n$"/> , which means that our estimate of the mean gets more precise as <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> is increased. The fact that large samples give very good estimates of unknown means is basic, but for completeness we include its derivation in the appendix.</p>
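<p>The shrinking error of the mean estimate can also be seen in a small simulation. This is only a sketch under assumptions that are not part of the paper: apple weights are drawn from a normal distribution with mean 0.26 and variance 0.00056 (values borrowed from the example above), and the function name <code>mse_of_mean_estimate</code> is our own.</p>

```python
import random


def mse_of_mean_estimate(n, trials, a_w=0.26, v_a=0.00056, seed=0):
    """Average squared error of T/n as an estimate of a_w over many baskets.

    Assumes (for illustration only) normally distributed apple weights.
    """
    rng = random.Random(seed)
    sd = v_a ** 0.5
    total_sq_err = 0.0
    for _ in range(trials):
        # One basket: only the total weight T of its n apples is observed.
        total_weight = sum(rng.gauss(a_w, sd) for _ in range(n))
        total_sq_err += (total_weight / n - a_w) ** 2
    return total_sq_err / trials


# Compare the simulated squared error with the predicted value v_a / n.
for n in (5, 20, 80):
    print(n, mse_of_mean_estimate(n, trials=5000), 0.00056 / n)
```

<p>Each printed row pairs the simulated squared error with the predicted value; both should fall by roughly a factor of 4 each time the basket size is quadrupled.</p>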
<h3><a name="SECTION00041200000000000000">Trying to Estimate the Variance</a></h3>
<p>We introduced the variance of individual apples (denoted by <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg18.png" alt="$ v_a$"/> ) as an unknown quantity that aided reasoning. We know that, even with only one measurement of the total weight of all <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> apples, <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg5.png" alt="$ \hat{a}_w$"/> is an estimate of the mean whose error goes to zero as <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> (the number of apples or the sample size) gets large.</p>
<p>However, the variance of individual apples <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg18.png" alt="$ v_a$"/> is so useful that we would like to have an actual estimate (<img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg22.png" alt="$ \hat{v}_a$"/> ) of it. It would be very useful to know if <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg18.png" alt="$ v_a$"/> is near zero (all apples have nearly identical weight) or if <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg18.png" alt="$ v_a$"/> is large (apples vary wildly in weight). If we were allowed to weigh each apple as in Experiment-A (i.e. if we had an unlimited number of basket weighings or channels), we could estimate the variance by the calculations in the last section. If we were allowed only one measurement we would really have almost no information about the variance as we have only seen one aggregated measurement- so we have no idea how individual apple weights vary. The next question is: can we create a good estimate <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg22.png" alt="$ \hat{v}_a$"/> when we are allowed only two measurements but the sample size (<img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> ) is allowed to grow?</p>
<p>Let&#8217;s consider <em>Experiment-B</em>: If we have a total of <img width="25" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg23.png" alt="$ 2 n$"/> apples (<img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> in each basket) and <img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg24.png" alt="$ T_1$"/> is the total weight of the first basket and <img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg25.png" alt="$ T_2$"/> is the total weight of the second basket, then some algebra would tell us that <!-- MATH<br />
 $\hat{v}_a = \frac{(T_1-T_2)^2}{2 n}$<br />
 --><br />
<img width="107" height="48" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg26.png" alt="$ \hat{v}_a = \frac{(T_1-T_2)^2}{2 n}$"/> is an unbiased estimate of <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg18.png" alt="$ v_a$"/> (the variance in weight of individual apples)<a name="tex2html3" href="#foot323"><sup>3</sup></a>.</p>
<p>It turns out, however, that <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg22.png" alt="$ \hat{v}_a$"/> is actually a bad estimate of the variance. That is, the expected squared distance of <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg22.png" alt="$ \hat{v}_a$"/> from the unknown true value of the variance <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg18.png" alt="$ v_a$"/> (written <!-- MATH<br />
 $E[(v_a - \hat{v}_a)^2]$<br />
 --><br />
<img width="109" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg29.png" alt="$ E[(v_a - \hat{v}_a)^2]$"/> ) does not shrink beyond a certain bound as the number of apples in each basket (<img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> ) is increased. This &#8220;variance of variance estimate&#8221; result is in stark contrast to the nice behavior we just saw in estimating the average <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg4.png" alt="$ a_w$"/> . With some additional assumptions and algebra (not shown here) we can show that for our estimate <!-- MATH<br />
 $\hat{v}_a = \frac{(T_1 - T_2)^2}{2 n}$<br />
 --><br />
<img width="107" height="48" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg26.png" alt="$ \hat{v}_a = \frac{(T_1-T_2)^2}{2 n}$"/> we have <!-- MATH<br />
 $\lim_{n \rightarrow \infty} E[(\hat{v}_a - v_a)^2] = 2 v_a^2$<br />
 --><br />
<img width="226" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg30.png" alt="$ \lim_{n \rightarrow \infty} E[(\hat{v}_a - v_a)^2] = 2 v_a^2$"/> . There is a general reason this is happening, and we will discuss this in the next section.</p>
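<p>A quick simulation makes this limit concrete. The sketch below is an illustration added for this discussion, not the calculation referred to above: it assumes normally distributed apple weights (so each basket total can be drawn directly as a single normal variate), and the function name <code>simulate_mse</code>, the mean of 1.0, the variance of 0.25 and the trial count are all hypothetical choices.</p>

```python
import random


def simulate_mse(n, true_mean=1.0, true_var=0.25, trials=20000, seed=0):
    """Monte Carlo estimate of E[(v_hat - v_a)^2] for Experiment-B.

    Assumption: apple weights are normal, so a basket of n apples has a
    total weight distributed as Normal(n * true_mean, n * true_var).
    """
    rng = random.Random(seed)
    sd_total = (n * true_var) ** 0.5
    squared_error_sum = 0.0
    for _ in range(trials):
        t1 = rng.gauss(n * true_mean, sd_total)  # total weight of basket 1
        t2 = rng.gauss(n * true_mean, sd_total)  # total weight of basket 2
        v_hat = (t1 - t2) ** 2 / (2 * n)         # the estimate from the text
        squared_error_sum += (v_hat - true_var) ** 2
    return squared_error_sum / trials


# Compare the simulated squared error with the limiting value 2 * v_a^2.
for n in (10, 1000, 100000):
    print(n, simulate_mse(n), 2 * 0.25 ** 2)
```

<p>Under these assumptions the simulated squared error should sit near <code>2 * true_var ** 2</code> for every basket size, rather than shrinking as the baskets grow.</p>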
<h3><a name="SECTION00041300000000000000">Cramer-Rao: Why we can not estimate the variance of individual Apples</a></h3>
<p>Of course showing one particular calculation fails is not the same as showing that the variance of individual apples can not be estimated from the two total weighings <img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg24.png" alt="$ T_1$"/> and <img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg25.png" alt="$ T_2$"/> . There could be other, better, estimates<a name="tex2html4" href="#foot324"><sup>4</sup></a>.</p>
<p>There is a well known statistical law that states no unbiased estimator works well in this situation. The law is called the Cramer-Rao inequality.[<a href="#cove_thom_91">1</a>] The Cramer-Rao inequality is a tool for identifying situations where <em>all</em> unbiased estimators have large variance. Applying the Cramer-Rao inequality typically requires a calculation, so we will add a few more (not necessarily realistic) assumptions to ease that calculation. We assume apple weights are distributed normally with mean <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg4.png" alt="$ a_w$"/> and variance <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg18.png" alt="$ v_a$"/> .<a name="tex2html5" href="#foot61"><sup>5</sup></a></p>
<p>There is a quantity depending only on the experimental set up that reads off how difficult estimation is. By &#8220;depending only on the experimental set up&#8221; we mean that the quantity does not depend on any specific outcomes of <img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg24.png" alt="$ T_1$"/> , <img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg25.png" alt="$ T_2$"/> and does not depend on any specific estimation procedure or formula. This quantity is called &#8220;Fisher Information&#8221; and is denoted as <img width="48" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg31.png" alt="$ J(v_a)$"/> .</p>
<p>The Cramer-Rao inequality[<a href="#cove_thom_91">1</a>] says for any unbiased estimator <img width="14" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg32.png" alt="$ \hat{v}$"/> , the variance of <img width="14" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg32.png" alt="$ \hat{v}$"/> is at least <img width="67" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg33.png" alt="$ 1/J(v_a)$"/> . Written in formulas the conclusion of the Cramer-Rao inequality is:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
E[(v_a - \hat{v})^2] \ge 1/J(v_a)<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="195" height="41" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg34.png" alt="$\displaystyle E[(v_a - \hat{v})^2] \ge 1/J(v_a) .$"/></div>
<p>Since we have now assumed a model for the weight distribution of apples, we can derive (see appendix) the following:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
J(v_a) = \frac{2}{v_a^2}<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="100" height="59" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg35.png" alt="$\displaystyle J(v_a) = \frac{2}{v_a^2} .$"/></div>
<p>Applying the Cramer-Rao inequality lets us immediately say:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
E[(v_a - \hat{v})^2] \ge \frac{v_a^2}{2}<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="154" height="65" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg36.png" alt="$\displaystyle E[(v_a - \hat{v})^2] \ge \frac{v_a^2}{2} .$"/></div>
<p>This means that there is no unbiased estimation procedure for which we can expect the squared-error to shrink below <!-- MATH<br />
 $\frac{v_a^2}{2}$<br />
 --><br />
<img width="22" height="48" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg37.png" alt="$ \frac{v_a^2}{2}$"/> even as the number of items in each basket (<img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> ) is increased. So not only does our proposed variance estimate fail to have the (expected) good behavior we saw when estimating the mean, but in fact no unbiased estimating scheme will work. In general we can show that the quality of the variance estimate is essentially a function of the number of measurements we are allowed<a name="tex2html6" href="#foot73"><sup>6</sup></a> &#8211; so any scheme using a constant number of measurements will fail.</p>
<h2><a name="SECTION00042000000000000000"></a><a name="sec:mixture"></a><br />
Trying to Undo a Mixture</h2>
<p>Suppose we are willing to give up on estimating the variance (a dangerous concession). We are still blinded by the limited number of channels if we attempt to estimate more than one individual mean.</p>
<p>In our analogy let&#8217;s introduce a second fruit (oranges) to the problem. Call an assignment of fruit to baskets a &#8220;channel design.&#8221; For example if we were allowed two basket measurements and wanted to know the mean weight of apples and the mean weight of oranges we could assign all apples to one basket and all oranges to the other. This &#8220;design&#8221; would give us very good estimates of both the mean weight of apples and the mean weight of oranges.</p>
<p>Let&#8217;s consider a simple situation where due to the limited number of channels we are attempting to measure something that was not considered in the original channel design. This is very likely because the number of simultaneous independent measurements is limited to the number of channels and it is very likely that one will have important questions that were not in any given experimental design. For example (going back to AdSense), suppose we had 26 channels and we used them all to group our search phrases by first letter of the English alphabet and we later wanted to break down older data by length of phrase.<a name="tex2html7" href="#foot75"><sup>7</sup></a> We would consider ourselves lucky if the first-letter design was even as good as random assignment of channel ids in measuring the effect of search term length.</p>
<p>To work this example we continue to ignore most of the details and suppose we really are trying to estimate the mean weight of apples and the mean weight of oranges at the same time. Due to the kind of bad luck described above we have data from an experiment that was not designed for this purpose. Let&#8217;s try the so-called easy case where we have a random experiment. For <em>Experiment-C</em> let&#8217;s suppose we have two baskets of fruit and each basket was filled with <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> -items of fruit by repeating the process of flipping a fair coin and placing an apple if the coin came up heads and an orange if the coin came up tails. This admittedly silly process is simulating the situation where we are forced to use measurements that potentially could solve our problem- but were not designed to solve it.<a name="tex2html8" href="#foot77"><sup>8</sup></a> We can measure the total weight of the contents of each basket. So the information at our disposal this time is <!-- MATH<br />
 $a_1,o_1,T_1$<br />
 --><br />
<img width="74" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg38.png" alt="$ a_1,o_1,T_1$"/> (the number of apples in the first basket, the number of oranges in the first basket and the total weight of the first basket) and <!-- MATH<br />
 $a_2,o_2,T_2$<br />
 --><br />
<img width="74" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg39.png" alt="$ a_2,o_2,T_2$"/> (the number of apples in the second basket, the number of oranges in the second basket and the total weight of the second basket). What we want to estimate are <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg4.png" alt="$ a_w$"/> and <img width="24" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg40.png" alt="$ o_w$"/> the unknown mean weights of the types of apples and types of oranges we are dealing with.</p>
<p>To simplify things a bit let&#8217;s treat the number of apples and oranges in each basket, <!-- MATH<br />
 $a_1,o_1,a_2,o_2$<br />
 --><br />
<img width="97" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg41.png" alt="$ a_1,o_1,a_2,o_2$"/> , as known constants set at &#8220;typical values&#8221; that we would expect from the coin flipping procedure. It turns out the following values of <!-- MATH<br />
 $a_1,o_1,a_2,o_2$<br />
 --><br />
<img width="97" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg41.png" alt="$ a_1,o_1,a_2,o_2$"/> are typical:</p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
a_1 &#038; = &#038; n/2 + \sqrt{n} \\<br />
o_1 &#038; = &#038; n/2 - \sqrt{n} \\<br />
a_2 &#038; = &#038; n/2 - \sqrt{n} \\<br />
o_2 &#038; = &#038; n/2 + \sqrt{n}.<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="22" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg42.png" alt="$\displaystyle a_1$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="84" height="41" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg44.png" alt="$\displaystyle n/2 + \sqrt{n}$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg45.png" alt="$\displaystyle o_1$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="85" height="41" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg46.png" alt="$\displaystyle n/2 - \sqrt{n}$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="22" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg47.png" alt="$\displaystyle a_2$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="85" height="41" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg46.png" alt="$\displaystyle n/2 - \sqrt{n}$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg48.png" alt="$\displaystyle o_2$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="90" height="41" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg49.png" alt="$\displaystyle n/2 + \sqrt{n}.$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p>We call these values typical because in any experiment where the distribution of <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> items in a collection is chosen by fair coin flips we expect to see a nearly even distribution (due to the fairness of the coin) but not too even (due to the randomness). In fact we really do expect any one of these values to be at least <!-- MATH<br />
 $\sqrt{n}/2$<br />
 --><br />
<img width="50" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg50.png" alt="$ \sqrt{n}/2$"/> away from <img width="34" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg51.png" alt="$ n/2$"/> most of the time and closer than <!-- MATH<br />
 $2 \sqrt{n}$<br />
 --><br />
<img width="41" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg52.png" alt="$ 2 \sqrt{n}$"/> most of the time. So these are typical values, good but not too good.</p>
<p>We illustrate how to produce an unbiased (though in the end unfortunately unusable) estimate for <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg4.png" alt="$ a_w$"/> and <img width="24" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg40.png" alt="$ o_w$"/> . The general theory says the estimate will be unreliable- but there is some value in seeing how an estimate is formed and having a specific estimate to experiment with. The fact that we know the count of each fruit in each basket, and each basket&#8217;s weight, gives us a simultaneous system of equations:</p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
E_{a_1,o_1,a_2,o_2}[T_1] &#038; = &#038; a_1 a_w + o_1 o_w \\<br />
E_{a_1,o_1,a_2,o_2}[T_2] &#038; = &#038; a_2 a_w + o_2 o_w<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="113" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg53.png" alt="$\displaystyle E_{a_1,o_1,a_2,o_2}[T_1]$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="102" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg54.png" alt="$\displaystyle a_1 a_w + o_1 o_w$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="113" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg55.png" alt="$\displaystyle E_{a_1,o_1,a_2,o_2}[T_2]$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="102" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg56.png" alt="$\displaystyle a_2 a_w + o_2 o_w$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p><!-- MATH<br />
 $E_{a_1,o_1,a_2,o_2}[T_1]$<br />
 --><br />
<img width="113" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg57.png" alt="$ E_{a_1,o_1,a_2,o_2}[T_1]$"/> represents the average value of <img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg24.png" alt="$ T_1$"/> over imagined repeated experiments where <img width="22" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg58.png" alt="$ a_1$"/> apples and <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg59.png" alt="$ o_1$"/> oranges are placed in a basket and weighed (similarly for <!-- MATH<br />
 $E_{a_1,o_1,a_2,o_2}[T_2]$<br />
 --><br />
<img width="113" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg60.png" alt="$ E_{a_1,o_1,a_2,o_2}[T_2]$"/> ). The subscripts indicate that we are only considering experiments where the numbers of apples and oranges are known to be exactly <!-- MATH<br />
 $a_1,o_1,a_2,o_2$<br />
 --><br />
<img width="97" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg61.png" alt="$ a_1,o_1,a_2,o_2$"/> . We do not actually know <!-- MATH<br />
 $E_{a_1,o_1,a_2,o_2}[T_1]$<br />
 --><br />
<img width="113" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg57.png" alt="$ E_{a_1,o_1,a_2,o_2}[T_1]$"/> and <!-- MATH<br />
 $E_{a_1,o_1,a_2,o_2}[T_2]$<br />
 --><br />
<img width="113" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg60.png" alt="$ E_{a_1,o_1,a_2,o_2}[T_2]$"/> but we can use the specific basket total weights <img width="50" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg62.png" alt="$ T_1,T_2$"/> we saw in our single experiment as stand-ins. In other words, <img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg24.png" alt="$ T_1$"/> may not equal <!-- MATH<br />
 $E_{a_1,o_1,a_2,o_2}[T_1]$<br />
 --><br />
<img width="113" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg57.png" alt="$ E_{a_1,o_1,a_2,o_2}[T_1]$"/> but <img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg24.png" alt="$ T_1$"/> is an unbiased estimator of <!-- MATH<br />
 $E_{a_1,o_1,a_2,o_2}[T_1]$<br />
 --><br />
<img width="113" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg57.png" alt="$ E_{a_1,o_1,a_2,o_2}[T_1]$"/> (this is a variation on the old &#8220;typical family with 2.5 children&#8221; joke). So we rewrite the previous system as estimates:</p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
T_1 &#038; \approx &#038; a_1 a_w + o_1 o_w \\<br />
T_2 &#038; \approx &#038; a_2 a_w + o_2 o_w .<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg63.png" alt="$\displaystyle T_1$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg64.png" alt="$\displaystyle \approx$"/></td>
<td align="left" nowrap><img width="102" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg54.png" alt="$\displaystyle a_1 a_w + o_1 o_w$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg65.png" alt="$\displaystyle T_2$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg64.png" alt="$\displaystyle \approx$"/></td>
<td align="left" nowrap><img width="107" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg66.png" alt="$\displaystyle a_2 a_w + o_2 o_w .$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p>We can then rewrite this system into a &#8220;solved form&#8221;:</p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
a_w &#038; \approx &#038; \frac{o_2 T_1 - o_1 T_2}{a_1 o_2 - a_2 o_1} \\<br />
o_w &#038; \approx &#038; \frac{-a_2 T_1 + a_1 T_2}{a_1 o_2 - a_2 o_1} .<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg67.png" alt="$\displaystyle a_w$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg64.png" alt="$\displaystyle \approx$"/></td>
<td align="left" nowrap><img width="102" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg68.png" alt="$\displaystyle \frac{o_2 T_1 - o_1 T_2}{a_1 o_2 - a_2 o_1}$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="24" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg69.png" alt="$\displaystyle o_w$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg64.png" alt="$\displaystyle \approx$"/></td>
<td align="left" nowrap><img width="123" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg70.png" alt="$\displaystyle \frac{-a_2 T_1 + a_1 T_2}{a_1 o_2 - a_2 o_1} .$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p>And this gives us the tempting estimates <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg5.png" alt="$ \hat{a}_w$"/> and <img width="24" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg71.png" alt="$ \hat{o}_w$"/></p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
\hat{a}_w &#038; = &#038; \frac{o_2 T_1 - o_1 T_2}{a_1 o_2 - a_2 o_1} \\<br />
\hat{o}_w &#038; = &#038; \frac{-a_2 T_1 + a_1 T_2}{a_1 o_2 - a_2 o_1} .<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg72.png" alt="$\displaystyle \hat{a}_w$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="102" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg68.png" alt="$\displaystyle \frac{o_2 T_1 - o_1 T_2}{a_1 o_2 - a_2 o_1}$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="24" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg73.png" alt="$\displaystyle \hat{o}_w$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="123" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg70.png" alt="$\displaystyle \frac{-a_2 T_1 + a_1 T_2}{a_1 o_2 - a_2 o_1} .$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p><img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg5.png" alt="$ \hat{a}_w$"/> and <img width="24" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg71.png" alt="$ \hat{o}_w$"/> are indeed unbiased estimates of <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg4.png" alt="$ a_w$"/> and <img width="24" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg40.png" alt="$ o_w$"/> .</p>
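<p>The solved form translates directly into a few lines of code. The following is a minimal sketch; the function name <code>estimate_mean_weights</code> and the small noise-free example are made up for illustration and are not part of the original analysis.</p>

```python
def estimate_mean_weights(a1, o1, a2, o2, t1, t2):
    """Solve  t1 = a1*a_w + o1*o_w,  t2 = a2*a_w + o2*o_w  for (a_w, o_w).

    a1, o1: apples and oranges in basket 1; a2, o2: same for basket 2.
    t1, t2: measured total weights of the two baskets.
    """
    det = a1 * o2 - a2 * o1
    if det == 0:
        # Both baskets have the same fruit mix, so the two equations
        # carry the same information and cannot be separated.
        raise ValueError("singular system: baskets have the same mix")
    a_hat = (o2 * t1 - o1 * t2) / det
    o_hat = (-a2 * t1 + a1 * t2) / det
    return a_hat, o_hat


# Noise-free check: apples weigh 2 and oranges weigh 5, so a basket of
# 3 apples + 1 orange weighs 11 and a basket of 1 apple + 3 oranges weighs 17.
print(estimate_mean_weights(3, 1, 1, 3, 11, 17))  # (2.0, 5.0)
```

<p>Note that the denominator <code>a1 * o2 - a2 * o1</code> measures how different the two baskets are; when the baskets have nearly the same mix it is small relative to the totals, and any noise in the weighings is amplified.</p>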
<p>The problem is that even though these are unbiased estimates, they are not good estimates. With some calculation one can show that as <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> (the number of pieces of fruit in each basket) increases, <!-- MATH<br />
 $E_{a_1,o_1,a_2,o_2}[(\hat{a}_w - a_w)^2]$<br />
 --><br />
<img width="181" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg74.png" alt="$ E_{a_1,o_1,a_2,o_2}[(\hat{a}_w - a_w)^2]$"/> and <!-- MATH<br />
 $E_{a_1,o_1,a_2,o_2}[(\hat{o}_w - o_w)^2]$<br />
 --><br />
<img width="180" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg75.png" alt="$ E_{a_1,o_1,a_2,o_2}[(\hat{o}_w - o_w)^2]$"/> do not approach zero. Our estimates have a certain built-in error bound that does not shrink even as the sample size is increased.</p>
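<p>This can be checked by simulation. The sketch below rests on assumptions that are ours: apple and orange weights are normal with a common variance <code>v</code>, the basket counts are fixed at the typical values given earlier, and <code>n</code> is an even perfect square so those counts are whole numbers. The function name <code>simulate_apple_mse</code> and all numeric defaults are hypothetical.</p>

```python
import math
import random


def simulate_apple_mse(n, a_w=1.0, o_w=1.5, v=0.04, trials=20000, seed=0):
    """Monte Carlo estimate of E[(a_hat - a_w)^2] for Experiment-C.

    Assumptions: normal fruit weights with common variance v, and basket
    counts fixed at n/2 +/- sqrt(n) (n must be an even perfect square).
    """
    rng = random.Random(seed)
    root = math.isqrt(n)
    a1, o1 = n // 2 + root, n // 2 - root
    a2, o2 = n // 2 - root, n // 2 + root
    det = a1 * o2 - a2 * o1
    sd_total = math.sqrt(n * v)  # each basket holds n items of variance v
    squared_error_sum = 0.0
    for _ in range(trials):
        t1 = rng.gauss(a1 * a_w + o1 * o_w, sd_total)
        t2 = rng.gauss(a2 * a_w + o2 * o_w, sd_total)
        a_hat = (o2 * t1 - o1 * t2) / det  # solved-form estimate
        squared_error_sum += (a_hat - a_w) ** 2
    return squared_error_sum / trials


# Squared error for baskets of 100 and of 10000 pieces of fruit.
for n in (100, 10000):
    print(n, simulate_apple_mse(n))
```

<p>Under these assumptions the exact squared error works out to <code>(v / 8) * (1 + 4 / n)</code>, so going from 100 to 10000 pieces of fruit per basket should buy almost no improvement.</p>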
<h3><a name="SECTION00042100000000000000">Cramer-Rao: Why we can&#8217;t separate Apples from Oranges</a></h3>
<p>What is making estimation difficult has been the same in all experiments: most of what we want to measure is being obscured. As we mentioned earlier, in a typical case all of <!-- MATH<br />
 $a_1,o_1,a_2,o_2$<br />
 --><br />
<img width="97" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg41.png" alt="$ a_1,o_1,a_2,o_2$"/> will be relatively near a common value. Any estimation procedure is going to depend on separations among these values, which are unfortunately not that big. This is what makes estimation difficult.</p>
<p>Let us assume apple weights are distributed normally with mean <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg4.png" alt="$ a_w$"/> and variance <img width="56" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg76.png" alt="$ v_a = v$"/> and orange weights are distributed normally with mean <img width="24" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg40.png" alt="$ o_w$"/> and variance <img width="56" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg77.png" alt="$ v_o = v$"/> .</p>
<p>Since we have now assumed a model for the weight distribution of apples and oranges we can derive (calculating as shown in [<a href="#cove_thom_91">1</a>]) the following:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
J(a_w,o_w) =<br />
\frac{1}{n v}<br />
\begin{bmatrix}<br />
a_1^2 + a_2^2 &#038; a_1 o_1 + a_2 o_2 \\<br />
a_1 o_1 + a_2 o_2 &#038; o_1^2 + o_2^2 \\<br />
\end{bmatrix}<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="359" height="66" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg78.png" alt="$\displaystyle J(a_w,o_w) = \frac{1}{n v} \begin{bmatrix} a_1^2 + a_2^2 &amp; a_1 o_1 + a_2 o_2 \\ a_1 o_1 + a_2 o_2 &amp; o_1^2 + o_2^2 \\ \end{bmatrix} .$"/></div>
<p>What we are really interested in is the inverse of <!-- MATH<br />
 $J(a_w,o_w)$<br />
 --><br />
<img width="80" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg79.png" alt="$ J(a_w,o_w)$"/> , which (for our typical values of <!-- MATH<br />
 $a_1,o_1,a_2,o_2$<br />
 --><br />
<img width="97" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg41.png" alt="$ a_1,o_1,a_2,o_2$"/> ) is:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
J^{-1}(a_w,o_w) =<br />
\frac{v}{8}<br />
\begin{bmatrix}<br />
1 + 4/n &#038; -1 + 4/n \\<br />
-1 + 4/n &#038; 1 + 4/n<br />
\end{bmatrix}<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="338" height="66" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg80.png" alt="$\displaystyle J^{-1}(a_w,o_w) = \frac{v}{8} \begin{bmatrix} 1 + 4/n &amp; -1 + 4/n \\ -1 + 4/n &amp; 1 + 4/n \end{bmatrix} .$"/></div>
<p>The theory says that the diagonal entries of this matrix are essentially lower bounds on the squared error in the estimates of the apple and orange weights, respectively. The off-diagonal terms describe how an error in the estimate of the mean apple weight affects the estimate of the mean orange weight, and vice-versa. So what we would like is for all the entries of <!-- MATH<br />
 $J^{-1}(a_w,o_w)$<br />
 --><br />
<img width="98" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg81.png" alt="$ J^{-1}(a_w,o_w)$"/> to approach zero as <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> increases. In our case, however, the entries of <!-- MATH<br />
 $J^{-1}(a_w,o_w)$<br />
 --><br />
<img width="98" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg81.png" alt="$ J^{-1}(a_w,o_w)$"/> all tend in magnitude to the constant <!-- MATH<br />
 $\frac{v}{8}$<br />
 --><br />
<img width="15" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg82.png" alt="$ \frac{v}{8}$"/> as <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> grows, meaning that the errors in the estimates are also bounded away from zero and stop improving as the sample size increases.</p>
<p>The above discussion assumes that the distribution of apples and oranges in each basket is the same (in this case, random and uniform). If there is some constructive bias in the process forming <!-- MATH<br />
 $a_1,o_1,a_2,o_2$<br />
 --><br />
<img width="97" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg41.png" alt="$ a_1,o_1,a_2,o_2$"/> , such as apples being a bit more likely in the first basket and oranges a bit more likely in the second basket, then the demonstrated estimate is good (with error decreasing as <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> grows) and is actually useful. But the degree of utility of the estimate depends on how much useful bias we have- if there is not much useful bias then the errors shrink very slowly and we need a lot more data than one would first expect to get a good measurement. Finally, we would like to remind the reader that it is impossible for a channel design with a limited number of channels to simultaneously have an independent large useful bias on very many measurements.</p>
<p>As an example of the application of useful bias suppose that our coin has probability <img width="14" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg83.png" alt="$ p$"/> of coming up heads, and that the first basket is filled by placing an apple every time the coin is heads, and an orange every time the coin is tails. The second basket is filled the opposite way &#8211; apple for tails, orange for heads. Again, let&#8217;s treat the number of apples and oranges in each basket, <!-- MATH<br />
 $a_1,o_1,a_2,o_2$<br />
 --><br />
<img width="97" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg41.png" alt="$ a_1,o_1,a_2,o_2$"/> , as known constants set at &#8220;typical values&#8221; that we would expect from the coin flipping procedure.</p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
a_1 &#038; = &#038; np \\<br />
o_1 &#038; = &#038; n(1-p) \\<br />
a_2 &#038; = &#038; n(1-p)  \\<br />
o_2 &#038; = &#038; np<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="22" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg42.png" alt="$\displaystyle a_1$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg84.png" alt="$\displaystyle np$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg45.png" alt="$\displaystyle o_1$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="72" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg85.png" alt="$\displaystyle n(1-p)$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="22" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg47.png" alt="$\displaystyle a_2$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="72" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg85.png" alt="$\displaystyle n(1-p)$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg48.png" alt="$\displaystyle o_2$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg84.png" alt="$\displaystyle np$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/><br />
(as long as <!-- MATH<br />
 $p \neq \frac{1}{2}$<br />
 --><br />
<img width="50" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg86.png" alt="$ p \neq \frac{1}{2}$"/> the <img width="31" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg87.png" alt="$ \sqrt{n}$"/> terms are dominated by the bias and can be ignored).</p>
<p>If <img width="48" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg88.png" alt="$ p=1$"/> &#8211; the coin always comes up heads &#8211; then the first basket is only apples, and the second basket is only oranges, and obviously, we can find good estimates of <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg4.png" alt="$ a_w$"/> and <img width="24" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg40.png" alt="$ o_w$"/> , by the arguments in Section <a href="#sec:themean">3.1.1</a>. If <!-- MATH<br />
 $p=\frac{1}{2}$<br />
 --><br />
<img width="50" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg89.png" alt="$ p=\frac{1}{2}$"/> , then we are in the situation that we already discussed, with approximately equal numbers of apples and oranges in each basket. But suppose <img width="14" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg83.png" alt="$ p$"/> were some other value besides <img width="14" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg90.png" alt="$ 1$"/> or <!-- MATH<br />
 $\frac{1}{2}$<br />
 --><br />
<img width="15" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg91.png" alt="$ \frac{1}{2}$"/> , say, <!-- MATH<br />
 $p=\frac{1}{4}$<br />
 --><br />
<img width="50" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg92.png" alt="$ p=\frac{1}{4}$"/> . In that case, the first basket would be primarily oranges, and the second one primarily apples, and we can show that</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
J^{-1}(a_w,o_w) =<br />
\frac{v}{2n}<br />
\begin{bmatrix}<br />
5 &#038; -3 \\<br />
-3  &#038; 5<br />
\end{bmatrix}<br />
,<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="243" height="66" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg93.png" alt="$\displaystyle J^{-1}(a_w,o_w) = \frac{v}{2n} \begin{bmatrix} 5 &amp; -3 \ -3 &amp; 5 \end{bmatrix} ,$"/></div>
<p>and all of the entries of <!-- MATH<br />
 $J^{-1}(a_w,o_w)$<br />
 --><br />
<img width="98" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg81.png" alt="$ J^{-1}(a_w,o_w)$"/> do go to zero as <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> gets larger. This can be shown to be true in general, for any <!-- MATH<br />
 $p\ne\frac{1}{2}$<br />
 --><br />
<img width="50" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg94.png" alt="$ p\ne\frac{1}{2}$"/> . This means the Cramer-Rao bound does not prevent estimation. Another calculation (not shown here) confirms that our proposed estimate does indeed have shrinking error (as <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> increases).</p>
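<p>The inverse above can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original derivation: it plugs the typical counts a_1 = np, o_1 = n(1-p), a_2 = n(1-p), o_2 = np into the formula for J(a_w,o_w) given earlier and inverts the resulting 2x2 matrix. The function name <tt>inverse_fisher</tt> and the sample values of n and v are illustrative assumptions.</p>

```python
from fractions import Fraction


def inverse_fisher(n, p, v):
    """Return J^{-1}(a_w, o_w) as a 2x2 list of Fractions.

    Assumes the two baskets hold exactly the typical counts
    a1 = n p, o1 = n (1 - p), a2 = n (1 - p), o2 = n p.
    """
    a1, o1 = n * p, n * (1 - p)
    a2, o2 = n * (1 - p), n * p
    # J = 1/(n v) * [[a1^2 + a2^2, a1 o1 + a2 o2],
    #                [a1 o1 + a2 o2, o1^2 + o2^2]]
    scale = Fraction(1) / (n * v)
    j11 = scale * (a1 * a1 + a2 * a2)
    j12 = scale * (a1 * o1 + a2 * o2)
    j22 = scale * (o1 * o1 + o2 * o2)
    det = j11 * j22 - j12 * j12
    if det == 0:
        # Happens when both baskets have identical apple/orange counts.
        raise ZeroDivisionError("J is singular: the baskets carry no useful bias")
    # Standard closed form for the inverse of a symmetric 2x2 matrix.
    return [[j22 / det, -j12 / det], [-j12 / det, j11 / det]]


# Illustrative values only: p = 1/4, v = 1, and two sample sizes.
for n in (100, 10000):
    inv = inverse_fisher(n, Fraction(1, 4), Fraction(1))
    print(n, inv[0][0], inv[0][1])
```

<p>For n = 100 and v = 1 the diagonal entries come out to 1/40, which is 5v/(2n), and the off-diagonal entries to -3/200, which is -3v/(2n); increasing n shrinks every entry in proportion. Calling the function with p exactly 1/2 raises the singularity error, because at exactly equal counts the two baskets give the same equation twice. The discussion of the unbiased case in the text instead uses typical counts that include sampling fluctuation of order square root of n, which is why its inverse exists but does not shrink.</p>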
<h1><a name="SECTION00050000000000000000">Other Solution Methods</a></h1>
<p>We did not discuss solution methods that involve more data, such as repeated experiments, or significantly deeper knowledge, such as factor models. What we discussed were the limits of the basic modeling step, which itself would be a component of the more sophisticated solutions. Here however, we will briefly touch on other procedures that could be used to try to improve the situation discussed above.</p>
<p><em>Repeated measurements</em> could be implemented by taking data over many days, reassigning the channel identifiers so that each search term participates in different combinations of channel identifiers over the course of the measurements. Essentially, this is setting up a much larger system of simultaneous equations, from which a larger number of variables can be estimated. There are mathematical procedures for this sort of iterative estimation (such as the famous Kalman filter), but the number of quantities a web site would wish to estimate is so much larger than the number of measurements available that the procedure will require many reconciliation rounds to converge. In addition, this model assumes that the values of the variables being measured do not change over time (or change very slowly). This is not an assumption that is necessarily true in the AdWords domain, due to seasonality and other effects.</p>
<p>A <em>factor model</em> is a model where one has researched a small number of causes or factors that explain the expected value of search phrases in a very simple manner. For example it would be nice if the value of a search phrase were the sum of a value determined by the first letter plus an independent value determined by the second letter. In such a case we would only need <img width="94" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg95.png" alt="$ 2*26 = 52$"/> channels (to track the factors) and we would then be able to apply our model to many different search phrases. Factor models are a good solution, and are commonly used in other industries, such as finance, but one needs to invest in developing factors much better than the example factors we just mentioned.</p>
<h1><a name="SECTION00060000000000000000">Conclusion</a></h1>
<p>The last section brings us to the point of this writeup. Having data from a limited number of channels is a fundamental limit on information in the Google click-out market. You can not get around it by mere calculation. You need other information sources or aggregation schemes which may or may not be available.</p>
<p>The points we have touched on are:</p>
<ul>
<li>You can not estimate the variance of individuals from a constant number of aggregated measurements.
<p>This is bad because this interferes with detailed estimates of risk.</p>
</li>
<li>You can not always undo bad channel assignments by calculation after the fact.
<p>This is bad because this interferes with detailed assignments and management of value.</p>
</li>
</ul>
<p>In a market information is money. To the extent you buy or sell in ignorance you leak money to any counter-parties that know the things that you do not. Even if there are no such informed counter-parties there are distinct disadvantages in not being able to un-bundle mixed measurements. This means it is difficult to un-bundle mixed sales. For example we may be making a profit on a combination purchase of advertisements and we are not able to quickly determine which advertisements in the combination are profitable and which are unprofitable.<a name="tex2html9" href="#foot161"><sup>9</sup></a></p>
<p>The capital markets (stocks, bonds, index funds, <img width="30" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg96.png" alt="$ \cdots$"/> ) have evolved and progressed forward from initial disorganized arrangements to open outcry markets and then to detailed information environments. The demands and expectations of these modern markets include a number of features including:</p>
<ul>
<li>Complete reconciliation and publicly available detailed records of the past.</li>
<li>Transparent &#8220;books&#8221; or listings of all current bids and bidders.</li>
</ul>
<p>Not all of these are appropriate for a non-capital market and Google&#8217;s on-line advertising markets are just that: Google&#8217;s. It is interesting that before 2007 Yahoo/Overture offered a research interface that did expose the bidding book. It will be interesting to see how the on-line advertising markets evolve and if this feature survives in the newer &#8220;more like Google&#8221; Overture market.</p>
<p>The actual lessons we learned in watching others work with on-line advertising markets are the following. It is not necessary to be able to perform any of the calculations mentioned here to run a successful business. It is important, however, to have a statistician&#8217;s intuition as to what is risky, what can be estimated and what can not be estimated. The surprise to the first author was that his initial intuition was wrong, even though he considers himself a mathematician. It wasn&#8217;t until we removed the non-essential details from the problem and found the appropriate statistical references that we were finally able to fully convince ourselves that these estimation problems are in fact difficult.<a name="tex2html10" href="#foot164"><sup>10</sup></a></p>
<h2><a name="SECTION00070000000000000000">Bibliography</a></h2>
<dl compact>
<dt><a name="cove_thom_91">1</a></dt>
<dd>C<small>OVER,</small> T.&nbsp;M., <small>AND</small> T<small>HOMAS,</small> J.&nbsp;A.<br />
<em>Elements of Information Theory</em>.<br />
John Wiley &amp; Sons, 1991.</dd>
<dt><a name="goog1">2</a></dt>
<dd>G<small>OOGLE</small>.<br />
Google advertising programs.<br />
<tt><a name="tex2html11" href="http://www.google.com/ads/">http://www.google.com/ads/</a></tt>.</dd>
<dt><a name="Metropolis:1949:MCM">3</a></dt>
<dd>M<small>ETROPOLIS,</small> N., <small>AND</small> U<small>LAM,</small> S.<br />
The Monte Carlo method.<br />
335-341.</dd>
<dt><a name="rota:1997a">4</a></dt>
<dd>R<small>OTA,</small> G.<br />
<em>Indiscrete Thoughts</em>.<br />
Birkh&auml;user, Boston, 1997.</dd>
<dt><a name="googval">5</a></dt>
<dd>Y<small>AHOO!</small><br />
Google key statistics.<br />
<tt><a name="tex2html12" href="http://finance.yahoo.com/q/ks?s=GOOG">http://finance.yahoo.com/q/ks?s=GOOG</a></tt>.</dd>
</dl>
<h1><a name="SECTION00080000000000000000">Appendix</a></h1>
<h2><a name="SECTION00081000000000000000">Derivation That a Single Mean is Easy to Estimate</a></h2>
<p>To show <!-- MATH<br />
 $E[(\hat{a}_w - a_w)^2] =  v_a/n$<br />
 --><br />
<img width="179" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg21.png" alt="$ E[(\hat{a}_w - a_w)^2] = v_a/n$"/> we introduce the symbols <img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg97.png" alt="$ x_i$"/> to denote the random variables representing the <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> apples in our basket and work forward.</p>
<p>To calculate we will need to use some of the theory of the expectation notation <img width="30" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg10.png" alt="$ E[]$"/> . Simple facts about the <img width="30" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg10.png" alt="$ E[]$"/> notation are used to reduce complicated expressions into known quantities. For example if <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg9.png" alt="$ x$"/> is a random variable and <img width="12" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg98.png" alt="$ c$"/> is a constant then <!-- MATH<br />
 $E[c x] = c E[x]$<br />
 --><br />
<img width="118" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg99.png" alt="$ E[c x] = c E[x]$"/> . If <img width="14" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg100.png" alt="$ y$"/> is a random variable that is independent of <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg9.png" alt="$ x$"/> then <!-- MATH<br />
 $E[x y] = E[x] E[y]$<br />
 --><br />
<img width="146" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg101.png" alt="$ E[x y] = E[x] E[y]$"/> . And we have for any quantities <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg9.png" alt="$ x$"/> ,<img width="14" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg100.png" alt="$ y$"/> <!-- MATH<br />
 $E[x + y] = E[x] + E[y]$<br />
 --><br />
<img width="192" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg102.png" alt="$ E[x + y] = E[x] + E[y]$"/> (even when they are not independent).<a name="tex2html13" href="#foot169"><sup>11</sup></a></p>
<p>Starting our calculation:</p>
<p></p>
<div align="center"><a name="eq:1"></a><a name="eq:2"></a><a name="eq:3"></a><a name="eq:4"></a><a name="eq:5"></a><a name="eq:6"></a><a name="eq:7"></a><a name="eq:8"></a><!-- MATH<br />
 \begin{eqnarray}<br />
E[(\hat{a}_w - a_w)^2] &#038; = &#038;<br />
E\left[\left( (\sum_{i=1}^n x_i)/n - a_w \right)^2 \right]\\<br />
 &#038; = &#038; E\left[\left( \sum_{i=1}^n (x_i - a_w)/n \right)^2 \right]\\<br />
 &#038; = &#038; E\left[ \sum_{i=1}^n (x_i - a_w) \sum_{j=1}^n (x_j - a_w) \right]/n^2\\<br />
 &#038; = &#038; E\left[\sum_{i,j} (x_i - a_w) (x_j - a_w)\right]/n^2\\<br />
 &#038; = &#038; E\left[ \sum_{i=1}^n (x_i - a_w)^2 \right]/n^2\\<br />
 &#038; = &#038; E\left[ n (x - a_w)^2 \right]/n^2\\<br />
 &#038; = &#038; E\left[ (x - a_w)^2 \right]/n\\<br />
 &#038; = &#038; v_a/n.<br />
\end{eqnarray}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="116" height="41" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg105.png" alt="$\displaystyle E[(\hat{a}_w - a_w)^2]$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="212" height="87" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg106.png" alt="$\displaystyle E\left[\left( (\sum_{i=1}^n x_i)/n - a_w \right)^2 \right]$"/></td>
<td width="10" align="right">(1)</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="208" height="87" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg107.png" alt="$\displaystyle E\left[\left( \sum_{i=1}^n (x_i - a_w)/n \right)^2 \right]$"/></td>
<td width="10" align="right">(2)</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="285" height="77" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg108.png" alt="$\displaystyle E\left[ \sum_{i=1}^n (x_i - a_w) \sum_{j=1}^n (x_j - a_w) \right]/n^2$"/></td>
<td width="10" align="right">(3)</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="254" height="77" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg109.png" alt="$\displaystyle E\left[\sum_{i,j} (x_i - a_w) (x_j - a_w)\right]/n^2$"/></td>
<td width="10" align="right">(4)</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="186" height="77" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg110.png" alt="$\displaystyle E\left[ \sum_{i=1}^n (x_i - a_w)^2 \right]/n^2$"/></td>
<td width="10" align="right">(5)</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="158" height="41" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg111.png" alt="$\displaystyle E\left[ n (x - a_w)^2 \right]/n^2$"/></td>
<td width="10" align="right">(6)</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="139" height="41" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg112.png" alt="$\displaystyle E\left[ (x - a_w)^2 \right]/n$"/></td>
<td width="10" align="right">(7)</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="47" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg113.png" alt="$\displaystyle v_a/n.$"/></td>
<td width="10" align="right">(8)</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p>Most of the lines of the derivation are just substitutions or uses of definition (for example the last substitution on line <a href="#eq:8">8</a> is of <!-- MATH<br />
 $E[ (x - a_w)^2 ] \rightarrow v_a$<br />
 --><br />
<img width="153" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg114.png" alt="$ E[ (x - a_w)^2 ] \rightarrow v_a$"/> ). A few of the lines use some cute facts about statistics. For example line <a href="#eq:4">4</a> <!-- MATH<br />
 $\rightarrow$<br />
 --><br />
<img width="23" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg115.png" alt="$ \rightarrow$"/> line <a href="#eq:5">5</a> is using the fact that <!-- MATH<br />
 $E[x_i - a_w] = 0$<br />
 --><br />
<img width="124" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg116.png" alt="$ E[x_i - a_w] = 0$"/> , which under our independent drawing assumption is enough to show <!-- MATH<br />
 $E[(x_i - a_w)(x_j - a_w)] = 0$<br />
 --><br />
<img width="215" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg117.png" alt="$ E[(x_i - a_w)(x_j - a_w)] = 0$"/> when <img width="45" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg118.png" alt="$ i \neq j$"/> (hence all these terms can be ignored). The line <a href="#eq:5">5</a> <!-- MATH<br />
 $\rightarrow$<br />
 --><br />
<img width="23" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg115.png" alt="$ \rightarrow$"/> line <a href="#eq:6">6</a> substitution uses the fact that each of the <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> apples was drawn using an identical process, so we expect the same amount of error in each trial (and there are <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> trials in total).</p>
<p>The conclusion of the derivation is that the expected squared error <!-- MATH<br />
 $E[(\hat{a}_w - a_w)^2]$<br />
 --><br />
<img width="116" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg20.png" alt="$ E[(\hat{a}_w - a_w)^2]$"/> is a factor of <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> smaller than <!-- MATH<br />
 $v_a = E[(x - a_w)^2]$<br />
 --><br />
<img width="149" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg19.png" alt="$ v_a = E[(x - a_w)^2]$"/> . This means our estimate <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg5.png" alt="$ \hat{a}_w$"/> is getting better and better (closer to the true <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg4.png" alt="$ a_w$"/> ) as we increase the sample size <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> .</p>
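<p>The v_a/n result is easy to check by simulation. The sketch below is an illustration added under stated assumptions, not the authors' code: it draws apple weights independently from a normal distribution (the derivation itself only needs independence and a common variance), and the mean of 150 and variance of 4 are made-up values. It averages the squared error of the sample mean over many repeated baskets and compares it to v_a/n.</p>

```python
import random


def mean_squared_error_of_mean(n, true_mean, variance, trials, seed=0):
    """Estimate E[(a_hat - a_w)^2] by averaging over repeated baskets of n apples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sd = variance ** 0.5
    total = 0.0
    for _ in range(trials):
        # One basket: n independent draws, summarized by the sample mean.
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        total += (sample_mean - true_mean) ** 2
    return total / trials


# Hypothetical apple population: mean weight 150, variance 4.
for n in (10, 40):
    observed = mean_squared_error_of_mean(n, 150.0, 4.0, trials=2000)
    print(n, "observed:", round(observed, 4), "predicted v_a/n:", 4.0 / n)
```

<p>The observed values should land close to the predicted ones, up to simulation noise, and quadrupling n should cut the squared error by roughly a factor of four.</p>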
<h2><a name="SECTION00082000000000000000">Fisher Information and the Cramer-Rao Inequality</a></h2>
<h3><a name="SECTION00082100000000000000">Discussion</a></h3>
<p>What is Fisher information? Is it like the other mathematical quantities that go by the name of information?</p>
<p>There are a lot of odd quantities related to information, each with its own deep theoretical framework. For example there are Clausius entropy, Shannon information and Kolmogorov-Chaitin complexity. Each of these has useful applications, precise mathematics and deep meaning. They also have somewhat confused and incorrect pseudo-philosophical popularizations.</p>
<p>Fisher information is not really famous outside of statistics. Textbooks motivate it in different ways and often introduce an auxiliary function called &#8220;score&#8221; that quickly makes the calculations work out. The definition of &#8220;score&#8221; uses the fact that <!-- MATH<br />
 $\frac{\partial}{\partial \theta} \ln (f(\theta)) =<br />
\left( \frac{\partial}{\partial \theta} f(\theta) \right) / f(\theta)$<br />
 --><br />
<img width="235" height="43" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg119.png" alt="$ \frac{\partial}{\partial \theta} \ln (f(\theta)) = \left( \frac{\partial}{\partial \theta} f(\theta) \right) / f(\theta)$"/> to switch from likelihoods to relative likelihoods. The entries of the Fisher information matrix are terms of the form</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
J_{i,j}(\theta) = \int_x f(x;\theta)<br />
\left( \frac{\partial}{\partial \theta_i} \ln f(x;\theta) \right)<br />
\left( \frac{\partial}{\partial \theta_j} \ln f(x;\theta)  \right)<br />
dx<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="453" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg120.png" alt="$\displaystyle J_{i,j}(\theta) = \int_x f(x;\theta) \left( \frac{\partial}{\pa... ...right) \left( \frac{\partial}{\partial \theta_j} \ln f(x;\theta) \right) dx $"/></div>
<p>where <img width="14" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg121.png" alt="$ \theta$"/> is our vector of parameters (set at their unknown true values that we are trying to estimate) , <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg9.png" alt="$ x$"/> ranges over all possible measurements and <!-- MATH<br />
 $f(x;\theta)$<br />
 --><br />
<img width="58" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg122.png" alt="$ f(x;\theta)$"/> reads off the likelihood of observing the measurement <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg9.png" alt="$ x$"/> given the parameter <img width="14" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg121.png" alt="$ \theta$"/> .</p>
<p>Fisher information is actually a simpler concept than the other forms of information. The entries in the Fisher information matrix are merely the expected values of the effect of each pair of parameters on the relative likelihood of different observations. In this case, it is showing how alterations in the unknown parameters would change the relative likelihood of different observed outcomes. It is then fairly clever (but not too surprising) that its inverse can then read off how changes in observed outcome influence estimates of the unknown parameters. The Cramer-Rao inequality is using Fisher information to describe properties of an inverse (recovering parameters from observed data) without needing to know the specific inversion process (how we performed the estimate).</p>
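<p>As a small concrete case, the definition can be evaluated numerically for the simplest model: a single measurement x drawn from a normal distribution with unknown mean theta and known variance v. This is a sketch under those assumptions, with illustrative function names and values; it is not the two-basket calculation from the main text. The score is approximated by a finite difference of the log-likelihood, and the integral over x is replaced by an average over random draws of x.</p>

```python
import math
import random


def log_density(x, theta, v):
    """Log of the normal density with mean theta and variance v, evaluated at x."""
    return -((x - theta) ** 2) / (2.0 * v) - 0.5 * math.log(2.0 * math.pi * v)


def score(x, theta, v, h=1e-4):
    """Central finite-difference approximation to d/d(theta) of ln f(x; theta)."""
    return (log_density(x, theta + h, v) - log_density(x, theta - h, v)) / (2.0 * h)


def fisher_information(theta, v, samples, seed=0):
    """Monte Carlo estimate of E[score^2] with x drawn from f(x; theta)."""
    rng = random.Random(seed)
    sd = math.sqrt(v)
    total = 0.0
    for _ in range(samples):
        x = rng.gauss(theta, sd)
        total += score(x, theta, v) ** 2
    return total / samples


# Hypothetical values: true mean 150, known variance 4.
estimate = fisher_information(150.0, 4.0, samples=20000)
print("estimated J:", round(estimate, 4), "exact 1/v:", 1.0 / 4.0)
```

<p>For this model the exact Fisher information of one measurement is 1/v, so n independent measurements carry information n/v, and the Cramer-Rao bound on the squared error of an unbiased estimate of the mean is v/n. That is the same rate obtained directly for the sample mean in the first appendix derivation, so in this easy case the bound is actually achieved.</p>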
<h3><a name="SECTION00082200000000000000">Calculating Cramer-Rao on the Variance of Variance Estimate</a></h3>
<p>When attempting to measure the variance of individual apples (Experiment-B) our data was two sums of random variables (each <img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg97.png" alt="$ x_i$"/> or <img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg123.png" alt="$ y_i$"/> representing a single apple):</p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
T_1 &#038; = &#038; \sum_{i=1}^{n_1} x_i \\<br />
T_2 &#038; = &#038; \sum_{i=1}^{n_2} y_i<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg63.png" alt="$\displaystyle T_1$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="51" height="73" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg124.png" alt="$\displaystyle \sum_{i=1}^{n_1} x_i$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg65.png" alt="$\displaystyle T_2$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="50" height="73" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg125.png" alt="$\displaystyle \sum_{i=1}^{n_2} y_i$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p><img width="50" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg126.png" alt="$ n_1,n_2$"/> can be any positive integers.</p>
<p>Under our assumption that the weight of apples is normally distributed with mean-weight <img width="25" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg4.png" alt="$ a_w$"/> and variance <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg18.png" alt="$ v_a$"/> we can write down the odds-density for any pair of measurements <img width="50" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg62.png" alt="$ T_1,T_2$"/> as:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
f(T_1,T_2;v_a) =<br />
\frac{1}{2 \pi v_a \sqrt{n_1 n_2}} e^{-(T_1 - n_1 a_w)^2/(2 n_1 v_a) - (T_2 - n_2 a_w)^2/(2 n_2 v_a)}<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="510" height="59" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg127.png" alt="$\displaystyle f(T_1,T_2;v_a) = \frac{1}{2 \pi v_a \sqrt{n_1 n_2}} e^{-(T_1 - n_1 a_w)^2/(2 n_1 v_a) - (T_2 - n_2 a_w)^2/(2 n_2 v_a)} .$"/></div>
<p>To apply the Cramer-Rao inequality we need the Fisher information of this distribution, which is defined as:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
J(v_a) = \int_{T_1,T_2} f(T_1,T_2;v_a)<br />
\left( \frac{\partial}{\partial v_a} \ln f(T_1,T_2;v_a) \right)^2<br />
dT_1 dT_2<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="471" height="72" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg128.png" alt="$\displaystyle J(v_a) = \int_{T_1,T_2} f(T_1,T_2;v_a) \left( \frac{\partial}{\partial v_a} \ln f(T_1,T_2;v_a) \right)^2 dT_1 dT_2 .$"/></div>
<p>The first step is to use the fact that</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\frac{\partial}{\partial x} \ln e^{-f(x)^2} = -2 f(x) \frac{\partial}{\partial x} f(x)<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="215" height="61" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg129.png" alt="$\displaystyle \frac{\partial}{\partial x} \ln e^{-f(x)^2} = -2 f(x) \frac{\partial}{\partial x} f(x) $"/></div>
<p>and write</p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
J(v_a) &#038; = &#038;<br />
\int_{T_1,T_2} f(T_1,T_2;v_a)<br />
\left( (T_1 - n_1 a_w)^2/(2 n_1 v_a^2) + (T_2 - n_2 a_w)^2)/(2 n_2 v_a^2) \right)^2<br />
dT_1 dT_2 \\<br />
&#038; = &#038;<br />
\frac{1}{4 v_a^4} \int_{T_1,T_2} f(T_1,T_2;v_a)<br />
(T_1 - n_1 a_w)^4/n_1^2<br />
dT_1 dT_2 \\<br />
&#038; &#038; + \frac{1}{4 v_a^4} \int_{T_1,T_2} f(T_1,T_2;v_a)<br />
2 (T_1 - n_1 a_w)^2 (T_2 - n_2 a_w)^2/(n_1 n_2)<br />
dT_1 dT_2 \\<br />
&#038; &#038; + \frac{1}{4 v_a^4} \int_{T_1,T_2} f(T_1,T_2;v_a)<br />
(T_2 - n_2 a_w)^4/n_2^2<br />
dT_1 dT_2 \\<br />
&#038; = &#038;<br />
\frac{1}{4 v_a^4} \int \Phi_{\sqrt{n_1 v_a}}(x;n_1 a_w) (x- n_1 a_w)^4 dx \\<br />
&#038; &#038;<br />
+ \frac{2}{4 v_a^4}<br />
\left(\int \Phi_{\sqrt{n_1 v_a}}(x;n_1 a_w) (x- n_1 a_w)^2 dx \right)<br />
\left(\int \Phi_{\sqrt{n_2 v_a}}(x;n_2 a_w) (x- n_2 a_w)^2 dx \right)\\<br />
&#038; &#038; + \frac{1}{4 v_a^4} \int \Phi_{\sqrt{n_2 v_a}}(x;n_2 a_w) (x- n_2 a_w)^4 dx<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="48" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg130.png" alt="$\displaystyle J(v_a)$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="609" height="61" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg131.png" alt="$\displaystyle \int_{T_1,T_2} f(T_1,T_2;v_a) \left( (T_1 - n_1 a_w)^2/(2 n_1 v_a^2) + (T_2 - n_2 a_w)^2)/(2 n_2 v_a^2) \right)^2 dT_1 dT_2$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="370" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg132.png" alt="$\displaystyle \frac{1}{4 v_a^4} \int_{T_1,T_2} f(T_1,T_2;v_a) (T_1 - n_1 a_w)^4/n_1^2 dT_1 dT_2$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td>&nbsp;</td>
<td align="left" nowrap><img width="530" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg133.png" alt="$\displaystyle + \frac{1}{4 v_a^4} \int_{T_1,T_2} f(T_1,T_2;v_a) 2 (T_1 - n_1 a_w)^2 (T_2 - n_2 a_w)^2/(n_1 n_2) dT_1 dT_2$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td>&nbsp;</td>
<td align="left" nowrap><img width="384" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg134.png" alt="$\displaystyle + \frac{1}{4 v_a^4} \int_{T_1,T_2} f(T_1,T_2;v_a) (T_2 - n_2 a_w)^4/n_2^2 dT_1 dT_2$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="303" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg135.png" alt="$\displaystyle \frac{1}{4 v_a^4} \int \Phi_{\sqrt{n_1 v_a}}(x;n_1 a_w) (x- n_1 a_w)^4 dx$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td>&nbsp;</td>
<td align="left" nowrap><img width="637" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg136.png" alt="$\displaystyle + \frac{2}{4 v_a^4} \left(\int \Phi_{\sqrt{n_1 v_a}}(x;n_1 a_w) (... ...x \right) \left(\int \Phi_{\sqrt{n_2 v_a}}(x;n_2 a_w) (x- n_2 a_w)^2 dx \right)$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td>&nbsp;</td>
<td align="left" nowrap><img width="318" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg137.png" alt="$\displaystyle + \frac{1}{4 v_a^4} \int \Phi_{\sqrt{n_2 v_a}}(x;n_2 a_w) (x- n_2 a_w)^4 dx$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/><br />
where <img width="32" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg138.png" alt="$ \Phi()$"/> is the standard single variable normal density:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\Phi_\sigma(x;\mu) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-(x-\mu)^2/(2 \sigma^2)}<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="256" height="59" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg139.png" alt="$\displaystyle \Phi_\sigma(x;\mu) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-(x-\mu)^2/(2 \sigma^2)} .$"/></div>
<p>The first term is the 4th moment of the normal and it is known that:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\int \Phi_{\sigma}(x;\mu) (x- \mu)^4 dx =<br />
3 \left( \int \Phi_{\sigma}(x;\mu) (x- \mu)^2 dx \right)^2<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="441" height="72" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg140.png" alt="$\displaystyle \int \Phi_{\sigma}(x;\mu) (x- \mu)^4 dx = 3 \left( \int \Phi_{\sigma}(x;\mu) (x- \mu)^2 dx \right)^2 .$"/></div>
<p>It is also a standard fact about the normal density that</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\int \Phi_{\sigma}(x;\mu) (x- \mu)^2 dx = \sigma^2<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="231" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg141.png" alt="$\displaystyle \int \Phi_{\sigma}(x;\mu) (x- \mu)^2 dx = \sigma^2 .$"/></div>
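<p>Both moment facts are easy to confirm numerically. The following Python is a small sketch (the function names, grid size and the values of mu and sigma are our own illustrative choices) that approximates the second and fourth central moments of the normal density by a Riemann sum.</p>

```python
import math

def normal_pdf(x, mu, sigma):
    """Single variable normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def central_moment(k, mu, sigma, steps=20001, width=12.0):
    """Riemann-sum approximation of the k-th central moment on mu +/- width * sigma."""
    lo = mu - width * sigma
    dx = 2 * width * sigma / (steps - 1)
    total = 0.0
    for i in range(steps):
        x = lo + i * dx
        total += normal_pdf(x, mu, sigma) * (x - mu) ** k
    return total * dx

# Illustrative values only.
mu, sigma = 3.0, 1.5
m2 = central_moment(2, mu, sigma)
m4 = central_moment(4, mu, sigma)
print("second moment:", m2, "sigma^2:", sigma ** 2)
print("fourth moment:", m4, "3 * (second moment)^2:", 3 * m2 ** 2)
```

<p>The second moment should match sigma^2 and the fourth moment should match three times its square, up to the small error of the numerical integration.</p>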
<p>So we have</p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
J(v_a) &#038; = &#038; \frac{1}{4 v_a^4} \left(<br />
\frac{3 n_1^2 v_a^2}{n_1^2}<br />
+ 2 \left( \frac{n_1 v_a}{n_1} \right) \left( \frac{n_2 v_a}{n_2} \right)<br />
+ \frac{3 n_2^2 v_a^2}{n_2^2}<br />
\right) \\<br />
&#038; = &#038; \frac{2}{v_a^2}<br />
.<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="48" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg130.png" alt="$\displaystyle J(v_a)$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="362" height="65" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg142.png" alt="$\displaystyle \frac{1}{4 v_a^4} \left( \frac{3 n_1^2 v_a^2}{n_1^2} + 2 \left( \... ...right) \left( \frac{n_2 v_a}{n_2} \right) + \frac{3 n_2^2 v_a^2}{n_2^2} \right)$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="31" height="59" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg143.png" alt="$\displaystyle \frac{2}{v_a^2} .$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p>Finally we have the Fisher Information <!-- MATH<br />
 $J(v_a) = \frac{2}{v_a^2}.$<br />
 --><br />
<img width="96" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg144.png" alt="$ J(v_a) = \frac{2}{v_a^2}.$"/> We can then apply the Cramer-Rao inequality which says that <!-- MATH<br />
 $E[(v_a - \hat{v})^2] \ge 1/J(v_a)$<br />
 --><br />
<img width="190" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg145.png" alt="$ E[(v_a - \hat{v})^2] \ge 1/J(v_a)$"/> for <em>any</em> unbiased estimator (no matter how we choose <img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg146.png" alt="$ n_1$"/> and <img width="23" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg147.png" alt="$ n_2$"/> ) of <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg18.png" alt="$ v_a$"/> (unbiased meaning <!-- MATH<br />
 $E[v_a - \hat{v}] = 0$<br />
 --><br />
<img width="114" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg148.png" alt="$ E[v_a - \hat{v}] = 0$"/> ). The theory is telling us that the unknown parameter <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg18.png" alt="$ v_a$"/> has such a sloppy contribution to the likelihood of our observations that it is in fact difficult to pin down the value from any one set of observations. In our case we have just shown that <!-- MATH<br />
 $E[(v_a - \hat{v})^2] \ge \frac{v_a^2}{2}$<br />
 --><br />
<img width="145" height="48" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg149.png" alt="$ E[(v_a - \hat{v})^2] \ge \frac{v_a^2}{2}$"/> , which means no estimation procedure that uses just a single instance of the total <img width="50" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg62.png" alt="$ T_1,T_2$"/> can reliably estimate the variance <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg18.png" alt="$ v_a$"/> of individual apple weights.</p>
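<p>A quick simulation makes the point concrete. The sketch below is an illustration under stated assumptions, not the estimate proposed earlier in the article: it treats the mean weight a_w as known (as the density above does), forms the simple unbiased estimate that averages (T_i - n_i a_w)^2 / n_i over the two baskets, and measures its mean squared error for several made-up basket sizes.</p>

```python
import math
import random

def variance_estimate_mse(n1, n2, a_w, v_a, trials=20000, seed=0):
    """Mean squared error of a simple unbiased estimate of v_a from (T1, T2).

    Assumes a_w is known, so (T_i - n_i * a_w)**2 / n_i is unbiased for v_a.
    """
    rng = random.Random(seed)
    total_sq_err = 0.0
    for _ in range(trials):
        # Each total is normal with mean n_i * a_w and variance n_i * v_a.
        t1 = rng.gauss(n1 * a_w, math.sqrt(n1 * v_a))
        t2 = rng.gauss(n2 * a_w, math.sqrt(n2 * v_a))
        v_hat = 0.5 * ((t1 - n1 * a_w) ** 2 / n1 + (t2 - n2 * a_w) ** 2 / n2)
        total_sq_err += (v_hat - v_a) ** 2
    return total_sq_err / trials

# Illustrative values only (assumptions, not taken from the article).
a_w, v_a = 150.0, 400.0
for n1, n2 in [(1, 2), (10, 20), (1000, 5000)]:
    mse = variance_estimate_mse(n1, n2, a_w, v_a)
    print(n1, n2, "mse:", round(mse), "bound v_a^2/2:", v_a ** 2 / 2)
```

<p>For this estimate the squared error stays on the order of v_a^2, above the v_a^2/2 floor, and it does not shrink as the baskets grow.</p>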
<h3><a name="SECTION00082300000000000000">Calculating Cramer-Rao Inequality on Multiple Mean Estimates</a></h3>
<p>In Experiment-C we again have two baskets of fruit- but they contain apples and oranges in the proportions given by <!-- MATH<br />
 $a_1,o_1,a_2,o_2$<br />
 --><br />
<img width="97" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg41.png" alt="$ a_1,o_1,a_2,o_2$"/> . Our assumption that the individual fruit weights are normally distributed with means <img width="53" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg150.png" alt="$ a_w,o_w$"/> and common variance <img width="14" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg151.png" alt="$ v$"/> lets us write the joint probability of the total measurements <img width="50" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg62.png" alt="$ T_1,T_2$"/> in terms of the normal-density (<img width="32" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg138.png" alt="$ \Phi()$"/> ).</p>
<p>For our problem where the variables are the sums <img width="50" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg62.png" alt="$ T_1,T_2$"/> and we have two parameters (the two unknown means <img width="53" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg150.png" alt="$ a_w,o_w$"/> ) and a single per-fruit variance <img width="14" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg151.png" alt="$ v$"/> we will use the two dimensional normal density:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\Phi_{\sqrt{n v}}(T_1,T_2;a_w,o_w) = \frac{1}{2 \pi n v}<br />
e^{(-(T_1 - a_1 a_w - o_1 o_w)^2 -(T_2 - a_2 a_w - o_2 o_w)^2)/(2 n v)}<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="544" height="59" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg152.png" alt="$\displaystyle \Phi_{\sqrt{n v}}(T_1,T_2;a_w,o_w) = \frac{1}{2 \pi n v} e^{(-(T_1 - a_1 a_w - o_1 o_w)^2 -(T_2 - a_2 a_w - o_2 o_w)^2)/(2 n v)} . $"/></div>
<p>We concentrate on the variables <img width="50" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg62.png" alt="$ T_1,T_2$"/> and will abbreviate this density (leaving implicit the important parameters <img width="71" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg153.png" alt="$ a_w,o_w,v$"/> ) as <!-- MATH<br />
 $\Phi(T_1,T_2)$<br />
 --><br />
<img width="78" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg154.png" alt="$ \Phi(T_1,T_2)$"/> .</p>
<p>From this we can read off the difficulty in estimating individual apple weight:</p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
J_{1,1}(a_w,o_w) &#038; = &#038;<br />
\int_{T_1,T_2} \Phi(T_1,T_2)<br />
\left( \frac{\partial}{\partial a_w} \ln \Phi(T_1,T_2) \right)<br />
\left( \frac{\partial}{\partial a_w} \ln \Phi(T_1,T_2) \right)<br />
dT_1 dT_2 \\<br />
&#038; = &#038;<br />
\int_{T_1,T_2} \Phi(T_1,T_2)<br />
\frac{<br />
(2 a_1 (T_1 - a_1 a_w - o_1 o_w) + 2 a_2 (T_2 - a_2 a_w - o_2 o_w))^2<br />
}{<br />
4 n^2 v^2<br />
}<br />
dT_1 dT_2 \\<br />
&#038; = &#038;<br />
\frac{a_1^2 + a_2^2}{n v}<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="96" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg155.png" alt="$\displaystyle J_{1,1}(a_w,o_w)$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="509" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg156.png" alt="$\displaystyle \int_{T_1,T_2} \Phi(T_1,T_2) \left( \frac{\partial}{\partial a_w}... ...right) \left( \frac{\partial}{\partial a_w} \ln \Phi(T_1,T_2) \right) dT_1 dT_2$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="590" height="65" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg157.png" alt="$\displaystyle \int_{T_1,T_2} \Phi(T_1,T_2) \frac{ (2 a_1 (T_1 - a_1 a_w - o_1 o_w) + 2 a_2 (T_2 - a_2 a_w - o_2 o_w))^2 }{ 4 n^2 v^2 } dT_1 dT_2$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right">&nbsp;</td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="66" height="65" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg158.png" alt="$\displaystyle \frac{a_1^2 + a_2^2}{n v}$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p>The first step is using the fact that</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\frac{\partial}{\partial x} \ln e^{-f(x)^2} = -2 f(x) \frac{\partial}{\partial x} f(x)<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="215" height="61" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg129.png" alt="$\displaystyle \frac{\partial}{\partial x} \ln e^{-f(x)^2} = -2 f(x) \frac{\partial}{\partial x} f(x) $"/></div>
<p>The last step is using a number of fundamental facts about the normal density:</p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
\int_{x} \Phi_\sigma(x;\mu) dx &#038; = &#038; 1 \\<br />
\int_{x} \Phi_\sigma(x;\mu) (x -\mu) dx &#038; = &#038; 0 \\<br />
\int_{x} \Phi_\sigma(x;\mu) (x -\mu)^2 dx &#038; = &#038; \sigma^2<br />
.<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="114" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg159.png" alt="$\displaystyle \int_{x} \Phi_\sigma(x;\mu) dx$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="14" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg160.png" alt="$\displaystyle 1$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="174" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg161.png" alt="$\displaystyle \int_{x} \Phi_\sigma(x;\mu) (x -\mu) dx$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap>0</td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="182" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg162.png" alt="$\displaystyle \int_{x} \Phi_\sigma(x;\mu) (x -\mu)^2 dx$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg43.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="28" height="41" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg163.png" alt="$\displaystyle \sigma^2 .$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/><br />
These facts allow us to say that the so-called &#8220;cross terms&#8221; (like <!-- MATH<br />
 $(T_1 - a_1 a_w - o_1 o_w) (T_2 - a_2 a_w - o_2 o_w)$<br />
 --><br />
<img width="313" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg164.png" alt="$ (T_1 - a_1 a_w - o_1 o_w) (T_2 - a_2 a_w - o_2 o_w)$"/> ) integrate to zero and the square terms read off the variance. One of the reasons to assume a common distribution (such as the normal) is that almost any complicated calculation involving such distributions (differentiating, integrating) can usually be reduced to looking up a few well known facts about the so-called &#8220;moments&#8221; of the distribution, as we have done here. Of course, picking a distribution that accurately models reality takes precedence over picking one that eases calculation.</p>
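<p>The closed form for the first entry can also be checked by simulation. The Python below is a sketch with a hypothetical function name and made-up basket counts and weights: it draws many pairs of totals from the two dimensional normal density, evaluates the derivative of the log-density with respect to a_w at each draw, and averages the square.</p>

```python
import math
import random

def j11_monte_carlo(a1, o1, a2, o2, a_w, o_w, v, trials=50000, seed=0):
    """Average squared score with respect to a_w over simulated (T1, T2)."""
    n = a1 + o1  # fruits per basket; assumes a2 + o2 == n as well
    rng = random.Random(seed)
    sd = math.sqrt(n * v)
    mu1 = a1 * a_w + o1 * o_w
    mu2 = a2 * a_w + o2 * o_w
    total = 0.0
    for _ in range(trials):
        t1 = rng.gauss(mu1, sd)
        t2 = rng.gauss(mu2, sd)
        # Derivative of the log-density with respect to a_w.
        score = (a1 * (t1 - mu1) + a2 * (t2 - mu2)) / (n * v)
        total += score ** 2
    return total / trials

# Illustrative values only (assumptions, not taken from the article).
a1, o1, a2, o2 = 60, 40, 40, 60
a_w, o_w, v = 150.0, 130.0, 25.0
n = a1 + o1
print("simulated J_11:", j11_monte_carlo(a1, o1, a2, o2, a_w, o_w, v))
print("closed form   :", (a1 ** 2 + a2 ** 2) / (n * v))
```

<p>The cross term between the two baskets averages out to zero in the simulation, which is why the two numbers agree up to sampling noise.</p>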
<p>The other entries of the Fisher Information matrix can be read off as easily and we derive:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
J(a_w,o_w) =<br />
\frac{1}{n v}<br />
\begin{bmatrix}<br />
a_1^2 + a_2^2 &#038; a_1 o_1 + a_2 o_2 \\<br />
a_1 o_1 + a_2 o_2 &#038; o_1^2 + o_2^2 \\<br />
\end{bmatrix}<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="359" height="66" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg78.png" alt="$\displaystyle J(a_w,o_w) = \frac{1}{n v} \begin{bmatrix} a_1^2 + a_2^2 &amp; a_1 o_1 + a_2 o_2 \ a_1 o_1 + a_2 o_2 &amp; o_1^2 + o_2^2 \ \end{bmatrix} .$"/></div>
<p>Substituting our &#8220;typical&#8221; values of <!-- MATH<br />
 $a_1,o_1,a_2,o_2$<br />
 --><br />
<img width="97" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg41.png" alt="$ a_1,o_1,a_2,o_2$"/> from Section <a href="#sec:mixture">3.2</a> we have</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
J(a_w,o_w) =<br />
\frac{1}{2 v}<br />
\begin{bmatrix}<br />
n  + 4 &#038; n - 4 \\<br />
n - 4 &#038; n + 4<br />
\end{bmatrix}<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="263" height="66" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg165.png" alt="$\displaystyle J(a_w,o_w) = \frac{1}{2 v} \begin{bmatrix} n + 4 &amp; n - 4 \ n - 4 &amp; n + 4 \end{bmatrix} .$"/></div>
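<p>The substitution can be checked mechanically. The sketch below assumes, for illustration, the split a_1 = o_2 = n/2 + sqrt(n) and o_1 = a_2 = n/2 - sqrt(n); this is one assignment that reproduces the matrix above, and the values actually used in the earlier section may be stated differently. The function name and the values of n and v are our own.</p>

```python
import math

def fisher_matrix(a1, o1, a2, o2, v):
    """General Fisher information matrix for (a_w, o_w), with n = a1 + o1."""
    n = a1 + o1
    scale = 1.0 / (n * v)
    return [
        [scale * (a1 * a1 + a2 * a2), scale * (a1 * o1 + a2 * o2)],
        [scale * (a1 * o1 + a2 * o2), scale * (o1 * o1 + o2 * o2)],
    ]

# Illustrative values only; the sqrt(n) imbalance is an assumption.
n, v = 400, 25.0
r = math.sqrt(n)
a1, o1, a2, o2 = n / 2 + r, n / 2 - r, n / 2 - r, n / 2 + r
general = fisher_matrix(a1, o1, a2, o2, v)
typical = [[(n + 4) / (2 * v), (n - 4) / (2 * v)],
           [(n - 4) / (2 * v), (n + 4) / (2 * v)]]
print("general form:", general)
print("typical form:", typical)
```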
<p>At first things look good. The <!-- MATH<br />
 $J(a_w,o_w)$<br />
 --><br />
<img width="80" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg79.png" alt="$ J(a_w,o_w)$"/> entries are growing with <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> so we might expect the entries of <!-- MATH<br />
 $J^{-1}(a_w,o_w)$<br />
 --><br />
<img width="98" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg81.png" alt="$ J^{-1}(a_w,o_w)$"/> to shrink as <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> increases. However, the entries are all nearly identical, so the matrix is ill-conditioned and we see larger than expected entries in the inverse. In fact in this case we have:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
J^{-1}(a_w,o_w) =<br />
\frac{v}{8}<br />
\begin{bmatrix}<br />
1 + 4/n &#038; -1 + 4/n \\<br />
-1 + 4/n &#038; 1 + 4/n<br />
\end{bmatrix}<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="330" height="66" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg166.png" alt="$\displaystyle J^{-1}(a_w,o_w) = \frac{v}{8} \begin{bmatrix} 1 + 4/n &amp; -1 + 4/n \ -1 + 4/n &amp; 1 + 4/n \end{bmatrix} $"/></div>
<p>and these entries are not tending to zero- establishing (by the Cramer-Rao inequality) the difficulty of estimation.</p>
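<p>To see the ill-conditioning numerically, the following sketch (hypothetical helper names, made-up value of v) inverts the typical-case matrix for growing n. The eigenvalues of the typical-case J are n/v (direction (1,1)) and 4/v (direction (1,-1)), so its condition number is n/4 and grows without bound while the inverse stalls.</p>

```python
def typical_fisher(n, v):
    """Typical-case Fisher information matrix from the text."""
    return [[(n + 4) / (2 * v), (n - 4) / (2 * v)],
            [(n - 4) / (2 * v), (n + 4) / (2 * v)]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

v = 25.0  # illustrative per-fruit variance
for n in [16, 400, 10000, 1000000]:
    inv = inverse_2x2(typical_fisher(n, v))
    # Eigenvalues of J are n / v and 4 / v, so the condition number is n / 4.
    print(n, "diag of inverse:", inv[0][0], "condition number:", n / 4)
```

<p>The diagonal entry settles toward v/8 instead of falling to zero, no matter how large n becomes.</p>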
<h3><a name="SECTION00082400000000000000">Cramer-Rao Inequality Holds in General</a></h3>
<p>By inspecting our last series of arguments, we can actually say a bit more. The difficulty in estimation was not due to our specific assumed values of <!-- MATH<br />
 $a_1,o_1,a_2,o_2$<br />
 --><br />
<img width="97" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg41.png" alt="$ a_1,o_1,a_2,o_2$"/> , but rather to the fact that the coin-flipping process we described earlier will nearly always land us in about as bad a situation for large <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> . We can see that the larger the differences <!-- MATH<br />
 $|a_1 - o_1|$<br />
 --><br />
<img width="72" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg167.png" alt="$ \vert a_1 - o_1\vert$"/> and <!-- MATH<br />
 $|a_2 - o_2|$<br />
 --><br />
<img width="72" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg168.png" alt="$ \vert a_2 - o_2\vert$"/> the better things are for estimation. The &#8220;strong law of large numbers&#8221; states that as <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> increases we expect (with probability 1) to have <!-- MATH<br />
 $|a_1 - o_1| \rightarrow \sqrt{2 v n}$<br />
 --><br />
<img width="148" height="45" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg169.png" alt="$ \vert a_1 - o_1\vert \rightarrow \sqrt{2 v n}$"/> and <!-- MATH<br />
 $|a_2 - o_2| \rightarrow \sqrt{2 v n}$<br />
 --><br />
<img width="148" height="45" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg170.png" alt="$ \vert a_2 - o_2\vert \rightarrow \sqrt{2 v n}$"/> . This means that it would be very rare (for large <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> ) to see differences in <!-- MATH<br />
 $a_1,o_1,a_2,o_2$<br />
 --><br />
<img width="97" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg41.png" alt="$ a_1,o_1,a_2,o_2$"/> larger than we saw in our &#8220;typical case.&#8221; This lets us conclude that if there is no constructive bias then for large <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> estimation is almost always as difficult as the example we worked out.</p>
<p>Now if there were any constructive bias in the experiment (such as apples were a bit more likely in the first basket and oranges were a bit more likely in the second basket) then the entries of <img width="49" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg172.png" alt="$ J^{-1}()$"/> would be forced to zero and the explicit estimate we gave earlier would in fact have shrinking error as <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> grew large. However only the fraction of the data we can attribute to the bias is really helping us (so if it was say a <img width="42" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg173.png" alt="$ 1/10$"/> th bias only about <img width="42" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg173.png" alt="$ 1/10$"/> th of the data is useful to us) and we would need a lot of data to experience lowered error (but at least the error would be falling). The point is that the evenly distributed portion of the data is essentially not useful for inference, and that is why it is so important to be inferring things that the experiment was designed to measure (and why the limit on channel identifiers is bad since it limits the number of things we can simultaneously design for).</p>
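<p>The effect of a constructive bias can be sketched with the same Fisher matrix. The Python below is an illustration under our own assumptions: basket one holds a (1/2 + beta) share of apples and basket two a (1/2 - beta) share, where beta, the function name and the numeric values are hypothetical. It returns the top-left entry of the inverse, which bounds the error in estimating apple weight.</p>

```python
def inverse_diag_with_bias(n, v, beta):
    """Top-left entry of the inverse Fisher matrix under a designed imbalance.

    Basket 1 holds a (1/2 + beta) share of apples and basket 2 a (1/2 - beta)
    share; beta is a hypothetical bias parameter.
    """
    a1, o1 = (0.5 + beta) * n, (0.5 - beta) * n
    a2, o2 = (0.5 - beta) * n, (0.5 + beta) * n
    j11 = (a1 * a1 + a2 * a2) / (n * v)
    j12 = (a1 * o1 + a2 * o2) / (n * v)
    j22 = (o1 * o1 + o2 * o2) / (n * v)
    det = j11 * j22 - j12 * j12
    return j22 / det

v, beta = 25.0, 0.05  # illustrative values: 55/45 split in each basket
for n in [100, 10000, 1000000]:
    print(n, "bound on apple-weight error:", inverse_diag_with_bias(n, v, beta))
```

<p>Under these assumptions the entry works out to v (1/4 + beta^2) / (2 n beta^2): it does fall like 1/n, but a small beta inflates it by roughly 1/beta^2, so a weak bias needs a lot of data before the error becomes small.</p>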
<p></p>
<hr />
<h4>Footnotes</h4>
<dl>
<dt><a name="foot8">&#8230; Mount</a><a href="#tex2html1"><sup>1</sup></a></dt>
<dd>http://www.mzlabs.com/</dd>
<dt><a name="foot9">&#8230; Zumel</a><a href="#tex2html2"><sup>2</sup></a></dt>
<dd>http://www.quimba.com/</dd>
<dt><a name="foot323">&#8230; apples)</a><a href="#tex2html3"><sup>3</sup></a></dt>
<dd>&#8220;Unbiased&#8221; simply means that <img width="122" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg27.png" alt="$ E[\hat{v}_a - v_a] = 0$"/>, which can also be written as <img width="89" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg28.png" alt="$ E[\hat{v}_a] = v_a$"/>. This means our estimate of variance does not, on average, run high or low.</dd>
<dt><a name="foot324">&#8230; estimates</a><a href="#tex2html4"><sup>4</sup></a></dt>
<dd>As an aside, some of the value in proposing a specific estimate (given that the theory says there is no good one) is that it allows one to investigate the failure of the estimate without resorting to the larger theory. For example, in this day of friendly computer languages and ubiquitous computers, one can easily confirm the claims empirically by setting up a simulation experiment as suggested by Metropolis and Ulam[<a href="#Metropolis:1949:MCM">3</a>]: one can check that our estimate is unbiased (by averaging many applications of it) and that it is not good (by observing the substantial error on each individual application even when <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg1.png" alt="$ n$"/> is enormous). There is no rule against getting an empirical feel for (or even an empirical confirmation of) a mathematical statement, since any presentation of math is subject to errors. These days there are likely many more readers who could quickly confirm or disprove the claims of this section by simulation than readers who would be inclined to check many lines of tedious algebra for a subtle error.</dd>
<dt><a name="foot61">&#8230;.</a><a href="#tex2html5"><sup>5</sup></a></dt>
<dd>&#8220;Normal&#8221; is the statistical term for the distribution associated with the bell curve. Many quantities in nature have a nearly normal distribution.</dd>
<dt><a name="foot73">&#8230; allowed</a><a href="#tex2html6"><sup>6</sup></a></dt>
<dd>And, perhaps surprisingly, not a function of the sample size.</dd>
<dt><a name="foot75">&#8230; phrase.</a><a href="#tex2html7"><sup>7</sup></a></dt>
<dd>These examples are deliberately trivial.</dd>
<dt><a name="foot77">&#8230; it.</a><a href="#tex2html8"><sup>8</sup></a></dt>
<dd>This is one of the nasty differences between prospective studies, where the experimental layout is tailored to expose the quantities of interest, and retrospective studies, where we hope to infer new quantities from experiments that have relevant (but not specifically organized) data.</dd>
<dt><a name="foot161">&#8230; unprofitable.</a><a href="#tex2html9"><sup>9</sup></a></dt>
<dd>By &#8220;quickly determine&#8221; we mean determine from past data we already have. What we have shown is that we often cannot determine what we need to know from past data, but must return to the market with new experiments that cost both time and money.</dd>
<dt><a name="foot164">&#8230; difficult.</a><a href="#tex2html10"><sup>10</sup></a></dt>
<dd>This initial optimism of ours is perhaps a side-effect of a &#8220;can do&#8221; attitude.</dd>
<dt><a name="foot169">&#8230; independent).</a><a href="#tex2html13"><sup>11</sup></a></dt>
<dd>It is funny that in statistics we spend so much time reminding ourselves that <img width="50" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg103.png" alt="$ E[x y]$"/> is not always equal to <img width="76" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg104.png" alt="$ E[x] E[y]$"/> that we sometimes actually find it surprising that <img width="192" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/estimg102.png" alt="$ E[x + y] = E[x] + E[y]$"/> is always true (whenever the expectations exist).</dd>
</dl>
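<p>The remark in footnote 11 is easy to check by simulation. The following is only a minimal sketch and is not taken from the paper: the function name <code>simulate</code>, the sample size, the seed, and the choice of a standard normal x with y equal to x plus independent noise are all assumptions made for illustration. With that choice x and y are strongly dependent, so E[x y] should come out near 1 while E[x] E[y] should come out near 0, yet the average of x + y should still match the sum of the separate averages.</p>

```python
import random
from statistics import fmean


def simulate(n=100_000, seed=0):
    """Draw n dependent pairs (x, y) and return four sample averages.

    Hypothetical setup (an assumption, not from the paper): x is standard
    normal and y = x + independent standard normal noise, so x and y are
    dependent, with E[x y] = 1 and E[x] E[y] = 0 in theory.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [x + rng.gauss(0.0, 1.0) for x in xs]
    return {
        # Linearity of expectation: these two should agree.
        "E[x+y]": fmean(x + y for x, y in zip(xs, ys)),
        "E[x]+E[y]": fmean(xs) + fmean(ys),
        # Products: these two should differ because x and y are dependent.
        "E[xy]": fmean(x * y for x, y in zip(xs, ys)),
        "E[x]E[y]": fmean(xs) * fmean(ys),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:10s} {value: .4f}")
```

<p>Running the script prints the four averages side by side. Changing how <code>ys</code> is built (for example, drawing it independently of <code>xs</code>) shows the product rule recovering while the sum rule never needed independence in the first place.</p>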
]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2007/06/new-paper/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

