<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Win-Vector Blog &#187; Dynamic Programming</title>
	<atom:link href="http://www.win-vector.com/blog/tag/dynamic-programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.win-vector.com/blog</link>
	<description>The Applied Theorist&#039;s Point of View</description>
	<lastBuildDate>Thu, 29 Jul 2010 17:09:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>The Local to Global Principle</title>
		<link>http://www.win-vector.com/blog/2009/11/the-local-to-global-principle/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=the-local-to-global-principle</link>
		<comments>http://www.win-vector.com/blog/2009/11/the-local-to-global-principle/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 16:37:53 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Exciting Techniques]]></category>
		<category><![CDATA[Expository Writing]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Dynamic Programming]]></category>
		<category><![CDATA[Local to Global]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[PageRank]]></category>
		<category><![CDATA[Problem Solving]]></category>
		<category><![CDATA[Speech Recognition]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1123</guid>
		<description><![CDATA[We describe &#8220;the local to global principle.&#8221; It is a principle used to break algorithmic problem solving into two distinct phases (local criticism followed by global solution) and is an aid both in the design and in the application of algorithms. Instead of giving a formal definition of the principle we quickly define it [...]


]]></description>
			<content:encoded><![CDATA[<p>We describe &#8220;the local to global principle.&#8221; It is a principle used to break algorithmic problem solving into two distinct phases (local criticism followed by global solution) and is an aid both in the design and in the application of algorithms. Instead of giving a formal definition of the principle we quickly define it and discuss a few examples and methods.  We have produced both a stand-alone <a href="http://www.win-vector.com/dfiles/LocalToGlobal.pdf">PDF</a> (more legible) and an HTML/blog form (more skimmable).<br />
<span id="more-1123"></span></p>
<h1 align="center">The Local to Global Principle</h1>
<p align="center"><strong>John Mount<a name="tex2html3" href="#foot21" id="tex2html3"><sup>1</sup></a></strong></p>
<p></p>
<p align="center"><b>Date:</b> November 11, 2009</p>
<hr />
<h3>Abstract:</h3>
<div>We describe &#8220;the local to global principle.&#8221; It is a principle used to break algorithmic problem solving into two distinct phases (local criticism followed by global solution) and is an aid both in the design and in the application of algorithms. Instead of giving a formal definition of the principle we quickly define it and discuss a few examples and methods.</div>
<p></p>
<h2><a name="SECTION00010000000000000000" id="SECTION00010000000000000000">Contents</a></h2>
<p><!--Table of Contents--></p>
<ul>
<li><a name="tex2html32" href="#SECTION00020000000000000000" id="tex2html32">Introduction</a></li>
<li><a name="tex2html33" href="#SECTION00030000000000000000" id="tex2html33">The Examples</a>
<ul>
<li><a name="tex2html34" href="#SECTION00031000000000000000" id="tex2html34">Web Page Link Analysis</a></li>
<li><a name="tex2html35" href="#SECTION00032000000000000000" id="tex2html35">Natural Language Processing</a></li>
<li><a name="tex2html36" href="#SECTION00033000000000000000" id="tex2html36">Machine Learning</a></li>
</ul>
<p></li>
<li><a name="tex2html37" href="#SECTION00040000000000000000" id="tex2html37">Some Methods</a>
<ul>
<li><a name="tex2html38" href="#SECTION00041000000000000000" id="tex2html38">Local Methods</a></li>
<li><a name="tex2html39" href="#SECTION00042000000000000000" id="tex2html39">Globalization Methods</a></li>
</ul>
<p></li>
<li><a name="tex2html40" href="#SECTION00050000000000000000" id="tex2html40">Conclusion</a></li>
<li><a name="tex2html41" href="#SECTION00060000000000000000" id="tex2html41">Bibliography</a></li>
<li><a name="tex2html42" href="#SECTION00070000000000000000" id="tex2html42">Acknowledgement</a></li>
</ul>
<p><!--End of Table of Contents--></p>
<h1><a name="SECTION00020000000000000000" id="SECTION00020000000000000000">Introduction</a></h1>
<p><font>A common vain hope of computer scientists and algorithm designers is that a domain expert has already &#8220;boiled down&#8221; a problem to a precise, but unsolved, algorithmic core. On this point the mathematician Gian-Carlo Rota wrote:</font></p>
<blockquote><p><font>One of the rarest mathematical talents is the talent for applied mathematics, for picking out of a maze of experimental data the two or three parameters that are relevant, and to discard all other data. This talent is rare. It is taught only at the shop level.[<a href="#IndiscreteThoughts">Rot97</a>, ``A Mathematician's Gossip'']</font></p></blockquote>
<p><font>We describe a useful tool for designing algorithmic applications and solutions which we call &#8220;the local to global principle.&#8221; The local to global principle is the method of deriving applications and solutions by specifying &#8220;local&#8221; (and deliberately myopic) heuristics, critiques and methods followed by using a powerful general method to &#8220;globalize&#8221; this specification into a complete solution.</font></p>
<p><font>There are many important problem solving prescriptions and methods of thought already systematically described and taught:</font></p>
<ul>
<li>Bacon&#8217;s &#8220;New Organon&#8221; and Mill&#8217;s principles of inductive logic.[<a href="#Mill">Mil02</a>]</li>
<li>Feynman&#8217;s genius method.[<a href="#IndiscreteThoughts">Rot97</a>, ``Ten Lessons I Wish I Had Been Taught'']</li>
<li>Reductionism (top down and bottom up).</li>
<li>Divide and conquer.[<a href="#IntroductionToAlgorithms">CLRS09</a>]</li>
<li>Forward deduction, backwards induction.</li>
<li>Root Cause Analysis.</li>
<li>Polya&#8217;s heuristic and conjecture and prove patterns [<a href="#citeulike:679515">Pol71</a>,<a href="#Polya1">Pol54a</a>,<a href="#Polya2">Pol54b</a>]</li>
<li>Doron Zeilberger&#8217;s &#8220;Method of Undetermined Generalization and Specialization.&#8221; [<a href="#Zeilberger:1995p277">Zei95</a>]</li>
<li>Zbigniew Michalewicz and David B. Fogel&#8217;s presentation of evolutionary algorithms.[<a href="#HTSMH">MF00</a>]</li>
</ul>
<p><font>The local to global principle is more of an organizational pattern than a &#8220;computer aided technique,&#8221; as no one specific species of software or family of notation is required.</font></p>
<p><font>The local to global principle can be identified in a number of previous important applications, but it is not currently an identified principle.<a name="tex2html4" href="#foot244" id="tex2html4"><sup>2</sup></a> The principle is very general, so any succinct description of it is going to be painfully vague. Instead, we explain the principle by discussing some example applications and methods.  For each of our example applications we deliberately use a different globalization technique. The effective algorithmist or practitioner must in fact come to each problem already familiar with a reasonably large set of already known local and global techniques, so we conclude with some appropriate fields of study and preparation.</font></p>
<p><font>The local to global principle is divided into two parts: local encoding of the problem followed by a globalization step that uses the encoding. The guiding feature of local encodings is that they are usually easy to compute from the data at hand. Any extension that looks like enumeration, search or optimization is best left to the global step. The local step is essentially the translation of your problem into an abstract language that is ready for the globalization step. In contrast globalization methods are often &#8220;off the shelf&#8221; in that once you abstract and encode the particulars of your problem you can look for pre-existing useful methods or software to finish your solution. The idea of globalization is to find a best overall or global compromise between competing local criteria. The local step does not so much have to avoid conflicts but instead &#8220;price them.&#8221; There is also an important trade-off that sophisticated local techniques allow the use of simpler globalization methods and more powerful globalization methods allow the use of simpler local techniques.</font></p>
<h1><a name="SECTION00030000000000000000" id="SECTION00030000000000000000">The Examples</a></h1>
<p><font>To demonstrate the breadth of the local to global principle we choose a diverse collection of example applications: web page link analysis, natural language processing and machine learning. For each example application we will set up the problem, introduce a reasonable set of local criteria and pick an appropriate globalization technique. We will favor finishing each example without describing the globalization technique in detail, as this would distract from our point and is best left to the given references. These examples are previously solved problems; our contribution is demonstrating the shared underlying principle.</font></p>
<h2><a name="SECTION00031000000000000000" id="SECTION00031000000000000000">Web Page Link Analysis</a></h2>
<p><font>For our first example application we demonstrate web page link analysis in the form of the famous PageRank score.[<a href="#Page:1998p2689">PBMW98</a>]</font></p>
<p><font>One of the many good ideas leading up to the early Google search engine was the design of a non-text based measure of importance or interestingness of web pages. A search engine that could fold &#8220;interestingness&#8221; or popularity into its notion of relevance could better sort important pages into the search user&#8217;s view. Once the web got so large that many pages were exact matches to any common user query, popularity became a critical consideration. A link based notion of popularity exploits what is important about the web (the link structure, for example see [<a href="#Kleinberg:1997p32">Kle97</a>]) and avoids having to depend on a lot of natural language understanding technology. This technique also uses authority outside of the given page, so it has some hope of being resistant (though not immune) to web-spam.</font></p>
<p><font>Taken all at once, designing a score of page importance is a daunting task. However, by working in stages (as the local to global principle prescribes) we can quickly derive interesting scores including the famous PageRank score. We start with the idea that popularity (or the amount of web traffic a page receives) is (loosely) correlated with importance. So for our first approximation step we decide to try to estimate popularity (or web traffic) and use this estimate as our importance score. Accurately estimating web traffic is itself a hard problem and a big industry (just a few of the major companies involved in this are: Google/Urchin, Quantcast, Nielsen, comScore, Alexa, Hitwise and LookSmart). For our second approximation step we are going to try to estimate popularity from the link structure<a name="tex2html6" href="#foot43" id="tex2html6"><sup>4</sup></a> of the web (using no other measurements or historic data) and use this as our score. This link based estimate is unlikely to completely reproduce real web surfing patterns, but it is very interesting in its own right and has been proven in the market to be a useful score.</font></p>
<p><font>Now the problem is to try to estimate the popularity of a web page from the link structure of the web. We claim: we can generate a useful (but not necessarily accurate) estimate of web traffic from the web&#8217;s link structure alone. Consider Figure&nbsp;<a href="#fig:Links1">1</a> where we have a universe of three web pages A, B and C that link to each other in the pattern illustrated by what is called a graph.<a name="tex2html7" href="#foot45" id="tex2html7"><sup>5</sup></a></font></p>
<div align="center"><a name="fig:Links1" id="fig:Links1"></a><a name="50"></a></p>
<table>
<caption align="bottom"><strong>Figure 1:</strong> A set of Mutually Linked Web Pages</caption>
<tr>
<td>
<div align="center"><img width="300" height="436" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/Links1.png" alt="Image Links1"></div>
</td>
</tr>
</table>
</div>
<p><font>In Figure&nbsp;<a href="#fig:Links1">1</a> we can consider each link to another page as evidence the other page is interesting or popular. One idea is to simulate a very simple web surfer who clicks on the links on a page uniformly at random. This is called &#8220;the random surfer model&#8221; and even a model this simple allows us to read some useful information from the link structure of the web. For instance, we could ask what fraction of their time the random surfer spends on each web page, with an eye to the idea that the pages the random surfer visits more often are the more important ones. Let <img width="35" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg2.png" alt="$ p(A)$"> denote the proportion of time the random web surfer spends on page A (and define <img width="36" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg3.png" alt="$ p(B)$"> and <img width="36" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg4.png" alt="$ p(C)$"> similarly). While we do not know any of <!-- MATH<br />
 $p(A), p(B)$<br />
 --><br />
<img width="76" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg5.png" alt="$ p(A), p(B)$"> or <img width="36" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg4.png" alt="$ p(C)$"> we can derive some relationships between them by inspecting the link graph:</font></p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
p(A) &#038; = &#038; \frac{1}{2} P(B) + P(C) \\<br />
p(B) &#038; = &#038; \frac{1}{2} P(A) \\<br />
p(C) &#038; = &#038; \frac{1}{2} P(A) + \frac{1}{2} P(B) .<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="35" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg6.png" alt="$\displaystyle p(A)$"></td>
<td width="10" align="center" nowrap><img width="16" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg7.png" alt="$\displaystyle =$"></td>
<td align="left" nowrap><img width="109" height="49" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg8.png" alt="$\displaystyle \frac{1}{2} P(B) + P(C)$"></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="36" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg9.png" alt="$\displaystyle p(B)$"></td>
<td width="10" align="center" nowrap><img width="16" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg7.png" alt="$\displaystyle =$"></td>
<td align="left" nowrap><img width="52" height="49" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg10.png" alt="$\displaystyle \frac{1}{2} P(A)$"></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="36" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg11.png" alt="$\displaystyle p(C)$"></td>
<td width="10" align="center" nowrap><img width="16" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg7.png" alt="$\displaystyle =$"></td>
<td align="left" nowrap><img width="125" height="49" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg12.png" alt="$\displaystyle \frac{1}{2} P(A) + \frac{1}{2} P(B) .$"></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"></p>
<p><font>The first equation is just reading from the graph that: all visits on page-A must come from pages B and C, half of the visitors on page-B continue on to A and all of the visitors on page-C continue on to A. The second and third equations are the appropriate summaries of how traffic is routed to pages B and C. We can insist that <!-- MATH<br />
 $P(A) + P(B)<br />
+ P(C) = 1$<br />
 --><br />
<img width="183" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg13.png" alt="$ P(A) + P(B) + P(C) = 1$"> as we want these numbers to represent the fraction of time the random web surfer spends on each page. A more sophisticated model would add more features<a name="tex2html9" href="#foot245" id="tex2html9"><sup>6</sup></a> to get a more useful result.</font></p>
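<p><font>As a rough sketch of this local step, the following Python reads the three equations off a link table mechanically. The link table itself is an assumption: it is inferred from the equations above (A links to B and C, B links to A and C, C links to A) rather than taken from the figure, and the name <code>local_equations</code> is made up for this illustration.</font></p>

```python
from fractions import Fraction

# Link structure inferred from the three equations in the text (an assumption):
# A links to B and C, B links to A and C, C links to A.
links = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A"]}


def local_equations(links):
    """For each page, list (source, weight) pairs: the share of each source's
    traffic that a uniformly random click sends to that page."""
    incoming = {page: [] for page in links}
    for source, targets in links.items():
        # A random surfer splits evenly over the links leaving the source page.
        share = Fraction(1, len(targets))
        for target in targets:
            incoming[target].append((source, share))
    return incoming


equations = local_equations(links)
for page in sorted(equations):
    terms = " + ".join(f"{weight} p({source})" for source, weight in equations[page])
    print(f"p({page}) = {terms}")
```

<p><font>Each page only needs to know who links to it and how many outgoing links those pages have. No page needs to know anything about the overall answer, which is what makes this step local.</font></p>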
<p><font>It turns out we have already encoded enough local rules to completely determine <!-- MATH<br />
 $P(A), P(B)$<br />
 --><br />
<img width="85" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg14.png" alt="$ P(A), P(B)$"> and <img width="40" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg15.png" alt="$ P(C)$"> . In this example application an algorithmist already familiar with linear algebra&nbsp;[<a href="#Strang">Str76</a>] would recognize these local conditions as &#8220;a system of linear equations.&#8221; Solving even web-scale systems of linear equations is considered easy with modern techniques and modern computers. For our small example the solution is: <!-- MATH<br />
 $p(A) = \frac{4}{9}$<br />
 --><br />
<img width="67" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg16.png" alt="$ p(A) = \frac{4}{9}$"> , <!-- MATH<br />
 $p(B) = \frac{2}{9}$<br />
 --><br />
<img width="68" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg17.png" alt="$ p(B) = \frac{2}{9}$"> , and <!-- MATH<br />
 $p(C) = \frac{3}{9}$<br />
 --><br />
<img width="67" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg18.png" alt="$ p(C) = \frac{3}{9}$"> . The role of the local steps was to reduce a new problem (estimating the importance or popularity of a web page from the link structure) to something with <em>already known</em> techniques (like solving a linear system as illustrated in Figure&nbsp;<a href="#fig:LinAlg">2</a>).</font></p>
<div align="center"><a name="fig:LinAlg" id="fig:LinAlg"></a><a name="79"></a></p>
<table>
<caption align="bottom"><strong>Figure 2:</strong> Linear Algebra Solution: As Taught in School</caption>
<tr>
<td>
<div align="center"><img width="400" height="365" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LinAlg.jpg" alt="Image LinAlg"></div>
</td>
</tr>
</table>
</div>
<p><font>So page-A is the most important page by the PageRank measure.</font></p>
<p><font>In this example application the local step was setting up the system of linear equations (which are easy to derive from the web link graph) and the global step was solving the entire system for the final scores (which were not obvious). You spend most of your time encoding the problem and then use a known technique (in this case solving a linear system) to finish the solution.</font></p>
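<p><font>For the three-page example the global step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper <code>solve</code> is a hypothetical name, it uses textbook Gauss-Jordan elimination with exact fractions, and the third traffic equation (which is implied by the other two) is replaced by the normalization that the three proportions sum to one. A web-scale system would call for a very different solver.</font></p>

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a small square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    # Augmented rows [coefficients | right-hand side], held as exact fractions.
    rows = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Pick a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so the pivot entry becomes 1.
        rows[col] = [x / rows[col][col] for x in rows[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[n] for row in rows]


half = Fraction(1, 2)
# Unknowns are p(A), p(B), p(C), in that order.
matrix = [
    [1, -half, -1],  # p(A) - 1/2 p(B) - p(C) = 0
    [-half, 1, 0],   # p(B) - 1/2 p(A) = 0
    [1, 1, 1],       # p(A) + p(B) + p(C) = 1
]
solution = solve(matrix, [0, 0, 1])
for page, value in zip("ABC", solution):
    print(page, value)
```

<p><font>The fractions come out as 4/9, 2/9 and 1/3 (that is, 3/9), matching the solution quoted above.</font></p>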
<h2><a name="SECTION00032000000000000000" id="SECTION00032000000000000000">Natural Language Processing</a></h2>
<p><font>Our next example application is natural language processing&nbsp;[<a href="#CharniakBook">Cha96</a>,<a href="#Charniak:1997p1484">Cha97</a>]. Speech recognition (the alignment or transcription of recognized intelligible segments of sound to written text) is an important problem in natural language processing. An example problem is the need to find the most likely text matching a sequence of sounds such as is shown in Figure&nbsp;<a href="#fig:SoundSeq1">3</a>.</font></p>
<div align="center"><a name="fig:SoundSeq1" id="fig:SoundSeq1"></a><a name="89"></a></p>
<table>
<caption align="bottom"><strong>Figure 3:</strong> A Sequence of Sounds</caption>
<tr>
<td>
<div align="center"><img width="500" height="69" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/SoundSeq1.png" alt="Image SoundSeq1"></div>
</td>
</tr>
</table>
</div>
<p><font>Consider Figure&nbsp;<a href="#fig:SoundSeq3">4</a> (which shows a bad transcription) and Figure&nbsp;<a href="#fig:SoundSeq2">5</a> (which shows a good transcription).</font></p>
<div align="center"><a name="fig:SoundSeq3" id="fig:SoundSeq3"></a><a name="98"></a></p>
<table>
<caption align="bottom"><strong>Figure 4:</strong> A Bad Transcription</caption>
<tr>
<td>
<div align="center"><img width="500" height="142" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/SoundSeq3.png" alt="Image SoundSeq3"></div>
</td>
</tr>
</table>
</div>
<div align="center"><a name="fig:SoundSeq2" id="fig:SoundSeq2"></a><a name="105"></a></p>
<table>
<caption align="bottom"><strong>Figure 5:</strong> A Good Transcription</caption>
<tr>
<td>
<div align="center"><img width="500" height="142" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/SoundSeq2.png" alt="Image SoundSeq2"></div>
</td>
</tr>
</table>
</div>
<p><font>Our claim: we can (given access to training data, and this is the age of data&nbsp;[<a href="#Halevy:2009p2327">HNP09</a>]) solve this problem with a local step that is a set of simple criticisms of proposed transcriptions. A good starting point is a database of previous sounds to text transcriptions. This database allows the construction of a series of tables that give the historic frequency (or probability) of all of the following:</font></p>
<ul>
<li>Prior probability of each sound</li>
<li>Probability of each sound given the immediately previous sound</li>
<li>Prior probability of each word</li>
<li>Probability of each word given the immediately previous word</li>
<li>Which combinations of word fragments are legitimate words</li>
<li>Probability of each sound being assigned to each word fragment (syllables, phonemes and so on).</li>
</ul>
<p><font>These tables encode a &#8220;speech model&#8221; (the rules involving sounds only), a language model (the rules involving text or words only) and the linkage between the two models. These models are deliberately simple in that they capture only local interactions (like probability of a word given the word before it) but no long range interactions (like subject predicate agreement).</font></p>
<p><font>Each box, nested box and arrow on our diagram represents one possible local critique. For each item in our diagram (again, the boxes and arrows) we can use our tables to assign a goodness or plausibility score. For instance bad word to word transitions (like &#8220;won&#8221; <!-- MATH<br />
 $\rightarrow$<br />
 --><br />
<img width="19" height="13" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg19.png" alt="$ \rightarrow$"> &#8220;won&#8221;) will be rare in our historic tables, so just looking up probabilities from the tables (or, better, using the logarithms of probabilities) gives us a &#8220;plausibility score&#8221; that prefers known patterns of language. Then a score for the overall transcription can be derived by multiplying all of the local scores together. These local scores (though simple) already have encoded enough evidence to prefer the good transcription to the bad transcription <em>without</em> requiring any deep knowledge of speech, text or the meaning of the text. This is because the bad transcription has a series of obvious flaws such as: unlikely sound to word fragment assignments and unlikely word to word transitions.</font></p>
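<p><font>A minimal sketch of such a score, using logarithms so that multiplying probabilities becomes adding scores. Everything here is an assumption made for illustration: the probability values are invented, only the word-level tables are used (the sound tables are left out), and the two word sequences are stand-ins rather than the transcriptions shown in the figures.</font></p>

```python
import math

# Hypothetical probabilities, invented for illustration only; real tables
# would be estimated from a database of past transcriptions.
word_prior = {"nine": 0.02, "one": 0.05, "won": 0.01}
word_given_previous = {
    ("nine", "one"): 0.30,
    ("one", "one"): 0.20,
    ("nine", "won"): 0.01,
    ("won", "won"): 0.001,
}


def log_plausibility(words):
    """Sum of log-probabilities of the word-level local critiques."""
    # The first word is scored by its prior alone.
    score = math.log(word_prior[words[0]])
    # Every later word is scored given the word immediately before it.
    for previous, current in zip(words, words[1:]):
        score += math.log(word_given_previous[(previous, current)])
    return score


good = log_plausibility(["nine", "one", "one"])
bad = log_plausibility(["nine", "won", "won"])
print(good, bad)
```

<p><font>With these made-up tables the sequence containing the rare &#8220;won&#8221; to &#8220;won&#8221; transition receives the lower score, even though nothing in the code understands what the words mean.</font></p>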
<div align="center"><a name="fig:SoundSeqPartial" id="fig:SoundSeqPartial"></a><a name="116"></a></p>
<table>
<caption align="bottom"><strong>Figure 6:</strong> Naively Extending a Partial Transcription</caption>
<tr>
<td>
<div align="center"><img width="500" height="142" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/SoundSeqPartial.png" alt="Image SoundSeqPartial"></div>
</td>
</tr>
</table>
</div>
<p><font>For example consider Figure&nbsp;<a href="#fig:SoundSeqPartial">6</a> where a naive solver is considering the word &#8220;one&#8221; as the third word to fill in. The <em>only</em> local critiques the solver needs to consider are:</font></p>
<ul>
<li>how likely the word &#8220;one&#8221; is in general (call this <img width="49" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg20.png" alt="$ P[one]$"> )</li>
<li>how likely the word &#8220;one&#8221; is to follow the word &#8220;nine&#8221; (call this <!-- MATH<br />
 $P[one | nine]$<br />
 --><br />
<img width="86" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg21.png" alt="$ P[one \vert nine]$"> )</li>
<li>how likely the letter sequence &#8220;o&#8221; is given the sound &#8220;w&#601;&#8221; (call this <!-- MATH<br />
 $P[o | \text{w\textschwa}]$<br />
 --><br />
<img width="55" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg24.png" alt="$P[o \vert \text{w\textschwa}]$"> )</li>
<li>how likely the letter sequence &#8220;ne&#8221; is given the sound &#8220;n&#8221; (call this <!-- MATH<br />
 $P[ne | \text{n}]$<br />
 --><br />
<img width="41" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg25.png" alt="$ P[ne \vert$">&nbsp; &nbsp;n<img width="7" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg23.png" alt="$ ]$"> ).</li>
</ul>
<p><font>So the local plausibility of the fill-in word &#8220;one&#8221; is: <!-- MATH<br />
 $P[one]<br />
\times P[one | nine] \times P[o | \text{w\textschwa}] \times P[ne |<br />
\text{n}]$<br />
 --><br />
<img width="292" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg28.png" alt="$P[one] \times P[one \vert nine] \times P[o \vert \text{w\textschwa}] \times P[ne \vert \text{n}]$"> . We will call this the critique of &#8220;one&#8221; in position 3 and write it as <!-- MATH<br />
 $C_3(w_2,one)$<br />
 --><br />
<img width="84" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg29.png" alt="$ C_3(w_2,one)$"> where <img width="22" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg30.png" alt="$ w_2$"> is the word known to be in position 2. Similarly we can generate all of the possible critiques <img width="53" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg31.png" alt="$ C_1(w_1)$"> , <!-- MATH<br />
 $C_2(w_1,w_2)$<br />
 --><br />
<img width="78" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg32.png" alt="$ C_2(w_1,w_2)$"> , <!-- MATH<br />
 $C_3(w_2,w_3)$<br />
 --><br />
<img width="78" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg33.png" alt="$ C_3(w_2,w_3)$"> , <!-- MATH<br />
 $C_4(w_3,w_4)$<br />
 --><br />
<img width="78" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg34.png" alt="$ C_4(w_3,w_4)$"> and the overall critique of a sequence <!-- MATH<br />
 $w_1 \; w_2 \; w_3 \; w_4$<br />
 --><br />
<img width="77" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg35.png" alt="$ w_1 \; w_2 \; w_3 \; w_4$"> : <!-- MATH<br />
 $C_1(w_1)<br />
\times C_2(w_1,w_2) \times C_3(w_2,w_3) \times C_4(w_3,w_4)$<br />
 --><br />
<img width="336" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg36.png" alt="$ C_1(w_1) \times C_2(w_1,w_2) \times C_3(w_2,w_3) \times C_4(w_3,w_4)$"> from our pre-computed tables of probabilities. Notice for all of these critiques only the immediately previous word and the nearby sounds were used to determine the plausibility of the word we are attempting to fit in. Instead of using these critiques to directly fill in a possible solution (or using search) we will package up these critiques (in the form of the <img width="32" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg37.png" alt="$ C_i()$"> ) and pass them on to a powerful separate globalization step called Dynamic Programming&nbsp;[<a href="#DynamicProgramming">Bel57</a>].</font></p>
<p><font>The globalization or finding of a best overall transcription is not trivial even though our score is simple. This is because the overall <em>best</em> sequence could depend on clever non-local fill-ins (like deliberately picking a less likely first word to allow a later favored transition to a fantastically good third word). Dynamic Programming does not fill in the transcription from left to right, but instead uses a table of scores derived from the left to right arrows and the <img width="32" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg37.png" alt="$ C_i()$"> . In our example Dynamic Programming consists of building a table of information as shown in Figure&nbsp;<a href="#fig:DynBackFill">7</a>. Let <img width="9" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg38.png" alt="$ i$"> represent the word position we are looking at (so <img width="9" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg38.png" alt="$ i$"> ranges from 1 to 4) and let <img width="15" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg39.png" alt="$ w$"> be a variable that ranges over every word in the dictionary.
Our table is indexed by <img width="9" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg38.png" alt="$ i$"> and <img width="15" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg39.png" alt="$ w$"> and when filled in <img width="51" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg40.png" alt="$ T(i,w)$"> stores what the highest &#8220;plausibility score&#8221; of a partial sequence of words where words 1 through <img width="9" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg38.png" alt="$ i$"> have been filled in and the <img width="9" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg38.png" alt="$ i$"> -th word is <img width="15" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg39.png" alt="$ w$"> .</font></p>
<div align="center"><a name="fig:DynBackFill" id="fig:DynBackFill"></a><a name="134"></a></p>
<table>
<caption align="bottom"><strong>Figure 7:</strong> Dynamic Programming: Back Chaining in <img width="27" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg1.png" alt="$ T()$"> for a Solution</caption>
<tr>
<td>
<div align="center"><img width="300" height="298" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/DynTableBackFill.png" alt="Image DynTableBackFill"></div>
</td>
</tr>
</table>
</div>
<p><font>If we already had this magic table <img width="27" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg1.png" alt="$ T()$"> we could find a best possible sequence by &#8220;back chaining.&#8221; We start by finding a fourth word (<img width="22" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg41.png" alt="$ w_4$"> ) such that <img width="61" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg42.png" alt="$ T(4,w_4)$"> is maximal (in this case &#8220;one&#8221;). We then find a best third word (<img width="22" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg43.png" alt="$ w_3$"> ) by enumerating all words and picking <img width="22" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg43.png" alt="$ w_3$"> such that <!-- MATH<br />
 $T(3,w_3) \times C_4(w_3,w_4) = T(4,w_4)$<br />
 --><br />
<img width="234" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg44.png" alt="$ T(3,w_3) \times C_4(w_3,w_4) = T(4,w_4)$"> . We continue back until we have found words <img width="22" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg30.png" alt="$ w_2$"> and <img width="22" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg45.png" alt="$ w_1$"> to get a complete best sequence. Notice that we work from right to left (backwards) and except for the starting step we pick each word to match the calculation we are trying to un-roll, not to be the maximal entry in the column. For instance we pick <!-- MATH<br />
 $w_1 = dial$<br />
 --><br />
<img width="70" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg46.png" alt="$ w_1 = dial$"> even though it does not have the highest score, because <!-- MATH<br />
 $T(1,dial) C_2(dial,nine)<br />
C_3(nine,one) C_4(one,one) = T(4,one)$<br />
 --><br />
<img width="433" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg47.png" alt="$ T(1,dial) C_2(dial,nine) C_3(nine,one) C_4(one,one) = T(4,one)$"> is the maximal complete chain.</font></p>
<p><font>Of course, we don&#8217;t start with the table <img width="27" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg1.png" alt="$ T()$"> already filled in, so we need a procedure to build it. This procedure is the heart of the Dynamic Programming method (for more examples see: &#8220;Introduction to Algorithms&#8221;&nbsp;[<a href="#IntroductionToAlgorithms">CLRS09</a>]). Notice that <img width="54" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg48.png" alt="$ T(1,w)$"> can be filled in for all <img width="15" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg39.png" alt="$ w$"> just by plugging in words and computing the critiques <img width="46" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg49.png" alt="$ C_1(w)$"> (i.e. <!-- MATH<br />
 $T(1,w) = C_1(w)$<br />
 --><br />
<img width="118" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg50.png" alt="$ T(1,w) = C_1(w)$"> ). Once all the <img width="54" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg48.png" alt="$ T(1,w)$"> are filled in we can fill in the <img width="54" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg51.png" alt="$ T(2,w)$"> with the general (and slightly trickier) formula:</font></p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
T(i+1,w) = \max_{v} T(i,v) C_{i+1}(v,w)<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="249" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg52.png" alt="$\displaystyle T(i+1,w) = \max_{v} T(i,v) C_{i+1}(v,w) $"></div>
<p><font>as we illustrate for <img width="74" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg53.png" alt="$ T(2,nine)$"> in Figure&nbsp;<a href="#fig:DynTable">8</a>.</font></p>
<div align="center"><a name="fig:DynTable" id="fig:DynTable"></a><a name="145"></a></p>
<table>
<caption align="bottom"><strong>Figure 8:</strong> Dynamic Programming: Building the Table <img width="27" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg1.png" alt="$ T()$"></caption>
<tr>
<td>
<div align="center"><img width="400" height="261" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/DynTableCalculate.png" alt="Image DynTableCalculate"></div>
</td>
</tr>
</table>
</div>
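<p><font>The table-building formula and the back chaining step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the figures: the four-word dictionary, the scores in <code>FIRST</code> and <code>PAIRS</code>, the default transition score of 0.01 and all function names are made up for this sketch, and the same pairwise critique is reused at every position for brevity (in the article each critique also depends on the nearby sounds).</font></p>

```python
# Toy Dynamic Programming sketch; every score below is a made-up assumption.

def build_table(words, first, trans):
    """Return table where table[i][w] is the best score of any partial
    sequence that fills positions 0..i and ends in word w."""
    table = [{w: first(w) for w in words}]           # T(1,w) = C_1(w)
    for critique in trans:                           # C_2, C_3, ...
        prev = table[-1]
        # T(i+1,w) = max over v of T(i,v) * C_{i+1}(v,w)
        table.append({w: max(prev[v] * critique(v, w) for v in words)
                      for w in words})
    return table


def back_chain(words, table, trans):
    """Recover one best sequence by working from the last column backwards."""
    # Only the final word is chosen as the maximal entry of its column.
    sequence = [max(words, key=lambda w: table[-1][w])]
    for i in range(len(table) - 2, -1, -1):
        nxt = sequence[0]
        # Pick the v that produced table[i + 1][nxt]; taking the argmax
        # avoids testing floating point products for exact equality.
        best_v = max(words, key=lambda v: table[i][v] * trans[i](v, nxt))
        sequence.insert(0, best_v)
    return sequence


WORDS = ["dial", "dull", "nine", "one"]
FIRST = {"dial": 0.4, "dull": 0.5, "nine": 0.05, "one": 0.05}
PAIRS = {("dial", "nine"): 0.9, ("dull", "nine"): 0.1,
         ("nine", "one"): 0.8, ("one", "one"): 0.7}


def first(w):
    return FIRST[w]


def pair(v, w):
    # Unlisted transitions get a small default score (an assumption).
    return PAIRS.get((v, w), 0.01)


trans = [pair, pair, pair]
table = build_table(WORDS, first, trans)
best = back_chain(WORDS, table, trans)
print(best)  # ['dial', 'nine', 'one', 'one']
```

<p><font>In this toy data &#8220;dull&#8221; has the highest first-column score, yet back chaining still selects &#8220;dial&#8221; because it starts the chain with the largest overall product.</font></p>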
<p><font>The magic of the Dynamic Programming technique is: by being careful to not store too much in the table <img width="51" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg40.png" alt="$ T(i,w)$"> we avoid an explosion in record keeping that would render the method inefficient. Dynamic Programming exploits the small dependence structure encoded in <img width="32" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg37.png" alt="$ C_i()$"> (each box in our diagram depending on only a few arrows) and as we have shown can find &#8220;clever&#8221; solutions (such as taking a sub-optimal first word to get better transitions into preferred later words). For those who want more detail on solving this problem we recommend [<a href="#CharniakBook">Cha96</a>] (as our goal is not to fully explain Dynamic Programming, but to demonstrate how it could be applied to the transcription problem as a pre-packaged globalizer).</font></p>
<p><font>In this example the local step was the graph link based critiques and the globalization step was Dynamic Programming. The separation of concerns from the local scoring to the globalizing step is a strength of the local to global principle.</font></p>
<h2><a name="SECTION00033000000000000000" id="SECTION00033000000000000000">Machine Learning</a></h2>
<p><font>Our final example application is machine learning. Machine learning is loosely defined as computer programs that adapt or learn from data. Thomas Mitchell helps distinguish this activity as a specialty of artificial intelligence that concentrates on &#8220;well-posed learning problems.&#8221;&nbsp;[<a href="#MitchellML">Mit97</a>] Trevor Hastie, Robert Tibshirani and Jerome Friedman emphasize the relation to statistics (versus more traditional symbolic AI)&nbsp;[<a href="#TibHat">TH09</a>]. A simple demonstration can be found in [<a href="#MLArt">Mou09b</a>].</font></p>
<p><font>Machine learning is perhaps the strongest example of the local to global principle and is inspired by the work of Kristin P. Bennett and Emilio Parrado-Hernandez&nbsp;[<a href="#Bennett:2006p400">BPH06</a>]. In hindsight many machine learning algorithms (each of which has had a turn at being &#8220;the most exciting breakthrough ever&#8221; for a while) can be seen as the pairing of a performance criterion (which we call a local criterion as it applies to one specific set of parameter values at a time) and an optimization method (what we have been calling the globalization step). The work of Bennett and Parrado-Hernandez calls this distinction out and shows how it is not productive to present machine learning systems as unique named monolithic units, but instead to consider how to break them into an objective function and an optimizer. This allows both the choice of better optimizers (such as replacing the inferior gradient descent method wherever it occurs) and explicit control of important concepts such as hypothesis regularization and control of over-fitting (which some algorithms claim to achieve by deliberately using early exit from an inferior optimizer).</font></p>
<p><font>At a &#8220;30,000 feet level&#8221; we can build a table of common machine learning techniques and name what is commonly used to implement their local and global steps. When a machine learning algorithm is defined by what conditions are meant to be true at the optimum we are no longer bound by details of the original implementation and can examine, fix and improve the components.<a name="tex2html17" href="#foot154" id="tex2html17"><sup>7</sup></a> Table&nbsp;<a href="#fig:MachineLearning">1</a> is a crude summary of a wide selection of machine learning algorithms that may be more likely to offend everybody than just offend somebody. But this is also the point: it is the algorithmist&#8217;s job to think fluidly (beyond given names and provenances) and to invent scaffolding to convert partial analogies into practical correspondences.</font></p>
<p></p>
<div align="center"><a name="190"></a></p>
<table>
<caption><strong>Table 1:</strong> Various Machine Learning Techniques</caption>
<tr>
<td>
<div align="center">
<table cellpadding="3" border="1" align="center">
<tr>
<td align="left" valign="top" width="180"><font size="-1">Machine Learning Method</font></td>
<td align="left" valign="top" width="144"><font size="-1">Local Criterion</font></td>
<td align="left" valign="top" width="144"><font size="-1">Globalization Method</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">Linear Regression [<a href="#Breiman:1997p1133">BF97</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">square error</font></td>
<td align="left" valign="top" width="144"><font size="-1">Linear Algebra</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">Linear Discriminant Analysis [<a href="#Fisher:1936p2576">Fis36</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">square error</font></td>
<td align="left" valign="top" width="144"><font size="-1">Linear Algebra</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">Logistic Regression [<a href="#Komarek:2008p1742">Kom08</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">logit penalty</font></td>
<td align="left" valign="top" width="144"><font size="-1">Newton&#8217;s Method</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">Perceptron [<a href="#Beigel:1991p1027">BRS91</a>] [<a href="#Blum:2002p1867">BD02</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">error rate</font></td>
<td align="left" valign="top" width="144"><font size="-1">error based update</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">Naive Bayes [<a href="#Maron:2000p2553">MK00</a>] [<a href="#Maron:1961p2566">Mar61</a>] [<a href="#Lewis:1998p105">Lew98</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">frequency tables</font></td>
<td align="left" valign="top" width="144"><font size="-1">arithmetic</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">Nearest Neighbor [<a href="#Ailon:2006p872">AC06</a>] [<a href="#Indyk:1999p166">IM99</a>] [<a href="#Andoni:2006p52">AI06</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">Kernel Methods</font></td>
<td align="left" valign="top" width="144"><font size="-1">enumeration,<br />
projection</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">Decision Trees [<a href="#bfso:1984">BFSO84</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">information theory</font></td>
<td align="left" valign="top" width="144"><font size="-1">partitioning</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">clustering [<a href="#Cilibrasi:2005p8">CV05</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">square error</font></td>
<td align="left" valign="top" width="144"><font size="-1">partitioning</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">MaxEnt [<a href="#Grunwald:2000p108">Gru00</a>] [<a href="#Grunwald:2004p739">GD04</a>] [<a href="#Skilling:1988p780">Ski88</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">entropy penalty</font></td>
<td align="left" valign="top" width="144"><font size="-1">Newton&#8217;s Method</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">Neural Net with Back Propagation [<a href="#NNCPE">Hus99</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">sigmoid penalty function</font></td>
<td align="left" valign="top" width="144"><font size="-1">Automatic Differentiation,<br />
steepest descent</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">Winnow [<a href="#Kivinen:1995p1836">KWA95</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">error rate</font></td>
<td align="left" valign="top" width="144"><font size="-1">multiplicative error based update</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">Boosting [<a href="#Freund:1999p1015">FS99</a>] [<a href="#Breiman:2000p1134">Bre00</a>] [<a href="#Collins:2002p1008">CSS02</a>] [<a href="#Trevisan:2008p2166">TTV08</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">weighted errors,<br />
data re-weighting</font></td>
<td align="left" valign="top" width="144"><font size="-1">Conjugate Gradient</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">HMM [<a href="#Kristjansson:2004p545">KCVM04</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">probability penalty</font></td>
<td align="left" valign="top" width="144"><font size="-1">Gibbs Sampler</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">Latent Dirichlet Allocation [<a href="#Blei:2003p1063">BNJ03</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">KL divergence</font></td>
<td align="left" valign="top" width="144"><font size="-1">Variational Methods</font></td>
</tr>
<tr>
<td align="left" valign="top" width="180"><font size="-1">Support Vector Machine [<a href="#Joachims:1998p406">Joa98</a>] [<a href="#SVMBook">STC00</a>]</font></td>
<td align="left" valign="top" width="144"><font size="-1">L1 Margin,<br />
Kernel Methods</font></td>
<td align="left" valign="top" width="144"><font size="-1">Quadratic Optimization</font></td>
</tr>
</table>
</div>
<p><a name="fig:MachineLearning" id="fig:MachineLearning"></a></td>
</tr>
</table>
</div>
<p></p>
<p><font>This table is a necessarily crude summary. For example: notice that several known techniques can not even be distinguished from each other by the local and global columns of the table.</font></p>
<p><font>There are a few points we would like to make. Back propagation was considered unique to Neural Nets for quite a while because it was so entwined with the technique it was not recognized as the simple application of Automatic Differentiation&nbsp;[<a href="#Rall:1996p2473">RC96</a>] that it is. Support Vector Machines (SVM) are remarkable for their uniformly very good choice of component methods (maximum L1 margin objective regularization, Kernel Methods&nbsp;[<a href="#KernBook">STC04</a>] and sophisticated optimization methods&nbsp;[<a href="#Joachims:2006p403">Joa06</a>]). Many of the machine learning methods that SVM outperforms become competitive again when they adopt some of SVM&#8217;s technologies (especially using kernel methods to produce synthetic features).</font></p>
<p><font>Beyond these points we invoke a &#8220;globalizers are pre-packaged&#8221; principle and leave the discussion of machine learning and optimization to our reference: [<a href="#Bennett:2006p400">BPH06</a>]. In this example the local step is a per-example score or penalty and the globalization step is optimization.</font></p>
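<p><font>The split between objective function and optimizer can be made concrete with a small sketch. Below, the local criterion is square error for a straight-line fit, and two interchangeable globalizers are applied to it: a closed-form linear algebra solve and a plain gradient descent. The data points, the step size, the iteration count and the function names are all assumptions chosen for illustration; this is not code from [<a href="#Bennett:2006p400">BPH06</a>].</font></p>

```python
# Sketch: one objective (square error), two swappable optimizers.
# The data and tuning constants are illustrative assumptions.

DATA = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]


def square_error(params, data):
    """Local criterion: scores one specific (slope, intercept) pair."""
    slope, intercept = params
    return sum((slope * x + intercept - y) ** 2 for x, y in data)


def solve_by_linear_algebra(data):
    """Globalizer 1: solve the 2x2 normal equations in closed form."""
    n = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    det = n * sxx - sx * sx
    return ((n * sxy - sx * sy) / det, (sy * sxx - sx * sxy) / det)


def solve_by_gradient_descent(objective, data, start=(0.0, 0.0),
                              step=0.01, iters=5000, h=1e-6):
    """Globalizer 2: generic descent that only calls the objective."""
    params = list(start)
    for _ in range(iters):
        grad = []
        for j in range(len(params)):
            up = params[:]
            down = params[:]
            up[j] += h
            down[j] -= h
            # Central difference estimate of the j-th partial derivative.
            grad.append((objective(up, data) - objective(down, data)) / (2 * h))
        params = [p - step * g for p, g in zip(params, grad)]
    return tuple(params)


exact = solve_by_linear_algebra(DATA)
descended = solve_by_gradient_descent(square_error, DATA)
print(exact, descended)
```

<p><font>Both optimizers arrive at essentially the same slope and intercept because the answer is defined by the objective, not by the procedure used to reach it. Swapping in a different penalty would only require replacing <code>square_error</code> in the descent call.</font></p>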
<h1><a name="SECTION00040000000000000000" id="SECTION00040000000000000000">Some Methods</a></h1>
<p><font>The application of the local to global principle is similar to the Feynman &#8220;genius method.&#8221; Feynman&#8217;s method is to always have in mind a list of problems and a list of solution methods. The genius step is: anytime you see a new problem or a new solution method to immediately try it against every item from the complementary list.&nbsp;[<a href="#IndiscreteThoughts">Rot97</a>, &#8220;Ten Lessons I Wish I Had Been Taught&#8221;] This deliberate retention and activity greatly increases your problem solving ability. The power of the local to global principle is itself proportional to the number of local methods times the number of globalization strategies. Of course, to even start: the practitioner must already have available a number of candidate local and globalization methods. We list some methods and some guidance on variation and invention.</font></p>
<h2><a name="SECTION00041000000000000000" id="SECTION00041000000000000000">Local Methods</a></h2>
<p><font><img width="100" height="100" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/nails.jpg" alt="Image nails"> Good sources of ideas and analogies for local methods include:</font></p>
<ul>
<li>Introduce a Graph Structure
<p>A graph structure is a network of nodes connected by edges. Use of graphs was demonstrated both in the natural language processing and web page link analysis examples. We can dress up how we solved these problems and say we used a &#8220;Hidden Markov Model&#8221;, but the real power was we encoded our problem in a simple graph. Some problems (especially those from logic or those involving time) are essentially solved once they are translated out of their original form and into graph notation (for an example see: [<a href="#Mount:2000p360">Mou00</a>]).</p>
</li>
<li>Appeal to Physical Conservation Laws
<p>A good example of a physical law is Kirchhoff&#8217;s law or conservation of flow. All of the web page link analysis&#8217;s equations were derived by saying that the attention at a node is essentially the sum of attentions from other nodes (more sophisticated versions of the analysis actually do create and destroy flow, but they do it in a principled way).</p>
</li>
<li>Encode the Problem into an Objective Function
<p>This method is essentially your declaration that you intend to use an optimizer for the globalization step. In operations research this specific technique has long been the practice (with no disrespect: a very productive part of operations research has been translating different problems into linear programs so the simplex method can be applied, for an example see [<a href="#TradeArt">Mou09a</a>]).</p>
</li>
<li>Gradient Like Computations
<p>Includes Gradients, Secants, Lagrangians and other ideas from calculus. Gradients can drive optimizer based globalizers and techniques like Lagrangians are often powerful enough to use mere inspection as the globalization step.</p>
</li>
<li>Violation Driven Updates
<p>This method is particularly effective when your problem is not amenable to continuous optimization. A good example is the Lin-Kernighan heuristic for solving the traveling salesman problem.[<a href="#Lin:1973p2739">LK73</a>] This heuristic looks at subsets of the problem and suggests improving &#8220;surgeries&#8221; (until no more such improvements are possible).</p>
</li>
<li>Introduction of Symbols
<p>Often, as with the web page link analysis example, you can not specify specific values for the unknowns, but you can specify relationships. You often can then solve for the symbols or introduce additional conditions and use an optimizer to complete the solution (see for example the maximum entropy method as described in [<a href="#Skilling:1988p780">Ski88</a>]).</p>
</li>
<li>Over Specification
<p>If we anticipate using a global step like search, enumeration, summation or integration then over specification is a good local idea.</p>
<p>For example: consider computing the probability that a fair coin flipped 10 times comes up with heads exactly 3 times. The easiest way to perform this calculation is to specify exactly which 3 coins come up heads (the local over-specification) and then sum over all choices of 3 out of 10 coins (the global step). In mathematical notation this is:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
P[\text{exactly 3 heads out of 10 flips}] = \binom{10}{3} 2^{-10} \approx 0.117<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="20" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg54.png" alt="$\displaystyle P[$">exactly 3 heads out of 10 flips<img width="157" height="54" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg55.png" alt="$\displaystyle ] = \binom{10}{3} 2^{-10} \approx 0.117 $"></div>
<p>or just under 12%.</p>
</li>
<li>Under Specification
<p>One of the core principles of Dynamic Programming is to forget as much as possible about partial solutions, keeping only partial solution cost and just enough information to extend partial solutions. If you anticipate using something like Dynamic Programming as your globalization step then your goal should be to under specify.</p>
</li>
<li>Tables
<p>A key step of the natural language processing example was the use of tables of past experience to determine which sounds likely corresponded to which words, which words likely followed each other (and so on). Encoding domain knowledge or expertise as probability tables is a very effective problem solving strategy (especially if the globalization strategy is going to be search or Dynamic Programming). In natural language processing examples tables and statistics are <em>much</em> easier to manage than comprehensive rules or grammars.</p>
</li>
<li>Set up as Ranking or Machine Learning Problem
<p>This tactic is especially appropriate if your solution success metric is counts, frequencies or probabilities (instead of having to always be correct or always be optimal).</p>
</li>
</ul>
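<p><font>The over specification example from the list above (exactly 3 heads in 10 flips of a fair coin) can be checked directly by enumeration. The sketch below spells out the two phases: the local step scores one fully specified outcome, and the global step sums over every choice of which flips are heads. The function name and structure are our own illustration.</font></p>

```python
from itertools import combinations
from math import comb


def prob_exact_heads(flips, heads):
    """Sum the probability of each fully specified outcome."""
    # Local over-specification: once we say exactly which flips are heads,
    # that single outcome has probability (1/2) ** flips for a fair coin.
    p_one_outcome = 0.5 ** flips
    # Global step: enumerate every set of positions that could be the heads.
    return sum(p_one_outcome for _ in combinations(range(flips), heads))


p = prob_exact_heads(10, 3)
print(p)                        # 0.1171875
print(comb(10, 3) * 2 ** -10)   # the closed form gives the same value
```

<p><font>Enumeration is wasteful here because the closed form is known, but the same pattern still works when no closed form is available.</font></p>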
<h2><a name="SECTION00042000000000000000" id="SECTION00042000000000000000">Globalization Methods</a></h2>
<p><font><img width="100" height="100" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/hammer.jpg" alt="Image hammer"> The universe of possible globalization methods is very diverse (in particular globalization is not always optimization).</font></p>
<ul>
<li>Search / Enumeration
<p>Search can be slow, but it is always an option to consider. If your problem translates naturally into a graph structure or your solutions are naturally seen as being composed of small pieces search should be considered. One of the big advantages of using the local phase to formally encode your problem&#8217;s structure and putting search off to the global phase is: you can use advanced search techniques. Once you are freed from your specific problem details it becomes much easier to consider search techniques like branch and bound, A*, game theoretic search and general speed-up techniques like hashing and caching.</p>
</li>
<li>Dynamic Programming
<p>If your problem has a bit more structure (in that partial solutions summarize and compose easily) then you can likely replace search with Dynamic Programming. The advantage is that Dynamic Programming typically offers an incredible speed up when compared to search.</p>
</li>
<li>Optimization
<p>If your problem is continuous (involves numbers instead of discrete or categorical decisions), can be encoded as a reasonable objective function (linear, positive definite quadratic) and has reasonable constraints (linear or convex) then you can immediately apply an optimizer as your globalization step. Typical optimization methods include: conjugate gradient, Newton methods, quasi Newton methods, linear programming and quadratic programming.</p>
</li>
<li>Combinatorial Optimization
<p>If your problem includes &#8220;discrete variables&#8221; (that is variables that take on one of a fixed set of values instead of values from a numeric range) then you may not be able to apply standard optimization techniques. At this point you may want to use more expensive combinatorial optimization techniques like integer linear programming or constraint satisfaction.</p>
</li>
<li>Fixed Point Methods / Iteration
<p>Fixed point methods are based on the idea: &#8220;incrementally improve until there is no incremental improvement possible.&#8221; If the problem is continuous this is similar to steepest descent. If the problem is discrete then this is similar to the Lin-Kernighan heuristic.</p>
</li>
<li>Linear Algebra
<p>The web page link analysis and optimization examples were essentially solved once we reduced them to linear algebra. If you can write your problem as a linear relationship between unknowns or as the fixed-point of a linear operator (i.e. an <img width="12" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg56.png" alt="$ x$"> such that <img width="54" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/LTGimg57.png" alt="$ A x = x$"> ) then you can immediately use linear algebra to solve the problem at very large scale (e.g. web scale).</p>
</li>
<li>Sampling / Problem Kernels
<p>A very successful line of attack on large problems is to reduce to a smaller problem containing most of the essential difficulty. David Karger has produced a number of effective algorithms for graph cuts and flows using a theory of sampling&nbsp;[<a href="#Karger:1998p556">Kar98</a>]. Rod Downey and M. Fellows have demonstrated an effective theory of &#8220;problem kernels&#8221; that finds solutions by focusing on smaller sub-problems (on which we can afford to use more expensive procedures).[<a href="#DF98">DF98</a>]</p>
</li>
<li>Amortized Analysis / Economic Mechanism Methods
<p>Daniel Sleator and Robert Tarjan&#8217;s ideas of amortized analysis&nbsp;[<a href="#Sleator:1985p168">ST85</a>] allow approximation schemes similar to problem kernels. The method is to approximately optimize by pairing a bunch of unavoidable large penalties (conditions we can&#8217;t meet) with some accounting credits (say bonuses from other conditions we are meeting very well). We then isolate these paired items and optimize the rest of the problem exactly. The technique often works by showing the approximation can not be too bad because, due to the pairing of large penalties to good credits, there can not be too many large penalties. An informal example is: if it is impossible to pick someplace where all of an office will eat for lunch, perhaps you can solve the problem by paying one person to accept a restaurant they do not like (if the removal of their objection opens up a venue that is acceptable to everybody else).</p>
</li>
<li>Relaxation / Homotopic methods
<p>These methods involve changing hard constraints to soft penalties (so allowing the constraints to be violated, but at a slowly increasing cost). After such a relaxation the homotopic (or continuous deformation) method is to increase the cost of violation and re-solve to try and get a trajectory of better and better nearly acceptable solutions that point to a possible overall solution.</p>
</li>
</ul>
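<p><font>As one concrete instance from the list above, the fixed point of a linear operator (an x with A x = x) can be found by simple iteration. The sketch below is a minimal illustration, assuming a made-up 3 by 3 matrix whose columns each sum to one; it is not the web graph from the link analysis example, and a web scale solver would use sparse matrices instead of nested lists.</font></p>

```python
# Sketch: find x with A x = x by repeated application of A.
# The matrix is an illustrative assumption; each column sums to 1.

A = [[0.5, 0.25, 0.0],
     [0.5, 0.50, 0.5],
     [0.0, 0.25, 0.5]]


def apply_matrix(matrix, x):
    """Return the matrix-vector product as a list."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in matrix]


def fixed_point(matrix, start, tol=1e-12, max_iters=10000):
    """Iterate x -> A x, rescaled to sum to 1, until x stops changing."""
    x = list(start)
    for _ in range(max_iters):
        nxt = apply_matrix(matrix, x)
        total = sum(nxt)
        nxt = [v / total for v in nxt]
        # Stop once no entry moves by more than the tolerance.
        if tol > max(abs(a - b) for a, b in zip(nxt, x)):
            return nxt
        x = nxt
    return x


x = fixed_point(A, [1.0, 0.0, 0.0])
print(x)  # approximately [0.25, 0.5, 0.25]
```

<p><font>For this matrix the exact fixed point is (0.25, 0.5, 0.25), which can be confirmed by substituting it back into A x = x.</font></p>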
<h1><a name="SECTION00050000000000000000" id="SECTION00050000000000000000">Conclusion</a></h1>
<p><font>The purpose of this article has been to make more visible an idea we call the local to global principle. This principle is an organizing tool useful both in designing and analyzing a wide variety of applications. Essentially the whole point of this writeup is to set up enough framework to quickly write down a table of advice such as Table&nbsp;<a href="#fig:ProblemTable">2</a> (and for such a table to mean something).</font></p>
<p></p>
<div align="center"><a name="227"></a></p>
<table>
<caption><strong>Table 2:</strong> Various Applications, Local Steps and Global Steps</caption>
<tr>
<td>
<div align="center">
<table cellpadding="3" border="1" align="center">
<tr>
<td align="left"><font size="-1">Example</font></td>
<td align="left"><font size="-1">Local Step</font></td>
<td align="left"><font size="-1">Global Step</font></td>
</tr>
<tr>
<td align="left"><font size="-1">speech transcription</font></td>
<td align="left"><font size="-1">tables</font></td>
<td align="left"><font size="-1">Dynamic Programming</font></td>
</tr>
<tr>
<td align="left"><font size="-1">PageRank</font></td>
<td align="left"><font size="-1">graph structure, linear equations</font></td>
<td align="left"><font size="-1">Linear Algebra</font></td>
</tr>
<tr>
<td align="left"><font size="-1">machine learning</font></td>
<td align="left"><font size="-1">objective function</font></td>
<td align="left"><font size="-1">optimization</font></td>
</tr>
</table>
</div>
<p><a name="fig:ProblemTable" id="fig:ProblemTable"></a></td>
</tr>
</table>
</div>
<p></p>
<p><font>The principle is not universal; not everything can be fit into such a table. For example the local to global decoupling is <em>not</em> a feature of the famous EM algorithm&nbsp;[<a href="#Dempster:1977p761">DLR77</a>], which depends on mixing predictions and corrections.</font></p>
<p><font>To conclude: the recipe is as follows. If you come to a problem with a large shopping bag of possible ways to build local criteria and powerful globalization procedures then you stand a very good chance of solving the problem quickly. Also, if you keep the local to global principle in mind you are more likely to identify and retain potential local tricks and globalizers when you see them and thus have a larger more nimble set of tools available to solve problems when the time comes.</font></p>
<h2><a name="SECTION00060000000000000000" id="SECTION00060000000000000000">Bibliography</a></h2>
<dl compact>
<dt><a name="Ailon:2006p872" id="Ailon:2006p872">AC06</a></dt>
<dd>Nir Ailon and Bernard Chazelle, <i>Approximate nearest neighbors and the fast johnson-lindenstrauss transform</i>, STOC (2006).</dd>
<dt><a name="Andoni:2006p52" id="Andoni:2006p52">AI06</a></dt>
<dd>Alexandr Andoni and Piotr Indyk, <i>Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions</i>.</dd>
<dt><a name="Blum:2002p1867" id="Blum:2002p1867">BD02</a></dt>
<dd>Avrim Blum and John Dunagan, <i>Smoothed analysis of the perceptron algorithm for linear programming</i>, SODA (2002), 11.</dd>
<dt><a name="DynamicProgramming" id="DynamicProgramming">Bel57</a></dt>
<dd>Richard Bellman, <i>Dynamic programming</i>, Princeton University Press, 1957.</dd>
<dt><a name="Breiman:1997p1133" id="Breiman:1997p1133">BF97</a></dt>
<dd>Leo Breiman and Jerome&nbsp;H Friedman, <i>Predicting multivariate responses in multiple linear regression</i>, Journal of the Royal Statistical Society, Series B (Methodological) <b>59</b> (1997), no.&nbsp;1, 3-54.</dd>
<dt><a name="bfso:1984" id="bfso:1984">BFSO84</a></dt>
<dd>Leo Breiman, Jerome Friedman, Charles&nbsp;J. Stone, and R.&nbsp;A. Olshen, <i>Classification and regression trees</i>, Chapman &amp; Hall/CRC, January 1984.</dd>
<dt><a name="Blei:2003p1063" id="Blei:2003p1063">BNJ03</a></dt>
<dd>David&nbsp;M Blei, Andrew&nbsp;Y Ng, and Michael&nbsp;I Jordan, <i>Latent dirichlet allocation</i>, Journal of Machine Learning Research <b>3</b> (2003), 993-1022.</dd>
<dt><a name="Bennett:2006p400" id="Bennett:2006p400">BPH06</a></dt>
<dd>Kristin&nbsp;P. Bennett and Emilio Parrado-Hernandez, <i>The interplay of optimization and machine learning research</i>, Journal of Machine Learning Research <b>7</b> (2006), 1265-1281.</dd>
<dt><a name="Breiman:2000p1134" id="Breiman:2000p1134">Bre00</a></dt>
<dd>Leo Breiman, <i>Special invited paper. additive logistic regression: A statistical view of boosting: Discussion</i>, Ann. Statist. <b>28</b> (2000), no.&nbsp;2, 374-377.</dd>
<dt><a name="Beigel:1991p1027" id="Beigel:1991p1027">BRS91</a></dt>
<dd>R&nbsp;Beigel, N&nbsp;Reingold, and D&nbsp;Spielman, <i>The perceptron strikes back</i>, Structure in Complexity Theory Conference <b>6</b> (1991), 286-291.</dd>
<dt><a name="CharniakBook" id="CharniakBook">Cha96</a></dt>
<dd>Eugene Charniak, <i>Statistical language learning</i>, MIT Press, 1996.</dd>
<dt><a name="Charniak:1997p1484" id="Charniak:1997p1484">Cha97</a></dt>
<dd>Eugene Charniak, <i>Statistical techniques for natural language parsing</i>, AI Magazine <b>18</b> (1997), no.&nbsp;4, 33-44.</dd>
<dt><a name="IntroductionToAlgorithms" id="IntroductionToAlgorithms">CLRS09</a></dt>
<dd>Thomas&nbsp;H. Cormen, Charles&nbsp;E. Leiserson, Ronald&nbsp;L. Rivest, and Clifford Stein, <i>Introduction to algorithms</i>, MIT Press, 2009.</dd>
<dt><a name="Collins:2002p1008" id="Collins:2002p1008">CSS02</a></dt>
<dd>Michael Collins, Robert&nbsp;E Schapire, and Yoram Singer, <i>Logistic regression, adaboost and bregman distances</i>, Machine Learning <b>48</b> (2002), no.&nbsp;1/2/3, 30.</dd>
<dt><a name="Cilibrasi:2005p8" id="Cilibrasi:2005p8">CV05</a></dt>
<dd>Rudi Cilibrasi and Paul&nbsp;M.B Vitanyi, <i>Clustering by compression</i>, IEEE Transactions on Information Theory <b>51</b> (2005), no.&nbsp;4, 1523-1545.</dd>
<dt><a name="DF98" id="DF98">DF98</a></dt>
<dd>Rod&nbsp;G. Downey and M.&nbsp;R. Fellows, <i>Parameterized complexity</i>, Monographs in Computer Science, Springer, November 1998.</dd>
<dt><a name="Dempster:1977p761" id="Dempster:1977p761">DLR77</a></dt>
<dd>A&nbsp;P Dempster, N&nbsp;M Laird, and D&nbsp;B Rubin, <i>Maximum likelihood from incomplete data via the em algorithm</i>, Journal of the Royal Statistical Society, Series B (Methodological) <b>39</b> (1977), no.&nbsp;1, 1-38.</dd>
<dt><a name="Fisher:1936p2576" id="Fisher:1936p2576">Fis36</a></dt>
<dd>Ronald&nbsp;A Fisher, <i>The use of multiple measurements in taxonomic problems</i>, Annals of Eugenics <b>7</b> (1936), 179-188.</dd>
<dt><a name="Freund:1999p1015" id="Freund:1999p1015">FS99</a></dt>
<dd>Yoav Freund and Robert&nbsp;E Schapire, <i>A short introduction to boosting</i>, Journal of Japanese Society for Artificial Intelligence <b>14</b> (1999), no.&nbsp;5, 771-780.</dd>
<dt><a name="Grunwald:2004p739" id="Grunwald:2004p739">GD04</a></dt>
<dd>Peter&nbsp;D Grunwald and A&nbsp;Philip Dawid, <i>Game theory, maximum entropy, minimum discrepancy and robust bayesian decision theory</i>, Ann. Statist. <b>32</b> (2004), no.&nbsp;4, 1367-1433.</dd>
<dt><a name="Grunwald:2000p108" id="Grunwald:2000p108">Gru00</a></dt>
<dd>PD&nbsp;Grunwald, <i>Maximum entropy and the glasses you are looking through</i>, Conference on Uncertainty in Artificial Intelligence (2000), 238-246.</dd>
<dt><a name="Halevy:2009p2327" id="Halevy:2009p2327">HNP09</a></dt>
<dd>Alon Halevy, Peter Norvig, and Fernando Pereira, <i>The unreasonable effectiveness of data</i>, IEEE Intelligent Systems (2009).</dd>
<dt><a name="NNCPE" id="NNCPE">Hus99</a></dt>
<dd>Dirk Husmeier, <i>Neural networks for conditional probability estimation</i>, Springer, 1999.</dd>
<dt><a name="Indyk:1999p166" id="Indyk:1999p166">IM99</a></dt>
<dd>Piotr Indyk and Rajeev Motwani, <i>Approximate nearest neighbors: Towards removing the curse of dimensionality</i>.</dd>
<dt><a name="Joachims:1998p406" id="Joachims:1998p406">Joa98</a></dt>
<dd>Thorsten Joachims, <i>Making large-scale svm learning practical</i>, Advances in Kernel Methods &#8211; Support Vector Learning (1998).</dd>
<dt><a name="Joachims:2006p403" id="Joachims:2006p403">Joa06</a></dt>
<dd>Thorsten Joachims, <i>Training linear svms in linear time</i>, KDD (2006).</dd>
<dt><a name="Karger:1998p556" id="Karger:1998p556">Kar98</a></dt>
<dd>David&nbsp;R Karger, <i>Randomization in graph optimization problems: A survey</i>, Optima: Mathematical Programming Society Newsletter <b>58</b> (1998).</dd>
<dt><a name="Kristjansson:2004p545" id="Kristjansson:2004p545">KCVM04</a></dt>
<dd>Trausti Kristjansson, Aron Culotta, Paul Viola, and Andrew&nbsp;Kachites McCallum, <i>Interactive information extraction with constrained conditional random fields</i>, AAAI (2004).</dd>
<dt><a name="Kleinberg:1997p32" id="Kleinberg:1997p32">Kle97</a></dt>
<dd>Jon&nbsp;M Kleinberg, <i>Authoritative sources in a hyperlinked environment</i>, ACM SIAM Symposium on Discrete Algorithms (1997).</dd>
<dt><a name="Komarek:2008p1742" id="Komarek:2008p1742">Kom08</a></dt>
<dd>Paul Komarek, <i>Logistic regression for data mining and high-dimensional classification</i>, CMU CS Thesis (2008), 138.</dd>
<dt><a name="Kivinen:1995p1836" id="Kivinen:1995p1836">KWA95</a></dt>
<dd>J&nbsp;Kivinen, Manfred&nbsp;K Warmuth, and P&nbsp;Auer, <i>The perceptron algorithm vs. winnow: Linear vs. logarithmic mistake bounds when few input variables are relevant</i>, COLT (1995), 289-296.</dd>
<dt><a name="Lewis:1998p105" id="Lewis:1998p105">Lew98</a></dt>
<dd>David&nbsp;D Lewis, <i>Naive (bayes) at forty: The independence assumption in information retrieval</i>, find journal (1998).</dd>
<dt><a name="Lin:1973p2739" id="Lin:1973p2739">LK73</a></dt>
<dd>S&nbsp;Lin and BW&nbsp;Kernighan, <i>An effective heuristic algorithm for the traveling-salesman problem</i>, Operations Research (1973), 498-516.</dd>
<dt><a name="Maron:1961p2566" id="Maron:1961p2566">Mar61</a></dt>
<dd>M&nbsp;E Maron, <i>Automatic indexing: An experimental inquiry</i>, RAND Technical Report (1961), 404-417.</dd>
<dt><a name="HTSMH" id="HTSMH">MF00</a></dt>
<dd>Zbigniew Michalewicz and David&nbsp;B. Fogel, <i>How to solve it: Modern heuristics</i>, Springer, 2000.</dd>
<dt><a name="Mill" id="Mill">Mil02</a></dt>
<dd>John&nbsp;Stuart Mill, <i>A system of logic</i>, University Press of the Pacific, 2002.</dd>
<dt><a name="MitchellML" id="MitchellML">Mit97</a></dt>
<dd>Thomas Mitchell, <i>Machine learning</i>, McGraw-Hill, 1997.</dd>
<dt><a name="Maron:2000p2553" id="Maron:2000p2553">MK00</a></dt>
<dd>M&nbsp;E Maron and J&nbsp;L Kuhns, <i>On relevance, probabilistic indexing and information retrieval</i>, 1960 (2000), 1-29.</dd>
<dt><a name="Mount:2000p360" id="Mount:2000p360">Mou00</a></dt>
<dd>John&nbsp;A Mount, <i>Automatic detection of potential deadlock</i>, Dr. Dobbs Journal (2000).</dd>
<dt><a name="TradeArt" id="TradeArt">Mou09a</a></dt>
<dd>John Mount, <i>Automatic generation and testing of un-rolls for profitable technical trades</i>, <a href="http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/">http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/</a>, 2009.</dd>
<dt><a name="MLArt" id="MLArt">Mou09b</a></dt>
<dd>John Mount, <i>A demonstration of data mining</i>, <a href="http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/">http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/</a>, 2009.</dd>
<dt><a name="Page:1998p2689" id="Page:1998p2689">PBMW98</a></dt>
<dd>Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, <i>The pagerank citation ranking: Bringing order to the web</i>, <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.1768">http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.1768</a> (1998).</dd>
<dt><a name="Polya1" id="Polya1">Pol54a</a></dt>
<dd>G.&nbsp;Polya, <i>Induction and analogy in mathematics</i>, Princeton University Press, 1954.</dd>
<dt><a name="Polya2" id="Polya2">Pol54b</a></dt>
<dd>G.&nbsp;Polya, <i>Patterns of plausible inference</i>, Princeton University Press, 1954.</dd>
<dt><a name="citeulike:679515" id="citeulike:679515">Pol71</a></dt>
<dd>G.&nbsp;Polya, <i>How to solve it</i>, Princeton University Press, November 1971.</dd>
<dt><a name="Rall:1996p2473" id="Rall:1996p2473">RC96</a></dt>
<dd>Louis&nbsp;B Rall and George&nbsp;F Corliss, <i>An introduction to automatic differentiation</i>, SIAM: Computational Differentiation: Techniques, Applications and Tools (1996), 1-18.</dd>
<dt><a name="IndiscreteThoughts" id="IndiscreteThoughts">Rot97</a></dt>
<dd>Gian-Carlo Rota, <i>Indiscrete thoughts</i>, Birkhauser, 1997.</dd>
<dt><a name="Skilling:1988p780" id="Skilling:1988p780">Ski88</a></dt>
<dd>John Skilling, <i>The axioms of maximum entropy</i>, Maximum Entropy and Bayesian Methods in Science and Engineering <b>1</b> (1988), no.&nbsp;173-187.</dd>
<dt><a name="Sleator:1985p168" id="Sleator:1985p168">ST85</a></dt>
<dd>Daniel&nbsp;Dominic Sleator and Robert&nbsp;Endre Tarjan, <i>Amortized efficiency of list update and paging rules</i>, Communications of the ACM <b>28</b> (1985), no.&nbsp;2.</dd>
<dt><a name="SVMBook" id="SVMBook">STC00</a></dt>
<dd>John Shawe-Taylor and Nello Cristianini, <i>Support vector machines</i>, Cambridge University Press, 2000.</dd>
<dt><a name="KernBook" id="KernBook">STC04</a></dt>
<dd>John Shawe-Taylor and Nello Cristianini, <i>Kernel methods for pattern analysis</i>, Cambridge University Press, 2004.</dd>
<dt><a name="Strang" id="Strang">Str76</a></dt>
<dd>Gilbert Strang, <i>Linear algebra and its applications</i>, Academic Press, Inc., 1976.</dd>
<dt><a name="TibHat" id="TibHat">TH09</a></dt>
<dd>Trevor&nbsp;Hastie, Robert&nbsp;Tibshirani, and Jerome&nbsp;Friedman, <i>The elements of statistical learning: Data mining, inference and prediction</i>, Springer, 2009.</dd>
<dt><a name="Trevisan:2008p2166" id="Trevisan:2008p2166">TTV08</a></dt>
<dd>Luca Trevisan, Madhur Tulsiani, and Salil Vadhan, <i>Regularity, boosting, and efficiently simulating every high-entropy distribution</i>, Electronic Colloquium on Computational Complexity (2008), 18.</dd>
<dt><a name="Zeilberger:1995p277" id="Zeilberger:1995p277">Zei95</a></dt>
<dd>Doron Zeilberger, <i>The method of undetermined generalization and specialization illustrated with fred galvin&#8217;s amazing proof of the dinitz conjecture</i>, <a href="http://arxiv.org/abs/math/9506215">http://arxiv.org/abs/math/9506215</a>, 1995.</dd>
</dl>
<h1><a name="SECTION00070000000000000000" id="SECTION00070000000000000000">Acknowledgement</a></h1>
<p><font><font>A thank you to readers who supplied help and comments on earlier drafts.</font></font></p>
<p></p>
<hr />
<h4>Footnotes</h4>
<dl>
<dt><a name="foot21" id="foot21">&#8230; Mount</a><a href="#tex2html3"><sup>1</sup></a></dt>
<dd>email: <tt><a name="tex2html1" href="mailto:jmount@win-vector.com" id="tex2html1">mailto:jmount@win-vector.com</a></tt> web: <tt><a name="tex2html2" href="http://www.win-vector.com/" id="tex2html2">http://www.win-vector.com/</a></tt></dd>
<dt><a name="foot244" id="foot244">&#8230; principle.</a><a href="#tex2html4"><sup>2</sup></a></dt>
<dd>The pre-existing practice that comes closest to the local to global principle is found in operations research, where encoding a problem to be solved by an optimizer is a central technique. We claim the natural statement of the local to global principle is more general than <font><em>always</em> encoding constraints for a particular optimizer (in particular globalization is not always optimization).</font></dd>
<dt><font><a name="foot43" id="foot43">&#8230; structure</a><a href="#tex2html6"><sup>4</sup></a></font></dt>
<dd><font>By &#8220;link structure&#8221; we mean which web pages link to which other web pages.</font></dd>
<dt><font><a name="foot45" id="foot45">&#8230; graph</a><a href="#tex2html7"><sup>5</sup></a></font></dt>
<dd><font>Remember, a graph is a diagram consisting of nodes and edges (here depicted as arrows).</font></dd>
<dt><font><a name="foot245" id="foot245">&#8230; features</a><a href="#tex2html9"><sup>6</sup></a></font></dt>
<dd><font>For example the model could account for:</font></p>
<ul>
<li>surfers entering and leaving the model</li>
<li>link odds that vary where they are on a page</li>
<li>surfers staying on a page proportional to how much text is on the page</li>
<li>matching known traffic and click behavior where we have such data.</li>
</ul>
<p><font>For simplicity we will just stick with the given example.</font></dd>
<dt><font><a name="foot154" id="foot154">&#8230; components.</a><a href="#tex2html17"><sup>7</sup></a></font></dt>
<dd><font>When a system is named and defined as an exact set of procedures the system can, by definition, not be improved. This is because with any change in procedure we have a new system that no longer matches the original definition and therefore requires a new name.</font></dd>
</dl>
<p><font><br /></font></p>
<hr />
<address><font>John Mount 2009-11-11</font></address>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/' rel='bookmark' title='Permanent Link: Automatic Differentiation with Scala'>Automatic Differentiation with Scala</a></li>
<li><a href='http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/' rel='bookmark' title='Permanent Link: A Demonstration of Data Mining'>A Demonstration of Data Mining</a></li>
<li><a href='http://www.win-vector.com/blog/2009/07/should-your-mom-use-google-search/' rel='bookmark' title='Permanent Link: Should your mom use Google search?'>Should your mom use Google search?</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2009/11/the-local-to-global-principle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Betting Best-Of Series</title>
		<link>http://www.win-vector.com/blog/2008/05/betting-best-of-series/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=betting-best-of-series</link>
		<comments>http://www.win-vector.com/blog/2008/05/betting-best-of-series/#comments</comments>
		<pubDate>Wed, 28 May 2008 01:23:04 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Expository Writing]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Quantitative Finance]]></category>
		<category><![CDATA[Dynamic Programming]]></category>
		<category><![CDATA[Technical Papers]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=18</guid>
		<description><![CDATA[Betting Best of Series is a new expository paper describing the mathematics involved in betting on something like the United States&#8217; Major League Baseball World Series. It isn&#8217;t so much about baseball as about demonstrating some of the really great ideas from mathematical finance in a simplified setting. This sort of analysis is the &#8220;secret sauce&#8221; [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/' rel='bookmark' title='Permanent Link: What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?'>What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?</a></li>
<li><a href='http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/' rel='bookmark' title='Permanent Link: Paper on stock trading'>Paper on stock trading</a></li>
<li><a href='http://www.win-vector.com/blog/2007/06/new-paper/' rel='bookmark' title='Permanent Link: New Paper'>New Paper</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.win-vector.com/dfiles/BestOf.pdf">Betting Best of Series</a> is a new expository paper describing the mathematics involved in betting on something like the United States&#8217; Major League Baseball World Series.  It isn&#8217;t so much about baseball as about demonstrating some of the really great ideas from mathematical finance in a simplified setting.  This sort of analysis is the &#8220;secret sauce&#8221; in a lot of financial models and I am trying to share the thrilling feeling of working with these techniques in an elementary essay (with diagrams).<span id="more-18"></span></p>
<p>Also in (less legible) HTML:</p>
<h1 align="center">Betting Best-Of Series</h1>
<p align="center"><strong>John Mount<a name="tex2html1" href="#foot16" id="tex2html1"><sup>1</sup></a></strong></p>
<p></p>
<p align="center"><b>Date:</b> May 27, 2008</p>
<hr />
<h1><a name="SECTION00010000000000000000" id="SECTION00010000000000000000">Introduction</a></h1>
<p>We use the United States&#8217; Major League Baseball World Series to demonstrate some of the &#8220;arbitrage arguments&#8221;<a name="tex2html2" href="#foot21" id="tex2html2"><sup>2</sup></a> used in mathematical finance. This problem is a classic finance puzzle question and is an interesting introduction to some exciting techniques.</p>
<p>&#8220;Arbitrage&#8221; is the simultaneous buying and selling of a commodity, usually in multiple markets, that returns a risk-free profit. An example would be finding a market where apples are selling for $1 and another where they are selling for $2, and then simultaneously executing a purchase order in the cheap market and a sales order in the expensive market (assuming no significant shipping risks or costs). Typically &#8220;arbitrage opportunities&#8221; are too much to hope for and to make a profit you must add value, loan money, hold inventory or take on risk. This is just the mathematical finance way of saying &#8220;there is no free lunch,&#8221; but a number of surprising facts about markets can be proven using this principle.</p>
<h1><a name="SECTION00020000000000000000" id="SECTION00020000000000000000">The Problem</a></h1>
<div align="center"><a name="fig:wsgames" id="fig:wsgames"></a><a name="27"></a></p>
<table>
<caption align="bottom"><strong>Figure 1:</strong> World Series Tree (Win over Loss)</caption>
<tr>
<td>
<div align="center"><img width="400" height="510" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./WorldSeries1.png" alt="Image WorldSeries1"/></div>
</td>
</tr>
</table>
</div>
<p>Consider a &#8220;first to win <img width="15" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg1.png" alt="$ k$"/> &#8221; contest like the United States&#8217; Major League Baseball World Series. The World Series is a &#8220;first to win four&#8221; contest (sometimes called &#8220;best of seven&#8221;) where a number of games are played between two teams and the first team to win four games is declared the series winner. Ignoring the possibility of ties, this process can take from four to seven games. We can (as in Figure&nbsp;<a href="#fig:wsgames">1</a>) lay out all of the possibilities in a picture that moves from left to right, moving up when the first team wins and down when the second team wins.</p>
<p>Any sequence of games is represented by a path through this diagram (starting at the left) that reaches a node with no exit. At each node we have marked in the wins for each team (Team One on top, Team Two on the bottom). The nodes where one team has won four games are where the series ends.</p>
<p>The &#8220;arbitrage question&#8221; is:</p>
<blockquote><p>If you had access to a bookie who was willing to take an even-payoff bet (on either side) in each game of the World Series, can you design a schedule of bets on games that simulates an even-payoff one dollar bet on the outcome of the entire World Series?</p></blockquote>
<p>That is: you wish to make a bet that pays you $1 if your team wins the World Series and costs you $1 if your team is defeated. You cannot find anybody to take such a bet, but you have found a bookie who makes the incredibly generous offer of taking bets (at even pay-off) on each and every game in the series. Can you, without any additional risk, simulate a World Series bet by making a series of per-game bets with this bookie?</p>
<h1><a name="SECTION00030000000000000000" id="SECTION00030000000000000000">The Answer</a></h1>
<p>The answer turns out to be that you can simulate a World Series bet. The reason for hope is that both types of bets (the even-payoff bets on games and an even-payoff bet on the whole series) are expressing the same underlying belief: that both teams have an exactly equal chance of winning. The teams may or may not have equal chances of winning, but offering to take bets on both sides at equal pay-off is equivalent to expressing just such a belief.</p>
<p>The principle that the odds at which you are willing to take bets express your subjective probabilities goes back to Bruno de Finetti and is the most basic &#8220;arbitrage style&#8221; argument. The principle is simple, but it is a useful warm-up to think about. Assume you are &#8220;rational&#8221; (in the economic sense, which just means you are not giving money away without a reason) and let <img width="24" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg2.png" alt="$ p_S$"/> denote your personal estimate of the probability of your team winning. If you are willing to bet $1 that your team wins at even payoff (meaning you collect $1 if your team wins and pay $1 if your team loses), then for this bet to make economic sense you must have:</p>
<div align="center"><!-- MATH<br />
 \begin{equation*}<br />
p_S ( +\$1 ) + (1-p_S) (- \$1) \ge 0<br />
\end{equation*}<br />
 --></p>
<table cellpadding="0" width="100%" align="center">
<tr valign="middle">
<td nowrap align="center"><img width="244" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg3.png" alt="$\displaystyle p_S ( +\$1 ) + (1-p_S) (- \$1) \ge 0$"/></td>
<td nowrap width="10" align="right">&nbsp;&nbsp;&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/><br />
which means <!-- MATH<br />
 $p_S\ge \frac{1}{2}$<br />
 --><br />
<img width="60" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg4.png" alt="$ p_S\ge \frac{1}{2}$"/> .</p>
<p>Similarly, if you are willing (for purely economic reasons) to take the other side of the same even-payoff bet (reversing the roles of winning and losing) then it must be true that <!-- MATH<br />
 $p_S \le \frac{1}{2}$<br />
 --><br />
<img width="60" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg5.png" alt="$ p_S \le \frac{1}{2}$"/> . We then have our conclusion: from an economic point of view you should be willing to take either side of a fair-payoff bet only if your estimate of the probability of winning is <img width="32" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg6.png" alt="$ 1/2$"/> .</p>
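<p>The two inequalities can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names and the grid of candidate probabilities are illustrative assumptions. It evaluates the expected gain of each side of the $1 even-payoff bet and keeps only the probabilities at which neither side loses money on average.</p>

```python
from fractions import Fraction


def expected_gain(p_win, stake=1):
    """Expected gain of an even-payoff bet of `stake` on an event of probability p_win."""
    return p_win * stake + (1 - p_win) * (-stake)


def accepts_both_sides(p_win):
    """True when neither side of the even-payoff bet has a negative expected gain."""
    backing = expected_gain(p_win)       # betting that the team wins
    opposing = expected_gain(1 - p_win)  # betting that the team loses
    return backing >= 0 and opposing >= 0


# Illustrative grid of candidate estimates: 0, 1/10, 2/10, ..., 1.
candidates = [Fraction(n, 10) for n in range(11)]
fair = [p for p in candidates if accepts_both_sides(p)]
print(fair)
```

<p>On this grid the only surviving estimate is one half, which matches the conclusion above.</p>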
<div align="center"><a name="fig:wspartial" id="fig:wspartial"></a><a name="44"></a></p>
<table>
<caption align="bottom"><strong>Figure 2:</strong> World Series With Some Values Filled In</caption>
<tr>
<td>
<div align="center"><img width="400" height="510" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./WorldSeries2.png" alt="Image WorldSeries2"/></div>
</td>
</tr>
</table>
</div>
<p>We now return to the World Series diagram. If we bet on individual games (instead of making one bet on the whole series) then at each node in the diagram we expect to have some sort of net winnings or net losses. For example at each node where our team has won four games we should be holding <img width="23" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg7.png" alt="$ \$1$"/> , so we will label these nodes with <img width="28" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg8.png" alt="$ +1$"/> . Similarly at each node where the opposing team has won four games we expect to have lost exactly <img width="23" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg7.png" alt="$ \$1$"/> so we label those nodes with <img width="29" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg9.png" alt="$ -1$"/> . Our task is to figure out the amount bet at each node and our net holdings at each node. If we can find a schedule of bet amounts that leads to the correct outcomes at the end of the World Series and starts with initial net holdings of $0 then we have solved the problem.</p>
<p>If we look at Figure&nbsp;<a href="#fig:wspartial">2</a> we see that the node corresponding to each team having won 3 games points to two nodes we know the values of (the World Series ending with either team the winner). We can use the fact that this node points only to nodes with known net holdings to figure out both the bet that must be made at this node and the net holdings this node should have at this point in the World Series.</p>
<p>Let <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg10.png" alt="$ x$"/> be the (unknown) net holdings we have at this node and <img width="14" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg11.png" alt="$ y$"/> be the (unknown) amount we bet. Then to complete the World Series bet we must have the following:</p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
x + y &#038; = &#038; 1 \\<br />
 x - y &#038; = &#038; -1<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="48" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg12.png" alt="$\displaystyle x + y$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg13.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="14" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg14.png" alt="$\displaystyle 1$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="48" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg15.png" alt="$\displaystyle x - y$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg13.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="29" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg16.png" alt="$\displaystyle -1$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p>This is enough to notice that <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg10.png" alt="$ x$"/> (your holdings) must be the average of the two outcomes pointed to and <img width="14" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg11.png" alt="$ y$"/> (your bet) must be one half of the difference of the two outcomes. So the &#8220;each team has won three games&#8221; node (near the very right end of the diagram) should have a net holding of <img width="59" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg17.png" alt="$ x = \$0$"/> and we should bet <img width="58" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg18.png" alt="$ y = \$1$"/> . Filling in this node with the net holdings ($0) now means that there are other nodes that point only to nodes with filled-in net holdings. We can, in fact, repeat this process of filling in each node with unknown net holdings with the average of the two known nodes it points to until we complete the diagram as in Figure&nbsp;<a href="#fig:wsfull">3</a>.</p>
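<p>The single-node step can be written down directly. This is a small sketch under the assumptions of the text (even-payoff bets, known holdings at the two successor nodes); the function name <tt>holdings_and_bet</tt> is ours and does not come from the paper.</p>

```python
from fractions import Fraction


def holdings_and_bet(value_if_win, value_if_loss):
    """Solve x + y = value_if_win and x - y = value_if_loss.

    x is the net holdings needed at the current node (the average of the
    two successor values) and y is the bet to place there (half of their
    difference).
    """
    x = (Fraction(value_if_win) + Fraction(value_if_loss)) / 2
    y = (Fraction(value_if_win) - Fraction(value_if_loss)) / 2
    return x, y


# The "each team has won three games" node points to +1 and -1.
x_33, y_33 = holdings_and_bet(1, -1)
print(x_33, y_33)

# One step further back: our team leads three games to two. A win ends the
# series at +1, a loss leads to the three-all node whose holdings are x_33.
x_32, y_32 = holdings_and_bet(1, x_33)
print(x_32, y_32)
```

<p>The first call reproduces the holdings of $0 and bet of $1 derived above; the second shows how a newly filled-in node immediately lets us fill in a neighbor.</p>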
<div align="center"><a name="fig:wsfull" id="fig:wsfull"></a><a name="55"></a></p>
<table>
<caption align="bottom"><strong>Figure 3:</strong> World Series All Values Filled In</caption>
<tr>
<td>
<div align="center"><img width="400" height="510" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./WorldSeries3.png" alt="Image WorldSeries3"/></div>
</td>
</tr>
</table>
</div>
<p>In the completed figure each node is filled in with the net holdings required to implement our betting schedule. We can see that a diagram like this can always be filled out to completion by looking at the diagram as having layers like an onion and noticing that we start with the rightmost nodes filled in (they are the nodes where the World Series ends). We can fill out every node in the layer of nodes just inside the outer layer if we start at the rightmost such node and work back. Every layer can be completed one after another until we get to the innermost layer, which is just the starting node. To implement the betting strategy, we keep track of where we are in the diagram and always bet one half of the difference between the net holdings of the two nodes pointed to by the node we are at.</p>
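<p>The whole diagram can be filled in mechanically. The sketch below is one possible way to do it and is not taken from the paper: instead of peeling layers by hand it uses a memoized recursion, which visits the same nodes and applies the same averaging rule. The names <tt>WINS_NEEDED</tt>, <tt>holdings</tt> and <tt>bet</tt> are illustrative.</p>

```python
from fractions import Fraction
from functools import lru_cache

WINS_NEEDED = 4  # the World Series is a "first to win four" contest


@lru_cache(maxsize=None)
def holdings(ours, theirs):
    """Net holdings required at the node where the score is ours-theirs."""
    if ours == WINS_NEEDED:
        return Fraction(1)   # our team has won the series
    if theirs == WINS_NEEDED:
        return Fraction(-1)  # our team has lost the series
    # Interior node: the average of the two nodes it points to.
    return (holdings(ours + 1, theirs) + holdings(ours, theirs + 1)) / 2


def bet(ours, theirs):
    """Amount to bet on our team at a node where the series is still undecided."""
    return (holdings(ours + 1, theirs) - holdings(ours, theirs + 1)) / 2


for ours in range(WINS_NEEDED):
    for theirs in range(WINS_NEEDED):
        print(f"score {ours}-{theirs}: holdings {holdings(ours, theirs)}, bet {bet(ours, theirs)}")
```

<p>Exact fractions are used so that the values can be compared against the figure without rounding.</p>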
<p>If the first node of the diagram were marked with a value other than zero it would mean that the World Series has a net bias for the first team or the second. Since the rules are symmetric this would be a nonsense conclusion, so we can be sure that all of the even-score nodes must be valued at zero.</p>
<p>The filling in of blanks using values ahead of them (from the future) is the heart of the Binomial Pricing Theory for options and is based on a very deep idea called Dynamic Programming. The idea is that you may not know which future you will experience, but you may know the valuation of every possible future. It is an amazing fact that even without introducing probabilities or probability estimates of which future you will experience, just knowing the value of every possible future is enough to compute the value of a bet in the present time. In our example: you may not know ahead of time the final scores of the World Series, but you do know the value of a World Series bet for each possible ending score.</p>
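<p>That no probabilities are needed can be checked by brute force. The following sketch (our own illustration, with hypothetical names) follows the betting rule along every possible sequence of game results and confirms that the cash in hand when the series ends is exactly +1 or -1, whichever path was taken.</p>

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

WINS_NEEDED = 4  # "first to win four"


@lru_cache(maxsize=None)
def holdings(ours, theirs):
    """Net holdings required at the node where the score is ours-theirs."""
    if ours == WINS_NEEDED:
        return Fraction(1)
    if theirs == WINS_NEEDED:
        return Fraction(-1)
    return (holdings(ours + 1, theirs) + holdings(ours, theirs + 1)) / 2


def play(results):
    """Bet along one sequence of results (True means our team wins the game).

    Returns the final score and the cash won or lost, starting from $0.
    Results listed after the series has been decided are ignored.
    """
    ours = theirs = 0
    cash = Fraction(0)
    for won in results:
        if ours == WINS_NEEDED or theirs == WINS_NEEDED:
            break
        stake = (holdings(ours + 1, theirs) - holdings(ours, theirs + 1)) / 2
        if won:
            cash += stake
            ours += 1
        else:
            cash -= stake
            theirs += 1
    return ours, theirs, cash


def all_paths_replicate():
    """Check every win/loss sequence long enough to decide the series."""
    max_games = 2 * WINS_NEEDED - 1
    for results in product([True, False], repeat=max_games):
        ours, theirs, cash = play(results)
        target = 1 if ours == WINS_NEEDED else -1
        if cash != target:
            return False
    return True


print(all_paths_replicate())
```

<p>The check never asks how likely any path is; it only uses the known value of each ending.</p>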
<h1><a name="SECTION00040000000000000000" id="SECTION00040000000000000000">What is the analogy?</a></h1>
<p>From a finance or betting point of view the problem is solved- we have procedures for building the betting schedule and we have the schedule itself. From a mathematician&#8217;s point of view we have only just started- we have some procedures and relations but what are they an analogy of?</p>
<p>Naively one might think that they should bet around one fourth of their desired outcome in each game to simulate a best of four series. However to simulate a total World Series bet of <img width="23" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg7.png" alt="$ \$1$"/> we use an initial bet of <!-- MATH<br />
 $\$5/16 = \$0.3125$<br />
 --><br />
<img width="138" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg19.png" alt="$ \$5/16 = \$0.3125$"/> in our schedule. This is almost a third of our desired total bet. This gets us wondering: what is the general form of this first bet?</p>
<p>Let <!-- MATH<br />
 $\text{bet}(k)$<br />
 --><br />
bet<img width="29" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg20.png" alt="$ (k)$"/> denote the amount of the first bet in the simulation of a &#8220;best of <img width="15" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg1.png" alt="$ k$"/> &#8221; bet. If we compute <!-- MATH<br />
 $\text{bet}(k)$<br />
 --><br />
bet<img width="29" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg20.png" alt="$ (k)$"/> (by constructing betting schedules as above) for many values of <img width="15" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg1.png" alt="$ k$"/> we see that <!-- MATH<br />
 $\text{bet}(k)$<br />
 --><br />
bet<img width="29" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg20.png" alt="$ (k)$"/> seems to shrink slower than <img width="39" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg21.png" alt="$ 1/k.$"/> In fact it seems to shrink at a rate of around <!-- MATH<br />
 $1/\sqrt{k}$<br />
 --><br />
<img width="49" height="44" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg22.png" alt="$ 1/\sqrt{k}$"/> . Even more intriguing, if you plot <!-- MATH<br />
 $1/(k*\text{bet}(k)*\text{bet}(k))$<br />
 --><br />
1/(<img width="15" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg1.png" alt="$ k$"/> * bet<img width="39" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg24.png" alt="$ (k)*$"/>bet<img width="37" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg25.png" alt="$ (k))$"/> it converges (very slowly) to <!-- MATH<br />
 $3.14 \cdots$<br />
 --><br />
<img width="66" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg26.png" alt="$ 3.14 \cdots$"/> . We can conjecture that for very large <img width="15" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg1.png" alt="$ k$"/> the initial bet is: <!-- MATH<br />
 $1/\sqrt{\pi k}$<br />
 --><br />
<img width="61" height="44" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg27.png" alt="$ 1/\sqrt{\pi k}$"/> where <img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg28.png" alt="$ \pi$"/> is the famous ratio of the length of the circumference of a circle to the length of the diameter of the same circle.</p>
<p>Now <!-- MATH<br />
 $1/\sqrt{\pi k}$<br />
 --><br />
<img width="61" height="44" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg27.png" alt="$ 1/\sqrt{\pi k}$"/> is much larger than <img width="33" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg29.png" alt="$ 1/k$"/> (as <img width="15" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg1.png" alt="$ k$"/> gets large). So the scheme says to bet a fairly large amount of your budget on the first game, and that winning the first bet is worth a bit more than you would expect (it takes you more than one <img width="15" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg1.png" alt="$ k$"/> th of the way to victory).</p>
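<p>The numerical experiment behind this conjecture is easy to repeat. The sketch below is an assumption-laden illustration (the name <code>first_bet</code> and the row-by-row layout are ours): it rebuilds the first bet for several series lengths and prints it next to 1/sqrt(pi k), along with the ratio 1/(k * bet(k) * bet(k)), which should drift toward pi if the conjecture holds.</p>

```python
import math


def first_bet(k):
    """First bet (for a total stake of 1) in a first-to-k-wins series.

    Hypothetical helper: fills in net holdings one row of scores at a
    time, working back from the end of the series.
    """
    # 'above' holds holdings at scores (a + 1, b); it starts as the row
    # where our team already has k wins, so every entry is +1.
    above = [1.0] * k
    for a in range(k - 1, -1, -1):
        row = [0.0] * (k + 1)
        row[k] = -1.0  # the other team reached k wins first
        for b in range(k - 1, -1, -1):
            row[b] = (above[b] + row[b + 1]) / 2
        if a == 0:
            # Half the gap between "won game one" and "lost game one".
            return (above[0] - row[1]) / 2
        above = row


for k in (4, 16, 64, 256):
    bet = first_bet(k)
    print(k, bet, 1 / math.sqrt(math.pi * k), 1 / (k * bet * bet))
```

<p>The ratio in the last column approaches pi from below and only slowly, which is consistent with the "very slowly" remark above.</p>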
<div align="center"><a name="fig:wsWeightedPaths" id="fig:wsWeightedPaths"></a><a name="71"></a></p>
<table>
<caption align="bottom"><strong>Figure 4:</strong> Weighted Paths</caption>
<tr>
<td>
<div align="center"><img width="400" height="510" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./WorldSeries4.png" alt="Image WorldSeries4"/></div>
</td>
</tr>
</table>
</div>
<p>What is going on? We can again apply an arbitrage or de Finetti style argument and say since the whole game was &#8220;fair&#8221; with expected pay-off zero then we can relate probabilities and payoffs. The net holdings at each node encode how much of an advantage you have at the node (or how much you should pay to take over from another gambler at this point). If we let <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg30.png" alt="$ p_1$"/> denote the probability of going on to win the World Series bet after winning the first bet then we must have:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
p_1 (\$1) + (1-p_1) (-\$1) = \text{bet}(k) .<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="210" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg31.png" alt="$\displaystyle p_1 (\$1) + (1-p_1) (-\$1) =$"/>&nbsp; &nbsp;bet<img width="34" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg32.png" alt="$\displaystyle (k) . $"/></div>
<p>Or <!-- MATH<br />
 $p_1 = (\text{bet}(k) + 1)/2$<br />
 --><br />
<img width="54" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg33.png" alt="$ p_1 = ($"/>bet<img width="88" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg34.png" alt="$ (k) + 1)/2$"/> . For the real World Series we had <!-- MATH<br />
 $\text{bet}(4)=5/16$<br />
 --><br />
bet<img width="91" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg35.png" alt="$ (4)=5/16$"/> so <!-- MATH<br />
 $p_1 = 21/32$<br />
 --><br />
<img width="93" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg36.png" alt="$ p_1 = 21/32$"/> . This means we can read-off from the valuation tree that the probability of winning the World Series (for perfectly equally matched teams) rises from <img width="32" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg6.png" alt="$ 1/2$"/> to <img width="51" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg37.png" alt="$ 21/32$"/> after you win the first game.<a name="tex2html7" href="#foot77" id="tex2html7"><sup>3</sup></a> This can be confirmed from Figure&nbsp;<a href="#fig:wsfull">3</a>. It is easy to confirm that a <img width="51" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg37.png" alt="$ 21/32$"/> portion of all paths from the node where Team One has won the first game end with Team One winning the whole World Series (each path must be weighted by its probability, which is <!-- MATH<br />
 $2^{-path<br />
length}$<br />
 --><br />
<img width="90" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg38.png" alt="$ 2^{-path length}$"/> ). Instead of computing the bets we could have computed the probability of going on to win the World Series at each node<a name="tex2html8" href="#foot80" id="tex2html8"><sup>4</sup></a> (and then used the above equivalence principle to read off the required bets).</p>
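<p>The path-weighting claim can be checked by brute force. The following sketch (the function name <code>win_probability</code> is an assumption made for illustration) walks every path from a given score, gives each path the weight 2 to the power of minus its length, and adds up the paths on which our team wins.</p>

```python
from fractions import Fraction


def win_probability(ours, theirs, wins_needed):
    """Sum the weights of all paths from (ours, theirs) that end in a win.

    Each game halves the weight of the path, so a finished path of
    length n carries weight 2**(-n). (Hypothetical helper.)
    """
    def walk(a, b, weight):
        if a == wins_needed:
            return weight          # path ends with our team winning
        if b == wins_needed:
            return Fraction(0)     # path ends with our team losing
        half = weight / 2
        return walk(a + 1, b, half) + walk(a, b + 1, half)

    return walk(ours, theirs, Fraction(1))


p1 = win_probability(1, 0, 4)   # after winning the first game
bet = 2 * p1 - 1                # rearranged from p1 = (bet + 1) / 2
print(p1, bet)
```

<p>For the real World Series this gives a probability of 21/32 after winning the first game and recovers the first bet of 5/16.</p>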
<p>We can create a new diagram where we start at the node where our team has won the first game and we label all the non-ending nodes with the number of paths that reach the node. For example the two nodes immediately after start can be reached one way each and the next three nodes (&#8220;3 games to 0&#8221;, &#8220;2 games to 1&#8221; and &#8220;1 game to 2&#8221;) can be reached 1,2 and 1 ways respectively. It is a clever trick to notice that the easiest way to count the number of paths to a node is to just add the number of ways found on the previous nodes that point to our target node. This clever way of counting paths is to use weighted paths (inspired by something called Pascal&#8217;s Triangle). Figure&nbsp;<a href="#fig:wsWeightedPaths">4</a> shows a few columns of a weighted path diagram (though the ending nodes are re-written as the sum of the paths reaching them where every path is weighted by <!-- MATH<br />
 $2^{-\text{path length}}$<br />
 --><br />
<img width="93" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg39.png" alt="$ 2^{-\text{path length}}$"/> which is the probability of following such a path).</p>
<p>The entries of the weighted path diagram are identified by how many columns out from the start node they are and how many steps from one side of the row they are. Both identifiers start at zero so the starting node is denoted as <!-- MATH<br />
 ${0 \choose 0}$<br />
 --><br />
<img width="29" height="42" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg40.png" alt="$ {0 \choose 0}$"/> , the two nodes just after it are denoted <!-- MATH<br />
 ${1 \choose 0}$<br />
 --><br />
<img width="29" height="42" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg41.png" alt="$ {1 \choose 0}$"/> and <!-- MATH<br />
 ${1 \choose 1}$<br />
 --><br />
<img width="29" height="42" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg42.png" alt="$ {1 \choose 1}$"/> . The three nodes just after these are denoted <!-- MATH<br />
 ${2 \choose 0}$<br />
 --><br />
<img width="29" height="42" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg43.png" alt="$ {2 \choose 0}$"/> , <!-- MATH<br />
 ${2 \choose 1}$<br />
 --><br />
<img width="29" height="42" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg44.png" alt="$ {2 \choose 1}$"/> , <!-- MATH<br />
 ${2 \choose 2}$<br />
 --><br />
<img width="29" height="42" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg45.png" alt="$ {2 \choose 2}$"/> and are (as we said before) equal to 1,2 and 1 respectively. These entries are called &#8220;binomial coefficients&#8221; and the rules for computing them (for integers <img width="31" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg46.png" alt="$ a,b$"/> ) are as follows:</p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
{a \choose b} &#038; = &#038; 0 \;\text{if $a&lt;0$\  or $b&lt;0$\  or $b>a$} \\<br />
{a \choose 0} &#038; = &#038; 1 \;\text{if $a>=0$} \\<br />
{a \choose a} &#038; = &#038; 1 \;\text{if $a>=0$} \\<br />
{a \choose b} &#038; = &#038; {a-1 \choose b-1} + {a-1 \choose b} \;\text{otherwise.}<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="42" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg47.png" alt="$\displaystyle {a \choose b}$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg13.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg48.png" alt="$\displaystyle 0 \;$"/>if <img width="49" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg49.png" alt="$ a&lt;0$"/> or <img width="47" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg50.png" alt="$ b&lt;0$"/> or <img width="47" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg51.png" alt="$ b&gt;a$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="42" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg52.png" alt="$\displaystyle {a \choose 0}$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg13.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg53.png" alt="$\displaystyle 1 \;$"/>if <img width="63" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg54.png" alt="$ a&gt;=0$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="42" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg55.png" alt="$\displaystyle {a \choose a}$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg13.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg53.png" alt="$\displaystyle 1 \;$"/>if <img width="63" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg54.png" alt="$ a&gt;=0$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="42" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg47.png" alt="$\displaystyle {a \choose b}$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg13.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="174" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg56.png" alt="$\displaystyle {a-1 \choose b-1} + {a-1 \choose b} \;$"/>otherwise.</td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
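<p>The four rules above translate almost line for line into code. This is a minimal sketch; the name <code>choose</code> and the use of memoization are our own choices for illustration.</p>

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def choose(a, b):
    """Binomial coefficient computed from the four rules in the text."""
    if a < 0 or b < 0 or b > a:
        return 0
    if b == 0 or b == a:
        return 1
    # Add the two entries of the previous column that point to this one.
    return choose(a - 1, b - 1) + choose(a - 1, b)


print([choose(2, b) for b in range(3)])
```

<p>The cache means each entry of the triangle is computed once, which is the same "add up the previous nodes" trick used to count paths.</p>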
<p>From our diagram we see that the probability of winning the World Series bet is a diagonal sum across Pascal&#8217;s Triangle (weighted by powers of 2). To somebody trained in combinatorics it is obvious<a name="tex2html9" href="#foot101" id="tex2html9"><sup>5</sup></a> that a sum like this must itself be a single binomial coefficient. A quick trip to &#8220;The On-Line Encyclopedia of Integer Sequences&#8221; is enough to identify the solution (Encyclopedia sequence &#8220;A001700&#8221;) and we can get an exact form for the initial bet:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\text{bet}(k) =  { 2 k - 3 \choose k - 1} 2^{-(2 k - 3)} .<br />
\end{displaymath}<br />
 --></p>
<div align="center">&nbsp; &nbsp;bet<img width="204" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg57.png" alt="$\displaystyle (k) = { 2 k - 3 \choose k - 1} 2^{-(2 k - 3)} . $"/></div>
<p>A lot is known about Binomial coefficients. In fact by a formula called &#8220;Stirling&#8217;s approximation&#8221; we know</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
{ 2 k - 3 \choose k - 1} 2^{-(2 k - 3)} \approx \frac{1}{\sqrt{\pi k}}<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="215" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg58.png" alt="$\displaystyle { 2 k - 3 \choose k - 1} 2^{-(2 k - 3)} \approx \frac{1}{\sqrt{\pi k}} $"/></div>
<p>as observed.</p>
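<p>As a sanity check, the closed form can be compared with the approximation directly. The sketch below assumes Python 3.8 or later (for <code>math.comb</code>) and a series length of at least 2; the name <code>bet_exact</code> is hypothetical and the exponent is written in terms of the series length k.</p>

```python
import math
from fractions import Fraction


def bet_exact(k):
    """Closed form for the first bet: C(2k-3, k-1) / 2**(2k-3), for k >= 2."""
    return Fraction(math.comb(2 * k - 3, k - 1), 2 ** (2 * k - 3))


for k in (4, 10, 100, 1000):
    exact = float(bet_exact(k))
    approx = 1 / math.sqrt(math.pi * k)
    print(k, exact, approx, exact / approx)
```

<p>The last column should move toward 1 as k grows. For k = 4 the closed form reproduces the 5/16 from the betting schedule.</p>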
<h1><a name="SECTION00050000000000000000" id="SECTION00050000000000000000">Relations</a></h1>
<p>de Finetti used this style of reasoning to provide a foundation for the basic theory of probability. Probability theory has always been somewhat problematic for mathematicians in that it has &#8220;content&#8221; or &#8220;an interpretation&#8221; whereas the power of modern mathematics comes from a more axiomatic or content-free way of thinking. The issue is if you are defining the meaning or interpretation of something like probability how do you check or demonstrate that you have the correct meaning without referring to some other pre-existing interpretation? A foundational or first interpretation has trouble looking for prior definitions to show equivalence to.[<a href="#Shafer:2002p1513">6</a>]</p>
<p>The arbitrage-free arguments, and the binomial arguments in particular, are the basis of much of mathematical finance, including the Nobel Prize winning Black-Scholes-Merton Option Pricing Model[<a href="#Black:1973p1502">2</a>] and the closely related Binomial Option Pricing Model.[<a href="#Cox:1979p1505">5</a>]</p>
<p><img width="16" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/betimg28.png" alt="$ \pi$"/> (the ratio of the circumference of a circle to its diameter) is one of the most famous constants in mathematics. Pascal&#8217;s Triangle is one of the oldest and most studied diagrams in mathematics with roots all the way back into ancient China.[<a href="#OstermanCoulter:2003p1034">4</a>] It is actually remarkable how much Zhu Shijie&#8217;s 1303 diagram: <img width="200" height="312" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Yanghui_triangle.png" alt="Image Yanghui_triangle"/> looks like our modern version of Pascal&#8217;s Triangle (though they are separated by about 350 years, source Wikipedia): <img width="200" height="102" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Triangle.png" alt="Image Triangle"/>. The two diagrams differ only in the notation used to write numbers: both start by filling in two diagonals of ones, and all other numbers are the sums of the two numbers nearest and above them.</p>
<p>The arguments that replace paths with counts are a particular example of a technique called &#8220;Dynamic Programming&#8221; invented by Richard Bellman for mathematical optimization and now one of the core concepts of algorithm design.[<a href="#dynamicProgramming">1</a>]</p>
<p>The idea of using a set of unknown futures that each have a known value is the key idea in solving a number of hard problems in probability and in optimization in the face of uncertainty. One of the most famous of these problems is the &#8220;two armed bandit&#8221; where one must decide how to split one&#8217;s bets between two slot machines that are thought to pay-off at different rates.[<a href="#Chernoff:1959p1444">3</a>]</p>
<p>For the two armed bandit problem the concern is how long to experiment with both machines when one machine seems to be paying more. The correct solution depends on seeing that how certain you need to be about the difference in machine values (which in turn drives how long you experiment on both machines) is a function of how long you intend to use the information. If you intend to play for a long time you want a long initial research phase to produce a very high confidence ranking of the machines; if you do not intend to play for long you want to switch to the machine you suspect is better sooner and on less evidence. Of course &#8220;slot machines&#8221; is just a toy-problem standing in for uncertain investments, research spending or even spending on different online advertising phrases.</p>
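<p>The dependence of the research phase on the playing horizon can be illustrated with a deliberately simple strategy. The sketch below is not the optimal solution from the sequential design literature; it is a hypothetical "try each machine m times, then commit to the one that paid more" rule, with made-up pay-off rates of 0.6 and 0.5, used only to show that the best trial length grows with the number of plays.</p>

```python
import math


def binomial_pmf(n, p):
    """Probabilities of 0..n wins in n plays of a machine paying with chance p."""
    return [math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]


def prob_pick_better(m, p_good, p_bad):
    """Chance the better machine shows more wins after m trial plays on each.

    Ties are assumed to be broken by a fair coin flip.
    """
    good = binomial_pmf(m, p_good)
    bad = binomial_pmf(m, p_bad)
    total = 0.0
    bad_below = 0.0  # chance the worse machine has fewer than j wins
    for j in range(m + 1):
        total += good[j] * (bad_below + 0.5 * bad[j])
        bad_below += bad[j]
    return total


def expected_wins(m, horizon, p_good, p_bad):
    """Expected wins when m plays go to each machine before committing."""
    pick = prob_pick_better(m, p_good, p_bad)
    committed_rate = pick * p_good + (1 - pick) * p_bad
    return m * (p_good + p_bad) + (horizon - 2 * m) * committed_rate


def best_trial_length(horizon, p_good=0.6, p_bad=0.5, max_trials=400):
    """Trial length m (per machine) that maximizes expected wins."""
    candidates = range(0, min(horizon // 2, max_trials) + 1)
    return max(candidates, key=lambda m: expected_wins(m, horizon, p_good, p_bad))


short_plan = best_trial_length(100)
long_plan = best_trial_length(10_000)
print("trial plays per machine, 100 total plays:", short_plan)
print("trial plays per machine, 10000 total plays:", long_plan)
```

<p>Under these assumptions the longer horizon calls for a much longer trial phase, which matches the qualitative point made above.</p>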
<h1><a name="SECTION00060000000000000000" id="SECTION00060000000000000000">Conclusions</a></h1>
<p>The finance &#8220;no arbitrage&#8221; principle is actually a very powerful mathematical tool. It is equivalent to but somewhat more graceful than introducing probabilities when solving some combinatorial problems. In this setting it is equivalent to de Finetti&#8217;s principle and converting between probabilities and net holdings is very easy.</p>
<h2><a name="SECTION00070000000000000000" id="SECTION00070000000000000000">Bibliography</a></h2>
<dl compact>
<dt><a name="dynamicProgramming" id="dynamicProgramming">1</a></dt>
<dd>B<small>ELLMAN,</small> R.<br />
<em>Dynamic Programming</em>.<br />
Dover Publications, 2003.</dd>
<dt><a name="Black:1973p1502" id="Black:1973p1502">2</a></dt>
<dd>B<small>LACK,</small> F., <small>AND</small> S<small>CHOLES,</small> M.<br />
The pricing of options and corporate liabilities.<br />
<em>The Journal of Political Economy 81</em>, 3 (Jun 1973), 637-654.</dd>
<dt><a name="Chernoff:1959p1444" id="Chernoff:1959p1444">3</a></dt>
<dd>C<small>HERNOFF,</small> H.<br />
Sequential design of experiments.<br />
<em>Ann. Math. Statist. 30</em>, 3 (Feb 1959), 755-770.</dd>
<dt><a name="OstermanCoulter:2003p1034" id="OstermanCoulter:2003p1034">4</a></dt>
<dd>C<small>OULTER,</small> L.&nbsp;O.<br />
What is mathematics? toward a global view.<br />
17.</dd>
<dt><a name="Cox:1979p1505" id="Cox:1979p1505">5</a></dt>
<dd>C<small>OX,</small> J.&nbsp;C., R<small>OSS,</small> S.&nbsp;A., <small>AND</small> R<small>UBINSTEIN,</small> M.<br />
Option pricing: A simplified approach.<br />
<em>Journal of Financial Economics</em> (Sep 1979), 39.</dd>
<dt><a name="Shafer:2002p1513" id="Shafer:2002p1513">6</a></dt>
<dd>S<small>HAFER,</small> G., G<small>ILLETT,</small> P.&nbsp;R., <small>AND</small> S<small>CHERL,</small> R.&nbsp;B.<br />
A new understanding of subjective probability and its generalization to lower and upper prevision.<br />
<em>Game-Theoretic Probability Project</em> (Oct 2002), 62.</dd>
</dl>
<p></p>
<hr />
<h4>Footnotes</h4>
<dl>
<dt><a name="foot16" id="foot16">&#8230; Mount</a><a href="#tex2html1"><sup>1</sup></a></dt>
<dd>http://www.win-vector.com/</dd>
<dt><a name="foot21" id="foot21">&#8230; arguments&#8221;</a><a href="#tex2html2"><sup>2</sup></a></dt>
<dd>More pedantically we are using the principle of &#8220;no arbitrage&#8221; or &#8220;arbitrage free&#8221; argument, but the name is traditional.</dd>
<dt><a name="foot77" id="foot77">&#8230; game.</a><a href="#tex2html7"><sup>3</sup></a></dt>
<dd>Again, this is for the unrealistic situation of perfectly matched teams. For teams that have uneven probability the series strongly amplifies the better team&#8217;s chance of winning (which is one of the series&#8217; intents). Also a bettor could update his subjective probability based on the first outcome which also changes things.</dd>
<dt><a name="foot80" id="foot80">&#8230; node</a><a href="#tex2html8"><sup>4</sup></a></dt>
<dd>This calculation is in essence summing end outcomes across all possible paths weighted by how likely each path is. There are many possible paths, but the calculation can be performed quite efficiently.</dd>
<dt><a name="foot101" id="foot101">&#8230; obvious</a><a href="#tex2html9"><sup>5</sup></a></dt>
<dd>&#8220;Obvious&#8221; is actually a special term in mathematics. To illustrate what it means we repeat a story. A mathematician was giving a lecture and stated that the point just shown was obvious. A student asked if it was really obvious. The mathematician stopped the lecture and paused to think. The mathematician thought some more, and eventually walked out of the room. Forty minutes later the mathematician returned to the lecture hall and informed the student that the last point was indeed obvious.</dd>
</dl>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/' rel='bookmark' title='Permanent Link: What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?'>What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?</a></li>
<li><a href='http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/' rel='bookmark' title='Permanent Link: Paper on stock trading'>Paper on stock trading</a></li>
<li><a href='http://www.win-vector.com/blog/2007/06/new-paper/' rel='bookmark' title='Permanent Link: New Paper'>New Paper</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2008/05/betting-best-of-series/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper on stock trading</title>
		<link>http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=paper-on-stock-trading</link>
		<comments>http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/#comments</comments>
		<pubDate>Thu, 04 Oct 2007 02:03:33 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Quantitative Finance]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Dynamic Programming]]></category>
		<category><![CDATA[Stock Trading]]></category>
		<category><![CDATA[Technical Papers]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/2007/10/03/paper-on-stock-trading/</guid>
		<description><![CDATA[author: John Mount I have finally written up and released a paper in PDF: Automatic Generation and Testing of Trades describing a lot of the statistics and optimization methods used when I was technical trading on a Banc of America Securities proprietary program trading desk.  It was a very exciting time. I have also included [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/03/what-does-the-market-think/' rel='bookmark' title='Permanent Link: What does the Market Think?'>What does the Market Think?</a></li>
<li><a href='http://www.win-vector.com/blog/2007/06/new-paper/' rel='bookmark' title='Permanent Link: New Paper'>New Paper</a></li>
<li><a href='http://www.win-vector.com/blog/2009/09/a-discrete-model-gauging-market-efficiency/' rel='bookmark' title='Permanent Link: A Discrete Model Gauging Market Efficiency'>A Discrete Model Gauging Market Efficiency</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>author: John Mount</p>
<p>I have finally written up and released a paper in  PDF: <a href="http://www.win-vector.com/SelectedPapers/files/AutomaticGenerationAndTestingOfTrades.pdf">Automatic Generation and Testing of Trades</a> describing a lot of the statistics and optimization methods used when I was technical trading on a Banc of America Securities proprietary program trading desk.  It was a very exciting time. </p>
<p><span id="more-5"></span><br />
I have also included a less legible HTML version:</p>
<h1 align="center">Automatic Generation and Testing of <em>Un-Rolls</em> for Profitable Technical Trades</h1>
<p align="center"><strong>John Mount<a name="tex2html1" href="#foot10" id="tex2html1"><sup>1</sup></a></strong></p>
<p></p>
<p align="center"><b>Date:</b> September 9, 2007</p>
<hr />
<h1><a name="SECTION00010000000000000000" id="SECTION00010000000000000000">Introduction</a></h1>
<p>In this paper we discuss some of the basic steps in developing successful technical trading strategies. The method involves identifying an inefficiency or irregularity in the market and then using rigorous statistical methods to track and exploit this single feature of the market. We show how to automatically generate and test optimal <em>un-rolls</em> or trades that undo (at a profit) automatically triggered technical trades. That is to say, if the first half of a technical trade is specified we show how to find the other half.</p>
<p>Our technique is to use standard tools, such as kernel methods[<a href="#nonparametricStatistics">8</a>] and Markov chains[<a href="#markovChains">4</a>], to model both the efficient and the inefficient portions of the US stock markets.[<a href="#investments">6</a>]</p>
<p>The author traded profitably using some of these techniques while part of a program trading desk at Banc of America Securities.</p>
<h1><a name="SECTION00020000000000000000" id="SECTION00020000000000000000">Technical Trading</a></h1>
<p>Technical trading is a popular universe of security-trading strategies that trade using only the so-called <em>technical data</em> which are price graphs, volumes, bid/ask books and other data commonly available in market feeds.<a name="tex2html2" href="#foot21" id="tex2html2"><sup>2</sup></a> Input sources can also include external triggers based on news, RSS feeds, on-line information and corporate announcements.<a name="tex2html3" href="#foot22" id="tex2html3"><sup>3</sup></a> These strategies are very attractive in that they are quantifiable, easy to implement and easy to back-test on historic data. A major weakness of technical trading strategies is that they ignore deeper knowledge or analysis of the companies that are behind the securities being traded. Systems of technical trading are used both by large sophisticated hedge funds and by a varying population of day-traders.</p>
<p>Typical technical variables include price, time, volume and moving averages. It is important to know that many of these variables are really just analogies and not essential features of the market. For example: none of the variables current price, time, velocity, acceleration or inertia are real market quantities. What is traditionally called <em>current price</em> is actually the price of the last trade, which is in the past and may or may not ever be seen again. The fundamental variables of state of US stock markets are bid (best purchase price and quantity currently offered), ask (best sale price and quantity currently offered), and last trade (price and quantity). Each change of these variables is called a <em>tick</em> and can happen at any time. More detailed views include detailed bid and ask books from multiple market participants and estimates of inventory imbalance of various market makers and specialists.</p>
<p>In addition to working with the proper variables a sound strategy must also have at least two important components that we call foundation and empirical correctness. Without these components there is a large danger of self-delusion and an unreliable strategy.</p>
<p>By <em>foundation</em> we mean that there are <em>a priori</em> reasons to believe that some variation of the strategy should be profitable. By ignoring the nature of the companies underlying the securities being traded technical trading starts on shaky ground. In fact it is tempting to appeal to an <em>efficient market</em> hypothesis and claim that no technical trading strategy should be profitable. In some sense this is true- trades made in true ignorance expose a trader to significant risk, trading costs and pointless payment of the so-called bid-ask gap. Founded technical trading strategies are based on violations of the efficient market hypothesis- identifying situations where the market is in fact not efficient and trading into these situations. If there is no reason to suspect a market inefficiency there really is no reason to perform a technical trade. Testing numerous un-founded trading strategies is more likely to discover irrelevant anomalies in past data or discover flaws in one&#8217;s statistical procedures than it is likely to discover new valuable trading rules.[<a href="#Ioannids:2005aa">3</a>]</p>
<p>Possible market irregularities include (but are not limited to):</p>
<ul>
<li>Market Open</li>
<li>External News</li>
<li>Earnings Reports</li>
<li>M&amp;A news</li>
<li>Unusual Volume</li>
<li>Inferred Market Maker / Specialist state</li>
<li>Detailed Bid/Ask book.</li>
</ul>
<p>By <em>empirical correctness</em> we mean that the strategy can be validated and proven on historic market data. A technical strategy can have as much mathematical pedigree as you like, but it does not make sense if it cannot be mechanically implemented and proven on historic data. Many technical features are popular due to their familiarity or the quality of graphs they produce- but the true measure is how well strategies generate specific executable actions and the quantified outcomes of those actions.</p>
<p>Given an irregularity it remains to develop the trading strategy. Typically this involves an initial trade (a buy or a sell) triggered by evidence of the irregularity/inefficiency followed somewhat later by a reversal or un-rolling of the trade (selling back against an initial buy or buying back against an initial sell). If markets were perfectly efficient and instantaneous in incorporating external events this should not work- so it is important to test that there really is a repeatable market inefficiency.</p>
<p>Possible initial trading strategies could include:</p>
<ul>
<li>Selling stock into an unusual price spike (a <em>contrarian</em> strategy).</li>
<li>Buying stock immediately on news (a <em>superior connection</em> strategy).</li>
<li>Selling stock into a perceived specialist imbalance (a <em>superior knowledge</em> strategy).</li>
</ul>
<p>It would be naive to expect that a strategy that starts on a trigger and then reverses its trade blindly (say some fixed time after the trigger) is fully efficient. We must assume that other players in the market have seen the effects of the trigger we traded on and that their actions introduce biases and uncertainty into the market. Modeling these effects will allow us to produce a systematic <em>unrolling</em> strategy that can complete any <em>entry strategy</em> into a complete round-trip system. This systematic unrolling strategy is the subject of this writeup.</p>
<h1><a name="SECTION00030000000000000000" id="SECTION00030000000000000000">First Model</a></h1>
<h2><a name="SECTION00031000000000000000" id="SECTION00031000000000000000">The Efficient Market Hypothesis</a></h2>
<p>The efficient market hypothesis is a useful tool, even when you are attempting to find inefficient market situations. It represents the baseline you feel you have found a useful deviation from. The efficient market hypothesis has many variants but the essential content is that the market is full of <em>informed players</em> so any information is <em>already factored in to the price</em>. For example if there is publicly available information that gives a reasonable expectation that a stock should rise in the future then informed investors would purchase the stock early to be in a position to benefit from this increase. These purchases actually cause their own price-increase (by the simple laws of supply and demand) and have the effect of reducing the value of the information- as they move the price increase back in time (from the expected future change in value to the time of the anticipatory buying). This is what is meant by the phrase &#8220;already factored in.&#8221;</p>
<div align="center"><a name="fig:actual" id="fig:actual"></a><a name="47"></a></p>
<table>
<caption align="bottom"><strong>Figure 1:</strong> Dell 10-13-2006 Tick Data.</caption>
<tr>
<td>
<div align="center"><img width="500" height="402" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./morning.png" alt="Image morning"/></div>
</td>
</tr>
</table>
</div>
<p>There is a mathematical concept that captures the idea of <em>already factored in</em>: Martingales. The Martingale condition is a concept that says the expected future value is the current value. For example betting a dollar on the flip of a fair coin is a Martingale of value <img width="23" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg1.png" alt="$ \$0$"/> (the odds of winning and losing a dollar balance out). The future value may be higher or lower- but when the Martingale condition is met the average of all these values weighted by their likelihood of occurrence is equal to the current value. The <em>already factored in</em> example mentioned above shows how the many players in the market tend to establish a near-Martingale by trading in such a way as to move the current price to be the expected value of the future price.</p>
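<p>The coin-flip example can be checked exactly. The following is a minimal sketch (the function name <code>expected_wealth</code> and the choice of starting wealth are illustrative assumptions, not part of the original model): it enumerates every equally likely sequence of fair flips and averages the resulting wealth.</p>

```python
from fractions import Fraction
from itertools import product


def expected_wealth(start, n_flips):
    """Exact expected wealth after betting $1 on each of n_flips fair coin flips."""
    outcomes = list(product((+1, -1), repeat=n_flips))  # every win/loss sequence
    total = sum(Fraction(start + sum(seq)) for seq in outcomes)
    return total / len(outcomes)  # each sequence is equally likely


print(expected_wealth(0, 1))   # one flip starting from $0
print(expected_wealth(10, 6))  # six flips starting from $10
```

<p>In both cases the expectation equals the starting wealth, which is the Martingale condition stated above.</p>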
<div align="center"><a name="fig:random" id="fig:random"></a><a name="198"></a></p>
<table>
<caption align="bottom"><strong>Figure 2:</strong> Graph of a <em>market-like</em> random walk.</caption>
<tr>
<td>
<div align="center"><img width="500" height="500" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./random.png" alt="Image random"/></div>
</td>
</tr>
</table>
</div>
<p>If market prices were the sum of many individual traders each with bounded-budgets who traded independently then we could apply the <em>central limit theorem</em> or <em>law of large numbers</em> and say that the market is indeed a random walk like the famous Brownian motion from physics. In fact on first inspection the market price histories (as in Figure&nbsp;<a href="#fig:actual">1</a>) indeed look very similar to graphs generated by such a random process (as in Figure&nbsp;<a href="#fig:random">2</a>).</p>
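<p>A graph of this kind is easy to generate. The sketch below is only an illustration: the starting price, step size, tick count and seed are made-up assumptions, and it is not claimed to be the process used to draw Figure&nbsp;2. Each tick moves the price up or down by the same small percentage with equal odds, so the expected next price equals the current price.</p>

```python
import random


def random_walk_prices(start_price, n_ticks, step_pct, seed):
    """Simulate a drift-free multiplicative random walk of prices."""
    rng = random.Random(seed)
    prices = [start_price]
    for _ in range(n_ticks):
        # Each tick moves the price up or down by step_pct with equal odds.
        move = rng.choice((+1, -1)) * step_pct / 100.0
        prices.append(prices[-1] * (1.0 + move))
    return prices


path = random_walk_prices(start_price=23.00, n_ticks=500, step_pct=0.05, seed=0)
print(len(path), min(path), max(path))
```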
<p>As we have said: it is no coincidence that the market looks nearly like a Brownian motion. Informed trading effects tend to impart Martingale like tendencies (once the overall increase factor of the value of holding wealth is factored out). Also, if the variance of the market were much larger than that of a similar Brownian motion this would itself attract <em>channel traders</em> who would benefit by trading in and out of the excess wiggling. The point is that an efficient market is usually pretty well described by random processes that have the Martingale property (like Brownian motions or Markov chains), so these are appropriate modeling tools.</p>
<p>If the market process really were such a random walk then there would be little point in technical trading. The whole theory of Martingales was developed to precisely describe situations where bets based on collecting historic information cannot work. This is often called the <em>no gambling system</em> principle and it can actually be proven for systems like Martingales, unbiased Markov chains and drift-free Brownian motion, and was even used as a foundational concept to define randomness by von Mises.[<a href="#vonMises">7</a>] However, traders have a large number of pervasive dependencies. Dependencies can be shared information, <em>herd mentality</em> or shared trading practices. There are also some traders with very large budgets, so the conditions commonly needed to apply the law of large numbers do not apply and it is not inevitable that the market is indeed a Brownian motion. In fact one can show that even though the market overall looks very much like a Brownian motion it has too many events that would be considered very rare in this model (crashes, run-ups, events correlated in time) to have plausibly been generated by such a model.</p>
<h2><a name="SECTION00032000000000000000" id="SECTION00032000000000000000">Exploiting Inefficiency</a></h2>
<p>A basic rule of thumb is: without a good reason to believe the contrary you are not too far off assuming the market is efficient. So we decided to model the morning market as being nearly memoryless. That is, we modeled it as if future prices depend only on the most recent price and not on the detailed history of prices. We will, however, condition the model on the bias introduced by the presence of the initial trade trigger.</p>
<p>The most basic memoryless model is the Markov Chain. In this model the world has a finite number of situations called <em>states</em>. For example we could say the stock price being near each of a number of price differences from the previous day&#8217;s close is a state. We could take our states to be: <img width="68" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg2.png" alt="$ +0.50\%$"/> , <img width="68" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg3.png" alt="$ +0.25\%$"/> , <img width="53" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg4.png" alt="$ 0.00\%$"/> , &#8722;0.25% , <img width="68" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg6.png" alt="$ -0.50\%$"/> . If our strategy involved an end of day sale followed by a next-day buy-back then knowing which state we are in allows us to assign a value to buying back the stock while in that state. This would be the negative of the relative change in stock price (price decreases work for us) times the value of the stock sold the day before (minus trading costs). 
If we modeled round-trip trading costs as <img width="56" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg7.png" alt="$ \$20.00$"/> and assume our triggered trade sold a total value of <img width="69" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg8.png" alt="$ \$46,000$"/> of Dell then we could map buying back in each possible state to a net dollar value of the round trip. For instance buying back in the state <img width="68" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg2.png" alt="$ +0.50\%$"/> would represent a net-loss of <img width="42" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg9.png" alt="$ \$250$"/> . We actually want to make the states a bit more detailed by adding a notion of time. If we modeled time in 5-minute intervals and (for the sake of diagram clarity) assumed that we only move up or down one state-level, the Markov chain that models the first 15 minutes of the market could be represented in a diagram as in Figure&nbsp;<a href="#fig:chain1">3</a>.</p>
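<p>The mapping from price-state to dollar value can be written out directly. This is a small sketch assuming five price states from +0.50% down to &#8722;0.50% in 0.25% steps, the trade size and the round-trip cost quoted above; the names <code>buy_back_value</code>, <code>TRADE_VALUE</code> and <code>ROUND_TRIP_COST</code> are illustrative.</p>

```python
TRADE_VALUE = 46000.00   # dollar value of stock sold by the triggered trade
ROUND_TRIP_COST = 20.00  # modeled round-trip trading cost in dollars
STATES_PCT = [+0.50, +0.25, 0.00, -0.25, -0.50]  # price change from prior close


def buy_back_value(change_pct):
    """Net dollar value of buying back a short position at the given price change."""
    # A price drop is a gain for the short seller, hence the minus sign.
    return -(change_pct / 100.0) * TRADE_VALUE - ROUND_TRIP_COST


for pct in STATES_PCT:
    print(f"{pct:+.2f}% -> {buy_back_value(pct):+.2f}")
```

<p>The top state reproduces the $250 net loss mentioned above, and the flat state costs exactly the $20 of trading costs.</p>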
<div align="center"><a name="fig:chain1" id="fig:chain1"></a><a name="74"></a></p>
<table>
<caption align="bottom"><strong>Figure 3:</strong> Markov Chain Model</caption>
<tr>
<td>
<div align="center"><img width="505" height="400" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Chain1.png" alt="Image Chain1"/></div>
</td>
</tr>
</table>
</div>
<p>Each circle represents a state and each arrow represents a transition from state to state. We would use historic market data to find, for every stock in this situation, the relative frequency with which each transition is taken. For instance we would measure in our historic data what fraction of the time a stock that is at the 5 minute mark and in the <img width="68" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg3.png" alt="$ +0.25\%$"/> state moves to the <img width="68" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg2.png" alt="$ +0.50\%$"/> state at the 10 minute mark. These learned state transition probabilities can be made to depend on factors from the previous day&#8217;s close (% increase, volume, market-capitalization). In the diagram we are going to assume all transitions are equally likely except for the arrows with square bases which we each take to be twice as likely as each regular arrow leaving the same state. The success of our strategy depends on finding situations where our model predicts these sorts of advantageous asymmetric conditions. Without these asymmetries (greater net propensity for price decrease than for price increase) we would be in a gambling situation where no strategy could possibly have net-positive value.</p>
<p>The diagram also encodes another assumption of the problem- we have a deadline for buying back the stock. In this case the diagram indicates a forced buy-back at time +15 minutes if a buy-back has not been made before that time. In reality many more levels and many more time intervals are modeled. Also note we have made the top row (representing maximal loss) absorbing. This is introducing a deliberate pessimistic flaw into the model (or equivalently adds a stop-loss condition to the strategy). We do not want the maximal loss states to have a reflecting barrier (like the maximal profit states do) as this would make the model overly optimistic. Instead we force the model to be pessimistic and choose enough levels so that the maximum loss bound is not often achieved and therefore does not have a large effect on the model.</p>
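<p>Measuring these relative frequencies is a counting exercise. The sketch below assumes each historic stock-morning has already been reduced to a list of state indices at successive 5-minute marks; the data shown is hypothetical, and <code>estimate_transitions</code> is an illustrative name rather than anything from the original system.</p>

```python
from collections import Counter, defaultdict


def estimate_transitions(paths):
    """Estimate P(next state | time step, current state) by relative frequency.

    Each path lists the state index (0 = top row) seen at successive
    5-minute marks for one historic stock-morning.
    """
    counts = defaultdict(Counter)
    for path in paths:
        for t in range(len(path) - 1):
            counts[(t, path[t])][path[t + 1]] += 1
    return {
        key: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for key, c in counts.items()
    }


# Hypothetical history: four mornings, states recorded at 0, 5 and 10 minutes.
history = [[2, 2, 3], [2, 3, 4], [2, 3, 3], [2, 1, 2]]
probs = estimate_transitions(history)
print(probs[(0, 2)])  # transition frequencies out of the start state
```

<p>With so few examples per state the estimates are obviously unreliable, which is the data-sparsity trade-off that motivates conditioning carefully and averaging over nearby examples.</p>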
<div align="center"><a name="fig:chain2" id="fig:chain2"></a><a name="81"></a></p>
<table>
<caption align="bottom"><strong>Figure 4:</strong> Valuing Interior States</caption>
<tr>
<td>
<div align="center"><img width="505" height="400" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Chain2.png" alt="Image Chain2"/></div>
</td>
</tr>
</table>
</div>
<p>What we want to know is the net-value of being short (having sold) the stock the evening before. This is represented by the left-most circle which does not yet have a known value. The value of this state depends both on the transition odds of the states and on the trading strategy used to buy back the stock. There is, for example, no value in reaching the price-drop states if our strategy doesn&#8217;t take advantage and buy back while in these states. So the value of the states depends both on the uncertain future behavior of the market and on the currently unspecified buy-back strategy. The neat thing about this sort of diagram and treatment is that the forced-liquidation states at the end make it possible to simultaneously find the optimal trading strategy and assign values to all of the states. For example in the next diagram we see that the value of allowing the middle state at +10 minutes <em>to ride</em> (i.e. waiting instead of buying the stock back at this time) is equal to the properly weighted average of the ending states it connects to, in this case: <!-- MATH<br />
 $\frac{1}{4}(-\$135) + \frac{1}{4}(-\$20) + \frac{1}{2} \$95 = \$8.75$<br />
 --><br />
<img width="302" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg10.png" alt="$ \frac{1}{4}(-\$135) + \frac{1}{4}(-\$20) + \frac{1}{2} \$95 = \$8.75$"/> . The value of buying back in this state is <img width="47" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg11.png" alt="$ -\$20$"/> so the optimal strategy is to take our chances in the next time interval (see Figure&nbsp;<a href="#fig:chain2">4</a>).</p>
<p>We can repeat this sort of argument for each state in the second to last column and determine the net-value of each state under the optimal trading strategy. States whose optimal strategy is to <em>stop</em> (perform the buy-back immediately) are indicated by not having any outgoing arrows (see Figure&nbsp;<a href="#fig:chain2b">5</a>).</p>
<div align="center"><a name="fig:chain2b" id="fig:chain2b"></a><a name="98"></a></p>
<table>
<caption align="bottom"><strong>Figure 5:</strong> Propagating the Valuation</caption>
<tr>
<td>
<div align="center"><img width="505" height="400" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Chain2b.png" alt="Image Chain2b"/></div>
</td>
</tr>
</table>
</div>
<p>The procedure moves from right to left using known states to fill in decisions and values for unknown states. In fact the calculation is so simple and orderly we can encode the entire filling-in procedure in a spreadsheet table:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\begin{array}{|l|lllr|}<br />
\hline<br />
 &#038; \text{column A} &#038; \text{column B} &#038; \text{column C} &#038; \text{column D} \\<br />
\hline<br />
\text{row 1} &#038; =D1 &#038; =D1 &#038; =D1 &#038; -\$250 \\<br />
\text{row 2} &#038; =\max(D2,(B1+B2+B3)/3) &#038; =\max(D2,(C1+C2+C3)/3) &#038; =\max(D2,(D1+D2+D3)/3) &#038; -\$135 \\<br />
\text{row 3} &#038; =\max(D3,(B2+B3+2*B4)/4) &#038; =\max(D3,(C2+C3+2*C4)/4) &#038; =\max(D3,(D2+D3+2*D4)/4) &#038; -\$20 \\<br />
\text{row 4} &#038; =\max(D4,(B3+B4+2*B5)/4) &#038; =\max(D4,(C3+C4+2*C5)/4) &#038; =\max(D4,(D3+D4+2*D5)/4) &#038; \$95 \\<br />
\text{row 5} &#038; =\max(D5,(B4+B5)/2) &#038; =\max(D5,(C4+C5)/2) &#038; =\max(D5,(D4+D5)/2) &#038; \$210 \\<br />
\hline<br />
\end{array}<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="1057" height="147" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg12.png" alt="\begin{displaymath} \begin{array}{\vert l\vert lllr\vert} \hline &amp; \text{column... ...(C4+C5)/2) &amp; =\max(D5,(D4+D5)/2) &amp; \$210 \ \hline \end{array}\end{displaymath}"/></div>
<p>This is in fact the same type of dynamic programming[<a href="#dynamicProgramming">1</a>] method used to value options under the <em>binomial model</em>.</p>
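<p>The same right-to-left fill can be sketched in a few lines of code. The version below encodes the spreadsheet formulas as printed in the table (rows 1 to 5, columns A to D); the names <code>STOP</code>, <code>MOVES</code> and <code>value_previous_column</code> are illustrative, and since the weights are read from the table rather than from the figures, the values it produces are not claimed to match every number drawn in the diagrams.</p>

```python
STOP = [-250.0, -135.0, -20.0, 95.0, 210.0]  # column D: buy-back value per row

# Outgoing (row offset, weight) pairs per row, read off the spreadsheet formulas.
MOVES = [
    None,                       # row 1: absorbing stop-loss, value is just D1
    [(-1, 1), (0, 1), (1, 1)],  # row 2: three equally weighted moves
    [(-1, 1), (0, 1), (1, 2)],  # row 3: move toward profit weighted double
    [(-1, 1), (0, 1), (1, 2)],  # row 4: same weighting
    [(-1, 1), (0, 1)],          # row 5: reflecting barrier at maximum profit
]


def value_previous_column(next_col):
    """Value one column from the column to its right, as in the spreadsheet."""
    col = []
    for i, moves in enumerate(MOVES):
        if moves is None:
            col.append(STOP[i])
            continue
        total = sum(w for _, w in moves)
        ride = sum(w * next_col[i + d] for d, w in moves) / total
        col.append(max(STOP[i], ride))  # stop now, or let the position ride
    return col


col_c = value_previous_column(STOP)   # +10 minutes
col_b = value_previous_column(col_c)  # +5 minutes
col_a = value_previous_column(col_b)  # market open
print(col_c)
```

<p>The middle entry of <code>col_c</code> is the $8.75 ride value worked out above; a state whose value equals its <code>STOP</code> entry is one where buying back immediately is at least as good as waiting.</p>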
<p>The completed diagram is shown in Figure&nbsp;<a href="#fig:chain3">6</a>.</p>
<div align="center"><a name="fig:chain3" id="fig:chain3"></a><a name="120"></a></p>
<table>
<caption align="bottom"><strong>Figure 6:</strong> Complete Valuation</caption>
<tr>
<td>
<div align="center"><img width="505" height="400" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Chain3.png" alt="Image Chain3"/></div>
</td>
</tr>
</table>
</div>
<p>For our (made up) example the net-value of the round trip trade is an expected profit of <img width="47" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg13.png" alt="$ \$7.47$"/> .</p>
<p>What remains is to choose a set of conditions on which to base the model estimates. We then only trade situations that have an acceptable predicted risk and reward profile.</p>
<p>To build the state transition models we collect all the historic trade data and then segregate it into groups of data that match each possible trigger condition we wish to use to help bias our system. There is a trade-off: the more detailed the list of trigger conditions the more powerful biases we can detect (things are less smeared together) but we have less data available for each possible combination of conditions and lower reliability in modeling. To address this we advocate using non-parametric or kernel methods here to average data that nearly fits the conditions to get estimates that are both detailed and reliable.</p>
<p>For example our estimate is of the form:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
P(s_1 \rightarrow s_2) \approx<br />
\frac{<br />
\sum_{training-example} wt(training-example,s_1) P(s_1 \rightarrow s_2|training-example,s_1)<br />
}{<br />
\sum_{training-example} wt(training-example,s_1) P(s_1|training-example)<br />
}<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="766" height="68" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg14.png" alt="$\displaystyle P(s_1 \rightarrow s_2) \approx \frac{ \sum_{training-example} wt(... ...sum_{training-example} wt(training-example,s_1) P(s_1\vert training-example) } $"/></div>
<p>A usable <!-- MATH<br />
 $wt(training-example,s_1)$<br />
 --><br />
<img width="227" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg15.png" alt="$ wt(training-example,s_1)$"/> can be gotten from the law of conditional probability (<!-- MATH<br />
 $P(A, B) = P(A)P(B|A)$<br />
 --><br />
<img width="203" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg16.png" alt="$ P(A, B) = P(A)P(B\vert A)$"/> ), so we use <!-- MATH<br />
 $P(training-example,s_1) = P(s_1 | training-example)<br />
P(training-example)$<br />
 --><br />
<img width="649" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg17.png" alt="$ P(training-example,s_1) = P(s_1 \vert training-example) P(training-example)$"/> . Under empirical re-sampling each training example is treated as equally likely (more common situations are accounted for by the fact that they yield more examples in the training set) so we can use <!-- MATH<br />
 $wt(training-example,s_1) = P(s_1 | training-example)$<br />
 --><br />
<img width="466" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg18.png" alt="$ wt(training-example,s_1) = P(s_1 \vert training-example)$"/> .</p>
<p>For <!-- MATH<br />
 $P(s_1 \rightarrow s_2 | training-example)$<br />
 --><br />
<img width="264" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg19.png" alt="$ P(s_1 \rightarrow s_2 \vert training-example)$"/> we can just estimate how often, when we are in a <img width="56" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg20.png" alt="$ state_A$"/> near <img width="21" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg21.png" alt="$ s_1$"/> , we see a next-state <img width="57" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg22.png" alt="$ state_B$"/> such that <!-- MATH<br />
 $state_B/state_A$<br />
 --><br />
<img width="118" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg23.png" alt="$ state_B/state_A$"/> is approximately <img width="97" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg24.png" alt="$ s_2/s_1$"/> .</p>
<p>For both of these estimates it pays to blur things a bit during the estimation procedure, replacing sums of the form:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
E_{condition(x)=true}[f(x)] = \frac{<br />
\sum_{condition(x)=true} f(x)<br />
}{<br />
\sum_{condition(x)=true} 1<br />
}<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="376" height="70" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg25.png" alt="$\displaystyle E_{condition(x)=true}[f(x)] = \frac{ \sum_{condition(x)=true} f(x) }{ \sum_{condition(x)=true} 1 } $"/></div>
<p>with softer forms like:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
E_{condition(x)=true}[f(x)] \approx \frac{<br />
\sum_{x} e^{-\lambda violation(x)}f(x)<br />
}{<br />
\sum_{x} e^{-\lambda violation(x)}<br />
}<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="379" height="69" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg26.png" alt="$\displaystyle E_{condition(x)=true}[f(x)] \approx \frac{ \sum_{x} e^{-\lambda violation(x)}f(x) }{ \sum_{x} e^{-\lambda violation(x)} } . $"/></div>
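<p>The softened average is straightforward to compute. The sketch below is one possible reading of the formula: <code>violation(x)</code> is taken to be a non-negative measure of how far an example is from satisfying the condition, and the data, the choice of absolute difference as the violation, and the value of lambda are all hypothetical.</p>

```python
import math


def soft_conditional_mean(xs, f, violation, lam):
    """Kernel-style estimate of E[f(x) | condition(x)] using soft weights.

    Each example is weighted by exp(-lam * violation(x)), where violation(x)
    is 0 when the condition holds exactly and grows as x departs from it.
    """
    weights = [math.exp(-lam * violation(x)) for x in xs]
    return sum(w * f(x) for w, x in zip(weights, xs)) / sum(weights)


# Hypothetical data: (prior-day % change, next-morning % change) pairs.
examples = [(1.9, -0.30), (2.0, -0.20), (2.1, -0.10), (5.0, 0.40)]
target = 2.0  # condition: prior-day change of about 2%

est = soft_conditional_mean(
    examples,
    f=lambda ex: ex[1],
    violation=lambda ex: abs(ex[0] - target),
    lam=10.0,
)
print(est)
```

<p>Setting lambda to zero recovers the plain average over all examples, while a very large lambda approaches the hard-condition average; intermediate values trade detail against reliability.</p>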
<h1><a name="SECTION00040000000000000000" id="SECTION00040000000000000000">A Second Model</a></h1>
<p>One thing one might want is to use a much more detailed model of time. One way to do this is just to add more time-states to the model. This can cause problems as we now have many more transition probabilities to estimate.<a name="tex2html10" href="#foot134" id="tex2html10"><sup>4</sup></a> Suppose we wanted to switch our model from being indexed by time to being indexed by tick. Bid, Ask and Trade ticks can happen at any time and at any rate, so even with a trading deadline there is uncertainty in how many more ticks there are before the deadline. We can work at the tick level (without introducing too many states) by introducing a new model that has cycles in the arrow diagram (see Figure&nbsp;<a href="#fig:chain4">7</a>).</p>
<div align="center"><a name="fig:chain4" id="fig:chain4"></a><a name="140"></a></p>
<table>
<caption align="bottom"><strong>Figure 7:</strong> Recurrent Model (With Cycles)</caption>
<tr>
<td>
<div align="center"><img width="505" height="400" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/./Chain4.png" alt="Image Chain4"/></div>
</td>
</tr>
</table>
</div>
<p>The short vertical arrows represent the odds of moving from price-state to price-state in the same time column. The left to right dotted arrows represent the odds of being the tick that moves to the next time column. We can now estimate the transition odds from a great quantity of per-tick data giving us very reliable transition odds. We would like to fill in the values of all the states of this model (like we did in the earlier diagrams)- but the fill-in procedure will not work in the presence of cycles. The states we need in order to fill in a given state do not yet have known values because they themselves depend on the state we are trying to value.</p>
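<p>To make the circular dependence concrete, here is a sketch for a single time column with cycles. It does not use the linear program developed in the next section; instead it swaps in repeated sweeps (value iteration), a different technique, purely as an illustration. The escape probability <code>P_ESCAPE</code> and the move weights are hypothetical, and escaping is assumed to force a buy-back at the current state&#8217;s stop value.</p>

```python
STOP = [-250.0, -135.0, -20.0, 95.0, 210.0]  # buy-back value in each price-state
P_ESCAPE = 0.1  # hypothetical odds that a tick is the one that hits the deadline

# Hypothetical within-column move weights (row offset, weight), top row absorbing.
MOVES = [
    None,
    [(-1, 1), (0, 1), (1, 1)],
    [(-1, 1), (0, 1), (1, 2)],
    [(-1, 1), (0, 1), (1, 2)],
    [(-1, 1), (0, 1)],
]


def ride_value(values, i):
    """Expected value of waiting one more tick in state i."""
    moves = MOVES[i]
    total = sum(w for _, w in moves)
    stay_in_column = sum(w * values[i + d] for d, w in moves) / total
    return (1.0 - P_ESCAPE) * stay_in_column + P_ESCAPE * STOP[i]


def sweep(values):
    """One pass over the column: each state takes the better of stop and ride."""
    return [
        STOP[i] if MOVES[i] is None else max(STOP[i], ride_value(values, i))
        for i in range(len(STOP))
    ]


values = list(STOP)  # start from "always stop" and improve
for _ in range(2000):
    values = sweep(values)
print([round(v, 2) for v in values])
```

<p>A single pass is not enough because each state&#8217;s ride value uses its neighbors&#8217; (and its own) not-yet-final values; the sweeps only settle because every tick has some chance of escaping the cycle.</p>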
<h2><a name="SECTION00041000000000000000" id="SECTION00041000000000000000">Linear Program Treatment</a></h2>
<p>The standard way to deal with unknown quantities that simultaneously depend on each other is to introduce variables and write down a set of simultaneous inequalities.</p>
<p>If we introduce the variables <img width="14" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg27.png" alt="$ v$"/> , <!-- MATH<br />
 $a_1 \cdots a_5$<br />
 --><br />
<img width="68" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg28.png" alt="$ a_1 \cdots a_5$"/> , <!-- MATH<br />
 $b_1 \cdots b_5$<br />
 --><br />
<img width="64" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg29.png" alt="$ b_1 \cdots b_5$"/> and <!-- MATH<br />
 $c_1 \cdots c_5$<br />
 --><br />
<img width="64" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg30.png" alt="$ c_1 \cdots c_5$"/> to represent all of the unknown values in our last diagram we can quickly write down many relations we know to be true for them.</p>
<p>For example for the set of variables <img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg31.png" alt="$ c_1$"/> through <img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg32.png" alt="$ c_5$"/> we know that each state is worth at least as much as the value of stopping in that state. This can be written as:</p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
c_1  &#038; \ge &#038;  -\$250  \\<br />
c_2  &#038; \ge &#038;  -\$135  \\<br />
c_3  &#038; \ge &#038;  -\$20 \\<br />
c_4  &#038; \ge &#038;  \$95 \\<br />
c_5  &#038; \ge &#038;  \$210 \\<br />
.<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg33.png" alt="$\displaystyle c_1$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="57" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg35.png" alt="$\displaystyle -\$250$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg36.png" alt="$\displaystyle c_2$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="57" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg37.png" alt="$\displaystyle -\$135$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg38.png" alt="$\displaystyle c_3$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="47" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg39.png" alt="$\displaystyle -\$20$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg40.png" alt="$\displaystyle c_4$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="32" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg41.png" alt="$\displaystyle \$95$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg42.png" alt="$\displaystyle c_5$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="42" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg43.png" alt="$\displaystyle \$210$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="10" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg44.png" alt="$\displaystyle .$"/></td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p>Each state (except deadline and stop-loss states) is also worth at least the expected value of continuing one more step, which can be written as:</p>
<p></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
c_2  &#038; \ge &#038;  p(c_2  \rightarrow  c_2) c_2 + p(c_2  \rightarrow  c_1) c_1 + p(c_2  \rightarrow  c_3) c_3 + p(c_2 \;\text{escape}) (-\$135) \\<br />
c_3  &#038; \ge &#038;  p(c_3  \rightarrow  c_3) c_3 + p(c_3  \rightarrow  c_2) c_2 + p(c_3  \rightarrow  c_4) c_4 + p(c_3 \;\text{escape}) (-\$20) \\<br />
c_4  &#038; \ge &#038;  p(c_4  \rightarrow  c_4) c_4 + p(c_4  \rightarrow  c_3) c_3 + p(c_4  \rightarrow  c_5) c_5 + p(c_4 \;\text{escape}) \$95 \\<br />
c_5  &#038; \ge &#038;  p(c_5  \rightarrow  c_5) c_5 + p(c_5  \rightarrow  c_4) c_4 + p(c_5 \;\text{escape}) \$210 \\<br />
.<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg36.png" alt="$\displaystyle c_2$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="412" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg45.png" alt="$\displaystyle p(c_2 \rightarrow c_2) c_2 + p(c_2 \rightarrow c_1) c_1 + p(c_2 \rightarrow c_3) c_3 + p(c_2 \;$"/>escape<img width="78" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg46.png" alt="$\displaystyle ) (-\$135)$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg38.png" alt="$\displaystyle c_3$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="412" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg47.png" alt="$\displaystyle p(c_3 \rightarrow c_3) c_3 + p(c_3 \rightarrow c_2) c_2 + p(c_3 \rightarrow c_4) c_4 + p(c_3 \;$"/>escape<img width="69" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg48.png" alt="$\displaystyle ) (-\$20)$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg40.png" alt="$\displaystyle c_4$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="412" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg49.png" alt="$\displaystyle p(c_4 \rightarrow c_4) c_4 + p(c_4 \rightarrow c_3) c_3 + p(c_4 \rightarrow c_5) c_5 + p(c_4 \;$"/>escape<img width="40" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg50.png" alt="$\displaystyle ) \$95$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="20" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg42.png" alt="$\displaystyle c_5$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="289" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg51.png" alt="$\displaystyle p(c_5 \rightarrow c_5) c_5 + p(c_5 \rightarrow c_4) c_4 + p(c_5 \;$"/>escape<img width="49" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg52.png" alt="$\displaystyle ) \$210$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="10" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg44.png" alt="$\displaystyle .$"/></td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/></p>
<p>This can be re-written into matrix form where we have</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
A =<br />
\begin{bmatrix}<br />
 1 &#038;   &#038;   &#038;   &#038;   \\<br />
   &#038; 1 &#038;   &#038;   &#038;   \\<br />
   &#038;   &#038; 1 &#038;   &#038;   \\<br />
   &#038;   &#038;   &#038; 1 &#038;   \\<br />
   &#038;   &#038;   &#038;   &#038; 1 \\<br />
-P(c_2 \rightarrow c_1) &#038;  1-P(c_2 \rightarrow c_2) &#038; -P(c_2 \rightarrow c_3) &#038; &#038; \\<br />
   &#038; -P(c_3 \rightarrow c_2) &#038; 1-P(c_3 \rightarrow c_3) &#038; -P(c_3 \rightarrow c_4) &#038; \\<br />
   &#038;   &#038; -P(c_4 \rightarrow c_3) &#038; 1-P(c_4 \rightarrow c_4) &#038; -P(c_4 \rightarrow c_5) \\<br />
   &#038;   &#038;   &#038;  -P(c_5 \rightarrow c_4)  &#038; 1-P(c_5 \rightarrow c_5)<br />
\end{bmatrix},<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="734" height="226" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg53.png" alt="$\displaystyle A = \begin{bmatrix} 1 &amp; &amp; &amp; &amp; \ &amp; 1 &amp; &amp; &amp; \ &amp; &amp; 1 &amp; &amp; \ &amp; &amp;... ...5) \ &amp; &amp; &amp; -P(c_5 \rightarrow c_4) &amp; 1-P(c_5 \rightarrow c_5) \end{bmatrix}, $"/></div>
<p><!-- MATH<br />
 \begin{displaymath}<br />
b =<br />
\begin{bmatrix}<br />
-\$250 \\<br />
-\$135 \\<br />
-\$20 \\<br />
\$95 \\<br />
\$210 \\<br />
P(c_2 \;\text{escape}) (-\$135) \\<br />
P(c_3 \;\text{escape}) (-\$20) \\<br />
P(c_4 \;\text{escape}) \$95 \\<br />
P(c_5 \;\text{escape}) \$210<br />
\end{bmatrix}<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="232" height="226" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg54.png" alt="$\displaystyle b = \begin{bmatrix} -\$250 \ -\$135 \ -\$20 \ \$95 \ \$21... ... \ P(c_4 \;\text{escape}) \$95 \ P(c_5 \;\text{escape}) \$210 \end{bmatrix}$"/></div>
<p>and our vector of unknowns is</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
x =<br />
\begin{bmatrix}<br />
c_1 \\<br />
c_2 \\<br />
c_3 \\<br />
c_4 \\<br />
c_5<br />
\end{bmatrix}.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="90" height="135" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg55.png" alt="$\displaystyle x = \begin{bmatrix} c_1 \ c_2 \ c_3 \ c_4 \ c_5 \end{bmatrix}. $"/></div>
<p>In matrix form we say <img width="62" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg56.png" alt="$ A x \ge b$"/> . We are assuming we have estimates for all of the entries of <img width="18" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg57.png" alt="$ A$"/> and <img width="12" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg58.png" alt="$ b$"/> &#8211; so the only unknowns are the entries of <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg59.png" alt="$ x$"/> . If these were equalities (instead of inequalities) we would call this a set of simultaneous equations and we could use linear algebra to solve for the unknown values. Because they are inequalities we will have to instead solve what is known as a linear program.[<a href="#linProg">5</a>] It turns out the optimal values for <!-- MATH<br />
 $c_1, \cdots c_5$<br />
 --><br />
<img width="69" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg60.png" alt="$ c_1, \cdots c_5$"/> are given by solving:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
\min 1\cdot x \;\text{s.t.}\;\\A x \ge b .<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="78" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg61.png" alt="$\displaystyle \min 1\cdot x \;$"/>s.t.<img width="68" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg62.png" alt="$\displaystyle \;\\ A x \ge b . $"/></div>
<p>This has an admittedly strange form (the objective condition <!-- MATH<br />
 $\min 1\cdot x$<br />
 --><br />
<img width="72" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg63.png" alt="$ \min 1\cdot x$"/> seems very arbitrary and one would at first think the likely form is <!-- MATH<br />
 $\max p \cdot x$<br />
 --><br />
<img width="76" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg64.png" alt="$ \max p \cdot x$"/> where <img width="14" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg65.png" alt="$ p$"/> is the vector of probabilities of getting into each <img width="12" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg66.png" alt="$ c$"/> -state). There is also the issue that we merely wrote down inequalities that we knew would be true for the optimal solution to the stopping problem, but we have not ruled out further conditions that we did not think of (i.e. these conditions are necessary, but we have not yet established that they are sufficient).</p>
<p>We show (in the appendix) that this is in fact the right procedure for solving for all of the <img width="12" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg66.png" alt="$ c$"/> -values. Each of these linear programs can be quickly solved using standard software. We can also see that the same type of procedure can then be applied to the <img width="12" height="20" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg58.png" alt="$ b$"/> -values (which depend only on <img width="12" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg66.png" alt="$ c$"/> -values, which are by this point known). In fact we can substitute back (using linear programs instead of filling-in) until we know <img width="14" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg27.png" alt="$ v$"/> the expected value (under the model) of the entire round-trip trade.</p>
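<p>As a rough illustration of the matrix form above, the following Python sketch builds the nine rows of A and b for the five c-states and then looks for the smallest vector satisfying them. It does not call a linear programming solver: it swaps in repeated substitution (start from the stop values and keep replacing each entry by the larger of its stop value and its one-step continuation value), which the appendix argues reaches the same answer. The stop values come from the text; the transition probabilities in <code>MOVES</code> and all function names are hypothetical, chosen only so that each row sums to one.</p>

```python
# Stop values for c_1..c_5, taken from the inequalities in the text.
STOP = [-250.0, -135.0, -20.0, 95.0, 210.0]

# HYPOTHETICAL transition estimates (not from the article):
# index -> (p_down, p_stay, p_up, p_escape). Index 0 (c_1) is the stop-loss state.
MOVES = {
    1: (0.2, 0.2, 0.4, 0.2),  # c_2
    2: (0.2, 0.2, 0.4, 0.2),  # c_3
    3: (0.2, 0.2, 0.4, 0.2),  # c_4
    4: (0.4, 0.3, 0.0, 0.3),  # c_5 has no state above it
}


def build_system():
    """Return (A, b): five identity rows, then one continuation row per c_2..c_5."""
    n = len(STOP)
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = STOP[:]
    for i, (down, stay, up, esc) in sorted(MOVES.items()):
        row = [0.0] * n
        row[i - 1] = -down
        row[i] = 1.0 - stay
        if i + 1 < n:
            row[i + 1] = -up
        A.append(row)
        b.append(esc * STOP[i])
    return A, b


def continuation(x, i):
    """Expected value of continuing one more step from state i."""
    down, stay, up, esc = MOVES[i]
    value = down * x[i - 1] + stay * x[i] + esc * STOP[i]
    if i + 1 < len(x):
        value += up * x[i + 1]
    return value


def solve_by_substitution(tol=1e-10, max_iter=10_000):
    """Iterate x_i <- max(stop_i, continuation_i) starting from the stop values."""
    x = STOP[:]
    for _ in range(max_iter):
        new = [STOP[0]] + [max(STOP[i], continuation(x, i))
                           for i in range(1, len(STOP))]
        if max(abs(a - c) for a, c in zip(new, x)) < tol:
            return new
        x = new
    return x


def slack(A, b, x):
    """Row-wise A x - b; every entry should be (numerically) non-negative."""
    return [sum(a * v for a, v in zip(row, x)) - rhs for row, rhs in zip(A, b)]


A, b = build_system()
x = solve_by_substitution()
print([round(v, 2) for v in x])
print(min(slack(A, b, x)) >= -1e-6)
```

<p>With real data one would replace <code>MOVES</code> by the estimated transition matrix and would more likely hand A and b to a standard linear programming package; the substitution loop is only meant to make the inequalities concrete.</p>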
<h2><a name="SECTION00042000000000000000" id="SECTION00042000000000000000">More on the Transition Probability Estimate</a></h2>
<p>We can augment our state to carry more information than just the current ask-price relative to our previous night&#8217;s sale.</p>
<p>If we are in <img width="79" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg67.png" alt="$ stage-b$"/> of our Markov model we can modify <!-- MATH<br />
 $wt(training-example,s_1)$<br />
 --><br />
<img width="227" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg15.png" alt="$ wt(training-example,s_1)$"/> to be: <!-- MATH<br />
 $P(s_1 | training-example)<br />
P(training-example | today's \; stage-a \; move \; summary)$<br />
 --><br />
<img width="683" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg68.png" alt="$ P(s_1 \vert training-example) P(training-example \vert today's \; stage-a \; move \; summary)$"/> (to do this we build an estimated transition matrix for <img width="81" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg69.png" alt="$ stage-a$"/> from only the trajectory of today&#8217;s stock and then evaluate how likely the trajectory of the past training example is under this model; much smoothing/blurring is required to make this calculation usable). Even better: we can group training data and use Bayes&#8217; law: <!-- MATH<br />
 $P(training-group | today's \; stage-a \; move \; summary) = P(today's \; stage-a \; move \; summary | training-group) P(training-group) /<br />
P(today's \; stage-a \; move \; summary)$<br />
 --><br />
<img width="1399" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg70.png" alt="$ P(training-group \vert today's \; stage-a \; move \; summary) = P(today's ... ... training-group) P(training-group) / P(today's \; stage-a \; move \; summary)$"/></p>
<p>This allows us to group the training examples (on a few criteria, like less than a month old or not, trading volume, volatility &#8230;) and use a group of examples to build a model to evaluate today&#8217;s moves against (aggregated data forms the model, which is then checked against today&#8217;s single trajectory). As is traditional in Bayes estimates we ignore the denominator as it does not vary as a function of training group.</p>
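<p>The grouping idea can be sketched in a few lines of Python. The sketch below is only an illustration under assumptions that are not in the article: trajectories are lists of integer state indices, each group&#8217;s model is a transition matrix estimated from counts with add-one smoothing, and the group names, priors and data are invented. It scores today&#8217;s trajectory under each group&#8217;s model, multiplies by the prior, and rescales the results to sum to one (the rescaling stands in for the ignored denominator).</p>

```python
import math


def smoothed_transitions(trajectories, n_states, pseudo=1.0):
    """Estimate a transition matrix from a group's trajectories (add-one smoothing)."""
    counts = [[pseudo] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(traj, T):
    """Log probability of the observed moves in traj under transition matrix T."""
    return sum(math.log(T[a][b]) for a, b in zip(traj, traj[1:]))


def group_weights(today, groups, priors, n_states):
    """P(group | today's moves), proportional to P(today's moves | group) P(group)."""
    scores = {}
    for name, trajs in groups.items():
        T = smoothed_transitions(trajs, n_states)
        scores[name] = log_likelihood(today, T) + math.log(priors[name])
    top = max(scores.values())  # subtract the max before exponentiating for stability
    unnorm = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(unnorm.values())
    return {name: v / total for name, v in unnorm.items()}


# Invented example data: two training groups over three price states.
groups = {
    "recent": [[0, 1, 2, 1, 2], [1, 2, 2, 1]],
    "old": [[2, 1, 0, 0], [1, 0, 0, 1]],
}
priors = {"recent": 0.5, "old": 0.5}
today = [0, 1, 2, 2]

weights = group_weights(today, groups, priors, n_states=3)
print(weights)
```

<p>Because today&#8217;s moves resemble the upward moves in the invented &#8220;recent&#8221; group, that group receives the larger weight; the weights could then scale the contribution of each group&#8217;s examples to the transition estimate.</p>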
<h1><a name="SECTION00050000000000000000" id="SECTION00050000000000000000">Conclusion</a></h1>
<p>We have demonstrated some of the methods of using standard statistical and optimization techniques to automatically generate and back-test <em>un-roll</em> trades that turn properly conditioned technical trades into profitable round-trip trades. What we have presented is the technical machinery for building the <em>second half</em> of a profitable trade pair where the first half is some technical signal such as price or a market external trigger.</p>
<h2><a name="SECTION00060000000000000000" id="SECTION00060000000000000000">Bibliography</a></h2>
<dl compact>
<dt><a name="dynamicProgramming" id="dynamicProgramming">1</a></dt>
<dd>B<small>ELLMAN,</small> R.<br />
<em>Dynamic Programming</em>.<br />
Dover Publications, 2003.</dd>
<dt><a name="stopping" id="stopping">2</a></dt>
<dd>B<small>REIMAN,</small> L.<br />
<em>Stopping Rule Problems</em>.<br />
In <em>Applied Combinatorial Mathematics</em>. John Wiley &amp; Sons, 1964.</dd>
<dt><a name="Ioannids:2005aa" id="Ioannids:2005aa">3</a></dt>
<dd>I<small>OANNIDIS,</small> J. P.&nbsp;A.<br />
Why most published research findings are false.<br />
<em>PLOS Medicine 2</em>, 8 (Aug 2005), 0697-0701.</dd>
<dt><a name="markovChains" id="markovChains">4</a></dt>
<dd>K<small>EMENY,</small> J.&nbsp;G., <small>AND</small> S<small>NELL,</small> J.&nbsp;L.<br />
<em>Finite Markov Chains</em>.<br />
Springer, 1960.</dd>
<dt><a name="linProg" id="linProg">5</a></dt>
<dd>S<small>CHRIJVER,</small> A.<br />
<em>Theory of Linear and Integer Programming</em>.<br />
John Wiley &amp; Sons, 1986.</dd>
<dt><a name="investments" id="investments">6</a></dt>
<dd>S<small>HARPE,</small> W., A<small>LEXANDER,</small> G.&nbsp;J., <small>AND</small> B<small>AILEY,</small> J.&nbsp;V.<br />
<em>Investments</em>, 6&nbsp;ed.<br />
Prentice Hall, 1998.</dd>
<dt><a name="vonMises" id="vonMises">7</a></dt>
<dd><small>VON</small> M<small>ISES,</small> R.<br />
<em>Probability, Statistics and Truth</em>.<br />
Dover Publications, 1981.</dd>
<dt><a name="nonparametricStatistics" id="nonparametricStatistics">8</a></dt>
<dd>W<small>ASSERMAN,</small> L.<br />
<em>All of Nonparametric Statistics</em>.<br />
Springer, 2006.</dd>
</dl>
<div align="center"><b>APPENDIX</b></div>
<h1><a name="SECTION00070000000000000000" id="SECTION00070000000000000000">Why the Linear Program Solution is Correct</a></h1>
<p>How do we know the linear program solves the original problem?</p>
<ul>
<li>Because there are a lot of formulas?</li>
<li>Linear program looks kind-of right?</li>
<li>Works on a few examples?</li>
</ul>
<p>To actually prove correctness we need to derive some representation of the optimal solution and compare against it. All of the inequalities we wrote must be true for the optimal solution, but we have no prior guarantee that these are the only conditions. There could be additional conditions that we forgot to model.</p>
<p>Breiman[<a href="#stopping">2</a>] presented a clever argument technique that exploits the particularly nice structure of solutions of this problem. He noticed that solutions have both a lattice-like structure (you can combine solutions by taking minimums) and an operator structure (applying the probability transition matrix and stopping rules to a solution yields a solution). It turns out this is too much well-behaved structure for any non-trivial solution set to have, and it lets us show that optimal solutions are essentially unique, which in turn lets us show that the linear program solution solves the actual trading problem.</p>
<div><a name="thm:stopping" id="thm:stopping"><b>Theorem 1</b></a> &nbsp; <i>Assume that every state in the Markov chain has a path to a forced stopping state. Let <img width="18" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg71.png" alt="$ T$"/> be a maximal optimal set of stopping nodes and define the vector <img width="11" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg72.png" alt="$ t$"/> such that <!-- MATH<br />
 $t_i = E[stopping\; value\; under\; T\; rules \;|\; started \; at \; i]$<br />
 --><br />
<img width="418" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg73.png" alt="$ t_i = E[stopping\; value\; under\; T\; rules \;\vert\; started \; at \; i]$"/> . Let <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg59.png" alt="$ x$"/> be an optimal feasible solution to the linear program:</i></p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
\min 1 \cdot x &#038; &#038; \\<br />
x &#038; \ge &#038; stop\\<br />
(I-P)x &#038; \ge &#038; 0<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="72" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg74.png" alt="$\displaystyle \min 1 \cdot x$"/></td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="15" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg75.png" alt="$\displaystyle x$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap><img width="38" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg76.png" alt="$\displaystyle stop$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="77" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg77.png" alt="$\displaystyle (I-P)x$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap>0</td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/><br />
<i>where <img width="14" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg78.png" alt="$ I$"/> is the identity matrix, <img width="19" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg79.png" alt="$ P$"/> is the matrix of transition probabilities of the Markov chain and <img width="38" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg80.png" alt="$ stop$"/> is the vector of stopping values.</i></p>
<p><i>Then <img width="47" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg81.png" alt="$ x = t$"/> .</i></p>
</div>
<p>The theorem says if <img width="11" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg72.png" alt="$ t$"/> is an optimal solution for the original valuation problem (that we may or may not know how to calculate) and <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg59.png" alt="$ x$"/> is an optimal feasible solution to the linear program (which is now written in a slightly different but equivalent form) then <img width="47" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg81.png" alt="$ x = t$"/> . So, as hoped, solving the linear program is equivalent to solving the original stopping problem. The extra condition that every state can eventually reach a forced stopping state holds in our formulation because of the trading deadline.</p>
<p>The proof gets a little involved but the essential ideas are as follows:</p>
<ul>
<li>Check that an optimal stopping solution obeys the linear program inequalities (so they are necessary; we still need to show they are sufficient).</li>
<li>Show that the linear program solution, even if it did differ from the optimal stopping solution, cannot be less than the optimal stopping solution in any coordinate (this is the lattice minimum step).</li>
<li>Use the fact that every state has a path to a forced stopping state to show that the linear programming solution cannot hide any excess value above the best possible stopping value away from the rest of the system (this is the operator step).</li>
</ul>
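<p>The lattice minimum step rests on a small fact: if two vectors both satisfy the linear program inequalities, so does their entry-by-entry minimum. The reason is that the rows of P are non-negative, so applying P to the minimum gives something no larger than applying P to either vector, which in turn is no larger than that vector&#8217;s own entry. The Python sketch below checks this on a made-up three-state chain; the chain, the stop values and the function name are assumptions for illustration, and forced stopping states are modelled as absorbing rows of P so that their continuation inequality is automatically satisfied.</p>

```python
def is_feasible(x, stop, P, tol=1e-12):
    """Check x >= stop and (I - P) x >= 0, row by row."""
    n = len(x)
    for i in range(n):
        expected = sum(P[i][j] * x[j] for j in range(n))
        if x[i] < stop[i] - tol or x[i] < expected - tol:
            return False
    return True


# Hypothetical chain: states 0 and 2 are forced stopping states (absorbing rows),
# state 1 moves to either neighbour with probability 1/2.
stop = [0.0, 1.0, 4.0]
P = [
    [1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0],
]

x = [0.0, 5.0, 6.0]   # feasible
y = [4.0, 4.0, 4.0]   # feasible, and not comparable with x
z = [min(a, b) for a, b in zip(x, y)]  # entry-by-entry minimum

print(is_feasible(x, stop, P), is_feasible(y, stop, P), is_feasible(z, stop, P))
print(is_feasible(stop, stop, P))  # the stop values alone violate the middle row
```

<p>Neither x nor y is below the other in every coordinate, yet their minimum is still feasible, which is the property the proof uses when it forms the vector z from the linear program solution and the true stopping values.</p>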
<div><i>Proof</i>. [Proof of Theorem&nbsp;<a href="#thm:stopping">1</a>] The theory of linear programming duality says that there is a <em>dual problem</em> to our linear program and this dual is: <!-- MATH<br />
 $\max u \cdot stop$<br />
 --><br />
<img width="101" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg83.png" alt="$ \max u \cdot stop$"/> where</p>
<div align="center"><!-- MATH<br />
 \begin{eqnarray*}<br />
u, v  &#038; \ge &#038;  0 \\<br />
(u v) A &#038; = &#038; c<br />
.<br />
\end{eqnarray*}<br />
 --></p>
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="33" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg84.png" alt="$\displaystyle u, v$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg34.png" alt="$\displaystyle \ge$"/></td>
<td align="left" nowrap>0</td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="53" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg85.png" alt="$\displaystyle (u v) A$"/></td>
<td width="10" align="center" nowrap><img width="19" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg86.png" alt="$\displaystyle =$"/></td>
<td align="left" nowrap><img width="18" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg87.png" alt="$\displaystyle c .$"/></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"/><br />
The point of the dual is that for all <img width="52" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg88.png" alt="$ x, u, v$"/> feasible we have <!-- MATH<br />
 $u \cdot stop \le c \cdot x$<br />
 --><br />
<img width="121" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg89.png" alt="$ u \cdot stop \le c \cdot x$"/> . And for optimal <img width="52" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg88.png" alt="$ x, u, v$"/> we have <!-- MATH<br />
 $u \cdot stop = c \cdot x$<br />
 --><br />
<img width="120" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg90.png" alt="$ u \cdot stop = c \cdot x$"/> .</p>
<p>Take <img width="52" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg88.png" alt="$ x, u, v$"/> as an optimal solution to the linear program and the dual.</p>
<p>One can check <img width="11" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg72.png" alt="$ t$"/> itself must obey all of the conditions of the linear program so duality theory tells us <!-- MATH<br />
 $u \cdot stop \le c \cdot t$<br />
 --><br />
<img width="117" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg91.png" alt="$ u \cdot stop \le c \cdot t$"/> .</p>
<p>Define a vector <img width="14" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg92.png" alt="$ z$"/> such that <!-- MATH<br />
 $z_i = \min(x_i, t_i)$<br />
 --><br />
<img width="126" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg93.png" alt="$ z_i = \min(x_i, t_i)$"/> . <img width="14" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg92.png" alt="$ z$"/> also obeys the primal linear program inequalities, so we know <!-- MATH<br />
 $u \cdot stop \le c \cdot z$<br />
 --><br />
<img width="120" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg94.png" alt="$ u \cdot stop \le c \cdot z$"/> . Now <!-- MATH<br />
 $u \cdot stop = c \cdot t$<br />
 --><br />
<img width="116" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg95.png" alt="$ u \cdot stop = c \cdot t$"/> so we have <!-- MATH<br />
 $c \cdot t \le c \cdot z$<br />
 --><br />
<img width="90" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg96.png" alt="$ c \cdot t \le c \cdot z$"/> . Each entry of <img width="12" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg66.png" alt="$ c$"/> is <img width="14" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg97.png" alt="$ 1$"/> and <!-- MATH<br />
 $z_i \le t_i$<br />
 --><br />
<img width="56" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg98.png" alt="$ z_i \le t_i$"/> for all <img width="11" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg99.png" alt="$ i$"/> which can only mean that <img width="46" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg100.png" alt="$ z = t$"/> . This means entry by entry we have <!-- MATH<br />
 $x_i \ge t_i$<br />
 --><br />
<img width="58" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg101.png" alt="$ x_i \ge t_i$"/> .</p>
<p>Now define the vector function <img width="34" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg102.png" alt="$ F()$"/> such that <!-- MATH<br />
 $F(w)_i = \max(stop_i, \sum_j P(i \rightarrow j) w_j)$<br />
 --><br />
<img width="300" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg103.png" alt="$ F(w)_i = \max(stop_i, \sum_j P(i \rightarrow j) w_j)$"/> . For the true solution <img width="11" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg72.png" alt="$ t$"/> we have <img width="72" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg104.png" alt="$ F(t) = t$"/> . The linear program solution <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg59.png" alt="$ x$"/> also has <img width="80" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg105.png" alt="$ F(x) = x$"/> . Now if we suppose <img width="47" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg106.png" alt="$ x \neq t$"/> then there exists an <img width="11" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg99.png" alt="$ i$"/> such that <img width="56" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg107.png" alt="$ x_i - t_i$"/> is maximal and state <img width="11" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg99.png" alt="$ i$"/> points to at least one state <img width="13" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg108.png" alt="$ j$"/> such that <!-- MATH<br />
 $x_i - t_i > x_j - t_j$<br />
 --><br />
<img width="136" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg109.png" alt="$ x_i - t_i &gt; x_j - t_j$"/> . This must be true because none of these maximal difference states can be forced stopping states. So some maximal difference state must have a transition to a non maximal difference, otherwise this would violate the fact that all states have eventual paths to forced stopping states (where <!-- MATH<br />
 $x_k - t_k = 0$<br />
 --><br />
<img width="96" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg110.png" alt="$ x_k - t_k = 0$"/> ). For this particular <img width="11" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg99.png" alt="$ i$"/> we claimed <!-- MATH<br />
 $x_i > t_i \ge stop_i$<br />
 --><br />
<img width="123" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg111.png" alt="$ x_i &gt; t_i \ge stop_i$"/> so we have:</p>
<p><!-- MATH<br />
 \begin{displaymath}<br />
(F(x) - F(t))_i =<br />
\left\{<br />
\begin{array}{l l}<br />
   \sum_j P(i \rightarrow j) x_j - stop_i &#038; \quad \text{if $\sum_j P(i \rightarrow j) t_j < stop_i$} \\<br />
   \sum_j P(i \rightarrow j) (x_j - t_j) &#038; \quad \text{otherwise}<br />
\\\end{array} \right.<br />
.<br />
\end{displaymath}<br />
 --></p>
<div align="center"><img width="592" height="55" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg112.png" alt="\begin{displaymath} (F(x) - F(t))_i = \left\{ \begin{array}{l l} \sum_j P(i \ri... ... (x_j - t_j) &amp; \quad \text{otherwise} \\ \end{array} \right. . \end{displaymath}"/></div>
<p>So either way we have for this particular <img width="11" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg99.png" alt="$ i$"/> : <!-- MATH<br />
 $(F(x) - F(t))_i \le \sum_j P(i \rightarrow j) (x_j - t_j)$<br />
 --><br />
<img width="323" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg113.png" alt="$ (F(x) - F(t))_i \le \sum_j P(i \rightarrow j) (x_j - t_j)$"/> . But we must have <!-- MATH<br />
 $\sum_j P(i \rightarrow j) (x_j - t_j) < x_i - t_i$<br />
 --><br />
<img width="255" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg114.png" alt="$ \sum_j P(i \rightarrow j) (x_j - t_j) &lt; x_i - t_i$"/> because <!-- MATH<br />
 $\sum_j P(i \rightarrow j) = 1$<br />
 --><br />
<img width="143" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg115.png" alt="$ \sum_j P(i \rightarrow j) = 1$"/> , <!-- MATH<br />
 $x_j - t_j \le x_i - t_i$<br />
 --><br />
<img width="136" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg116.png" alt="$ x_j - t_j \le x_i - t_i$"/> for all <img width="13" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg108.png" alt="$ j$"/> and <!-- MATH<br />
 $x_j - t_j < x_i - t_i$<br />
 --><br />
<img width="136" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg117.png" alt="$ x_j - t_j &lt; x_i - t_i$"/> for at least one <img width="13" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg108.png" alt="$ j$"/> . So <!-- MATH<br />
 $(F(x) - F(t))_i < x_i - t_i$<br />
 --><br />
<img width="200" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg118.png" alt="$ (F(x) - F(t))_i &lt; x_i - t_i$"/> and we see <img width="34" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg102.png" alt="$ F()$"/> is essentially a contraction on the segment between <img width="11" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg72.png" alt="$ t$"/> and <img width="15" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg59.png" alt="$ x$"/> . Since a contraction on a bounded interval cannot have two distinct fixed points, our supposition that <img width="47" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg106.png" alt="$ x \neq t$"/> is untenable and we conclude <img width="47" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg81.png" alt="$ x = t$"/> . <img width="19" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg82.png" alt="$ \qedsymbol$"/></div>
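<p>As an informal check of the uniqueness argument (an illustration we add here, not part of the proof), the following minimal Python sketch iterates F() from two very different starting vectors and compares where they end up. It assumes F() takes the better of stopping and continuing at each state, as the case analysis above suggests. The five-state chain, its payoffs, the <code>forced</code> flags and the helper names <code>iterate</code> and <code>max_gap</code> are all hypothetical choices made only for this example.</p>

```python
# Hypothetical five-state chain; every number here is an illustrative assumption.
# States 0 and 4 are forced stopping states; states 1-3 may stop or continue.
stop = [0.0, 1.5, 1.0, 1.0, 4.0]
forced = [True, False, False, False, True]
P = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # row unused: state 0 always stops
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],  # row unused: state 4 always stops
]


def F(x):
    """Apply the update once: take the better of stopping and continuing."""
    out = []
    for i in range(len(x)):
        cont = sum(P[i][j] * x[j] for j in range(len(x)))
        out.append(stop[i] if forced[i] else max(stop[i], cont))
    return out


def iterate(x, steps=1000):
    """Apply F repeatedly, starting from x."""
    for _ in range(steps):
        x = F(x)
    return x


def max_gap(x, y):
    """Largest coordinate-wise absolute difference between x and y."""
    return max(abs(a - b) for a, b in zip(x, y))


from_below = iterate(list(stop))   # start at the stopping values
from_above = iterate([10.0] * 5)   # start well above any payoff
print(from_below)
print(from_above)
print(max_gap(from_below, from_above))
```

<p>If the argument above holds for this chain, both runs should settle on the same vector, so the printed gap should be negligible. A different chain or different payoffs should behave the same way as long as every state has an eventual path to a forced stopping state.</p>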
<p>We are done: we have shown there is essentially only one optimal solution to the stopping problem (the only possible variation is among rules that differ in what they do for states <img width="11" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg99.png" alt="$ i$"/> such that <!-- MATH<br />
 $\sum_j P(i \rightarrow j) t_j = stop_i$<br />
 --><br />
<img width="187" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg119.png" alt="$ \sum_j P(i \rightarrow j) t_j = stop_i$"/> ). We also should by now have some insight as to why we used a linear program like <!-- MATH<br />
 $\min 1\cdot x \;\text{s.t.}\; A x \ge b$<br />
 --><br />
<img width="78" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg120.png" alt="$ \min 1\cdot x \;$"/>s.t.<img width="68" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/stoimg121.png" alt="$ \; A x \ge b$"/> : the linear program solves for the smallest value at each state that dips below neither the stopping value of that state nor the expected value of its neighboring states.</p>
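<p>To make the linear programming remark a little more concrete, here is a small Python sketch under stated assumptions. We assume the constraints amount to "the value at each state is at least its stopping value, and at least the expected value of the next state wherever continuing is allowed"; the exact A and b are not reproduced here. The chain, the payoffs and the helper names <code>expected_next</code> and <code>is_feasible</code> are hypothetical. The sketch starts from a feasible point and applies F() repeatedly, recording the objective (the sum of the coordinates) along the way.</p>

```python
# Hypothetical five-state chain; every number here is an illustrative assumption.
# States 0 and 4 are forced stopping states; states 1-3 may stop or continue.
stop = [0.0, 1.5, 1.0, 1.0, 4.0]
forced = [True, False, False, False, True]
P = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # row unused: state 0 always stops
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],  # row unused: state 4 always stops
]


def expected_next(x, i):
    """Expected value of x one transition after state i."""
    return sum(P[i][j] * x[j] for j in range(len(x)))


def is_feasible(x, tol=1e-12):
    """Assumed constraints: x_i >= stop_i, and x_i >= expected next value
    at every state where continuing is allowed."""
    for i in range(len(x)):
        if stop[i] - x[i] > tol:
            return False
        if not forced[i] and expected_next(x, i) - x[i] > tol:
            return False
    return True


def F(x):
    """Apply the update once: take the better of stopping and continuing."""
    return [
        stop[i] if forced[i] else max(stop[i], expected_next(x, i))
        for i in range(len(x))
    ]


x = [max(stop)] * len(stop)  # a constant vector at the largest payoff is feasible
objectives = [sum(x)]        # the objective 1.x is the sum of the coordinates
stayed_feasible = is_feasible(x)
for _ in range(200):
    x = F(x)
    stayed_feasible = stayed_feasible and is_feasible(x)
    objectives.append(sum(x))

# Allow a tiny tolerance for floating-point rounding near convergence.
never_increased = all(
    objectives[k] + 1e-12 >= objectives[k + 1] for k in range(len(objectives) - 1)
)
print(x)
print(objectives[0], objectives[-1])
print(stayed_feasible, never_increased)
```

<p>In this sketch every iterate remains feasible while the objective never increases, which is one way to see why minimizing the sum over the feasible region should land on the fixed point of F(). This is a numerical illustration on one small example, not a substitute for the argument in the text.</p>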
<p></p>
<hr />
<h4>Footnotes</h4>
<dl>
<dt><a name="foot10" id="foot10">&#8230; Mount</a><a href="#tex2html1"><sup>1</sup></a></dt>
<dd>http://www.mzlabs.com/</dd>
<dt><a name="foot21" id="foot21">&#8230; feeds.</a><a href="#tex2html2"><sup>2</sup></a></dt>
<dd>To emphasize: by technical trades we mean trades based on market data (as opposed to fundamental analysis); we do not include popular-culture uses of the term such as candlesticks, Elliott waves and so on.</dd>
<dt><a name="foot22" id="foot22">&#8230; announcements.</a><a href="#tex2html3"><sup>3</sup></a></dt>
<dd>We are assuming that these triggers can be made automatic by using a labeled information service or natural language processing techniques.</dd>
<dt><a name="foot134" id="foot134">&#8230; estimate.</a><a href="#tex2html10"><sup>4</sup></a></dt>
<dd>The explosion of states can be managed by adding some regularity conditions on how transition probability estimates are allowed to change over time. This serves to reduce the complexity or rank of the estimation problem and improves the generalization ability of the model.</dd>
</dl>


]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
