<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Win-Vector Blog &#187; Mathematical Bedside Reading</title>
	<atom:link href="http://www.win-vector.com/blog/tag/mathematical-bedside-reading/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.win-vector.com/blog</link>
	<description>The Applied Theorist&#039;s Point of View</description>
	<lastBuildDate>Thu, 29 Jul 2010 17:09:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Gradients via Reverse Accumulation</title>
		<link>http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=gradients-via-reverse-accumulation</link>
		<comments>http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/#comments</comments>
		<pubDate>Thu, 15 Jul 2010 00:00:04 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Coding]]></category>
		<category><![CDATA[Exciting Techniques]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[Automatic Differentiation]]></category>
		<category><![CDATA[Conjugate Gradient]]></category>
		<category><![CDATA[Gradient]]></category>
		<category><![CDATA[Mathematical Bedside Reading]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Reverse Accumulation]]></category>
		<category><![CDATA[Scala]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1493</guid>
		<description><![CDATA[We extend the ideas from Automatic Differentiation with Scala to include reverse accumulation. Reverse accumulation is a non-obvious improvement to automatic differentiation that can in many cases vastly speed up calculations of gradients. As the tables, diagrams and equations do not translate well into HTML, our full article is available here in PDF: [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/' rel='bookmark' title='Permanent Link: Automatic Differentiation with Scala'>Automatic Differentiation with Scala</a></li>
<li><a href='http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/' rel='bookmark' title='Permanent Link: &#8220;Easy&#8221; Portfolio Allocation'>&#8220;Easy&#8221; Portfolio Allocation</a></li>
<li><a href='http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/' rel='bookmark' title='Permanent Link: A Quick Appreciation of the Sharpe Ratio'>A Quick Appreciation of the Sharpe Ratio</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>We extend the ideas from <a target="ext" href="http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/">Automatic Differentiation with Scala</a> to include <em>reverse accumulation</em>.  Reverse accumulation is a non-obvious improvement to automatic differentiation that can in many cases vastly speed up calculations of gradients.<span id="more-1493"></span><br />
As the tables, diagrams and equations do not translate well into HTML, our full article is available here in PDF: <a href="http://www.win-vector.com/dfiles/ReverseAccumulation.pdf">http://www.win-vector.com/dfiles/ReverseAccumulation.pdf</a>.</p>
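<p>For reference, forward accumulation (the baseline that reverse accumulation improves on) can be sketched in a few lines of Python using dual numbers. This is our own illustration, not the Scala code from either article; the names <code>Dual</code>, <code>f</code> and <code>gradient_forward</code> are hypothetical. Computing a full gradient this way takes one evaluation of the function per input variable.</p>

```python
import math


class Dual:
    """A dual number value + deriv*eps with eps**2 == 0 (forward-mode AD)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (derivative zero).
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    # Chain rule for sin.
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def f(x, y):
    # Hypothetical example function f(x, y) = x*y + sin(x).
    return x * y + sin(x)


def gradient_forward(func, point):
    """One full forward pass per coordinate: seed that coordinate's derivative with 1."""
    grads = []
    for i in range(len(point)):
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(point)]
        grads.append(func(*args).deriv)
    return grads


print(gradient_forward(f, [2.0, 3.0]))  # [3 + cos(2), 2], about [2.5839, 2.0]
```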
<p>The purpose of our article is to explain reverse accumulation automatic differentiation clearly (and to release some sample code and timing results).  A side effect of the article is to make sense of the following two diagrams:</p>
<p>If the following is a picture of standard (forward) differentiation:</p>
<p><img src="http://www.win-vector.com/blog/wp-content/uploads/2010/07/cutFwd.png" alt="cutFwd.png" border="0" width="408" height="677" /></p>
<p>then the following is a picture of reverse accumulation:</p>
<p><img src="http://www.win-vector.com/blog/wp-content/uploads/2010/07/cutRev.png" alt="cutRev.png" border="0" width="487" height="739" /></p>
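<p>Forward and reverse accumulation compute the same derivatives; they differ in the direction in which the chain rule is applied. The following minimal Python sketch is our own illustration, not the Scala code released with the article, and the names <code>Var</code> and <code>backward</code> are hypothetical. It records each operation while the function is evaluated, then makes a single backward sweep that produces the partial derivatives with respect to all inputs at once.</p>

```python
import math


class Var:
    """A node recorded during the forward pass (a simple tape of operations)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent node, local partial derivative)
        self.grad = 0.0         # adjoint d(output)/d(this node), filled in by backward

    @staticmethod
    def lift(x):
        # Treat plain numbers as constant leaf nodes.
        return x if isinstance(x, Var) else Var(x)

    def __add__(self, other):
        other = Var.lift(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = Var.lift(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def sin(x):
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def backward(output):
    """Reverse sweep: propagate adjoints from the output back to every input."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order: each node is appended after all of its inputs.
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # every consumer is processed before its inputs
        for parent, local in node.parents:
            parent.grad += local * node.grad


x, y = Var(2.0), Var(3.0)
z = x * y + sin(x)  # the same hypothetical f(x, y) = x*y + sin(x)
backward(z)
print(x.grad, y.grad)  # 3 + cos(2) (about 2.5839) and 2.0
```

<p>The single call to <code>backward</code> fills in both partial derivatives. In general a reverse sweep costs a small constant multiple of one function evaluation no matter how many inputs there are, which is where the speed-up for gradients comes from. The price is storing the recorded graph.</p>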


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/' rel='bookmark' title='Permanent Link: Automatic Differentiation with Scala'>Automatic Differentiation with Scala</a></li>
<li><a href='http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/' rel='bookmark' title='Permanent Link: &#8220;Easy&#8221; Portfolio Allocation'>&#8220;Easy&#8221; Portfolio Allocation</a></li>
<li><a href='http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/' rel='bookmark' title='Permanent Link: A Quick Appreciation of the Sharpe Ratio'>A Quick Appreciation of the Sharpe Ratio</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;Easy&#8221; Portfolio Allocation</title>
		<link>http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=easy-portfolio-allocation</link>
		<comments>http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/#comments</comments>
		<pubDate>Thu, 14 Jan 2010 20:09:13 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Finance]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[Lagrange Multipliers]]></category>
		<category><![CDATA[Mathematical Bedside Reading]]></category>
		<category><![CDATA[Portfolio Theory]]></category>
		<category><![CDATA[Sharpe Ratio]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1342</guid>
		<description><![CDATA[This is an elementary mathematical finance article: if you know some math (linear algebra and differential calculus), you can quickly solve a simple finance question. The topic was inspired by a recent article in The American Mathematical Monthly (Volume 117, Number 1, January 2010, pp. 3-26), &#8220;Finding Good Bets in the [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/' rel='bookmark' title='Permanent Link: A Quick Appreciation of the Sharpe Ratio'>A Quick Appreciation of the Sharpe Ratio</a></li>
<li><a href='http://www.win-vector.com/blog/2009/09/a-discrete-model-gauging-market-efficiency/' rel='bookmark' title='Permanent Link: A Discrete Model Gauging Market Efficiency'>A Discrete Model Gauging Market Efficiency</a></li>
<li><a href='http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/' rel='bookmark' title='Permanent Link: What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?'>What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>This is an elementary mathematical finance article. This means if you know some math (linear algebra, differential calculus) you can find a quick solution to a simple finance question. The topic was inspired by a recent article in The American Mathematical Monthly (Volume 117, Number 1 January 2010, pp. 3-26): &#8220;Find Good Bets in the Lottery, and Why You Shouldn&#8217;t Take Them&#8221; by Aaron Abrams and Skip Garibaldi which said optimal asset allocation is now an undergraduate exercise. That may well be, but there are a lot of people with very deep mathematical backgrounds that have yet to have seen this. We will fill in the details here. The style is terse, but the content should be about what you would expect from one day of lecture in a mathematical finance course.</p>
<p><span id="more-1342"></span></p>
<p>Portfolio allocation is not the &#8220;magic predict the future&#8221; part of finance; it is the scheme for correctly applying magic predictions of the future. If you had a prediction of future returns for a number of assets, the naive thing to do would be to invest everything in the asset with the highest predicted return. Portfolio theory, while still taking the predictions at face value, picks an investment pattern that (in risk-adjusted dollars) outperforms the naive strategy even when the predictions are correct, and is a bit safer when they are wrong.</p>
<p>Suppose you had <img width="14" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg1.png" alt="$ n$"> different assets you could invest in. For the <img width="10" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg2.png" alt="$ i$"> -th asset there is an expected excess relative return of <img width="19" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg3.png" alt="$ \mu_i$"> and an estimated variance of <img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg4.png" alt="$ s_i$"> (for a definition of relative return see <a href="http://www.win-vector.com/blog/2010/01/relative-returns-a-banker-versus-trader-paradox/">Relative returns: a banker versus trader paradox</a> and for a definition of variance see <a href="http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/">A Quick Appreciation of the Sharpe Ratio</a>). Let the vector <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> be such that <img width="23" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg6.png" alt="$ X_i$"> represents the number of dollars we invest in the <img width="10" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg2.png" alt="$ i$"> -th asset. If <img width="23" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg6.png" alt="$ X_i$"> is positive, our plan is &#8220;to go long&#8221;, that is, to buy some of the <img width="10" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg2.png" alt="$ i$"> -th asset. 
If <img width="23" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg6.png" alt="$ X_i$"> is negative, our plan is &#8220;to short&#8221;, that is, to sell some of the <img width="10" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg2.png" alt="$ i$"> -th asset to somebody else. (It is called going short because we sell something we do not have. This is often allowed in finance, as long as we make the same pay-outs to the buyer that the buyer would receive if we really had the item to sell.)</p>
<p>When we appeal to the idea of optimizing the portfolio Sharpe Ratio (again, see <a href="http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/">A Quick Appreciation of the Sharpe Ratio</a>) then we say a good portfolio is one that doesn&#8217;t just maximize expected relative returns (which is <img width="39" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg11.png" alt="$ X^{\top} \mu$"> ) but maximizes the ratio of expected relative return to standard deviation:</p>
<div align="center"><img width="73" height="56" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg8.png" alt="$\displaystyle \frac{X^{\top} \mu}{\sqrt{X^{\top} C X}} $"></div>
<p>where (for now) <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> is the diagonal matrix with the variances <img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg4.png" alt="$ s_i$"> on its diagonal and zeros elsewhere. This ratio is called a &#8220;risk adjusted return&#8221; (versus the un-adjusted form <img width="39" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg11.png" alt="$ X^{\top} \mu$"> ). Also notice that the ratio is homogeneous of degree zero in <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> (doubling <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> does not change the ratio, as it doubles both the numerator and the denominator), so an optimal solution <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> describes not how much to invest, but what pattern to invest in. This allows us to introduce an important practical constraint: we only allow ourselves to risk a total of <img width="16" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg13.png" alt="$ T$"> dollars (long and short combined). That is, we insist <img width="105" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg14.png" alt="$ \sum_{i=1}^{n} \vert X_i\vert = T$"> . We ignore this total investment constraint until the end, when we can satisfy it by simply re-scaling a partial solution.</p>
<p>To solve for <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> we introduce an old friend: <a href="http://en.wikipedia.org/wiki/Lagrange_multipliers">Lagrange Multipliers</a> (or, more generally, the Karush-Kuhn-Tucker conditions of optimality). Since the fraction we are trying to optimize is homogeneous in <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> , we can convert the denominator into a constraint and arbitrarily insist that <img width="99" height="38" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg15.png" alt="$ \sqrt{X^{\top} C X} = 1$"> without changing the nature of the problem. We are now trying to maximize <img width="39" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg11.png" alt="$ X^{\top} \mu$"> subject to <img width="99" height="38" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg15.png" alt="$ \sqrt{X^{\top} C X} = 1$"> . The Lagrangian conditions of optimality state that at the optimum the gradient of the objective must be proportional to the gradient of the constraint:</p>
<div align="center"><img width="225" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg16.png" alt="$\displaystyle \nabla_X X^{\top} \mu = \lambda \nabla_X ( \sqrt{X^{\top} C X} - 1 ) $"></div>
<p>for some (to be determined) constant <img width="13" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg17.png" alt="$ \lambda$"> . Pushing the gradient operator through we get:</p>
<div align="center"><img width="213" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg18.png" alt="$\displaystyle \mu = \lambda (1/2) ( X^{\top} C X )^{-1/2} 2 C X . $"></div>
<p>A similar equation could be obtained by appealing to a Rayleigh Quotient argument.</p>
<p>We do not yet know <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> (that is what we are trying to solve for), so we do not know the value of <img width="56" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg19.png" alt="$ X^{\top} C X$"> . However, it is just a scalar. Since we only need <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> up to a multiple, we can fold it (together with <img width="13" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg17.png" alt="$ \lambda$"> and the constant factors) into a new multiplier, and it is enough to solve:</p>
<div align="center"><img width="76" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg20.png" alt="$\displaystyle \mu = \lambda' C X $"></div>
<p>where <img width="18" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg21.png" alt="$ \lambda'$"> is a new (still unknown) scalar. This means we have:</p>
<div align="center"><img width="121" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg22.png" alt="$\displaystyle X = (1/\lambda') C^{-1} \mu $"></div>
<p>so our desired solution is some re-scaling of <img width="43" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg23.png" alt="$ C^{-1} \mu$"> .</p>
<p>As stated earlier, we have a total investment constraint of <img width="105" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg14.png" alt="$ \sum_{i=1}^{n} \vert X_i\vert = T$"> . We can achieve this with the following re-scaled solution:</p>
<div align="center"><img width="189" height="51" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg24.png" alt="$\displaystyle X = \frac{T}{\sum_{i=1}^{n} \vert(C^{-1} \mu)_i\vert} C^{-1} \mu $"></div>
<p>as our desired optimal portfolio allocation. The re-scaling factor is positive, so this picks the direction that maximizes (rather than minimizes) the ratio. In the end we can find the optimal portfolio by merely solving a linear system (we don&#8217;t need anything as expensive as a general purpose optimizer in this case).</p>
<p>These are very old results (dating back to the beginnings of modern portfolio theory). A good example reference is: &#8220;The Valuation of Risk Assets and the Selection of Risky Investments in Stock Portfolios and Capital Budgets,&#8221; John Lintner, The Review of Economics and Statistics (1965) vol. 47 (1) pp. 13-37. These results are the basis for advice like &#8220;diversify.&#8221; Without modeling risk you would tend to put all of your money in the asset predicted to pay the most. When modeling risk you tend to put some of your money in each high-paying asset, and as long as they do not all fail at the same time you have some safety. Another (very different) route to diversification is the Kelly Criterion (discussed in <a href="http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/">What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?</a>).</p>
<p>A very important risk we have not yet modeled is that our assets may tend to fail at the same time (meaning we may not have diversified usefully). The possibility of assets failing together brings us to the ideas of correlation and covariance. When we took <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> to be diagonal, we were implicitly assuming (or modeling), without justification, that the assets&#8217; returns were uncorrelated with each other. This is, of course, not going to be anywhere near true in practice. Instead we should take <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> to be the <a href="http://en.wikipedia.org/wiki/Covariance_matrix">Covariance Matrix</a> that represents our estimate of the asset-to-asset covariances. In this case the solution methods above all work exactly as before. Companies such as MSCI Barra have made complete businesses out of producing and selling estimates of <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> .</p>
<p>Another issue arises when we do not allow ourselves to &#8220;short&#8221; (take a negative allocation of) assets. In this case we have the additional constraints <img width="48" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg26.png" alt="$ X \ge 0$"> , which complicate our solution. For the special case where the asset returns are assumed to be uncorrelated ( <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> diagonal), it is enough to solve as above and merely replace any negative allocations with zero before the final re-scaling step. When the covariances are non-trivial ( <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> has non-zero off-diagonal entries) this shortcut may not give the optimal solution. In this case the Karush-Kuhn-Tucker conditions are more complicated, and at the optimal solution we have the following conditions:</p>
<p></p>
<div align="center">
<table cellpadding="0" align="center">
<tr valign="middle">
<td nowrap align="right"><i>&mu;</i> &minus; <i>&lambda;</i> <i>C</i> <i>X</i> + &sum;<sub><i>i</i>=1</sub><sup><i>n</i></sup> <i>&tau;</i><sub><i>i</i></sub> <i>E</i><sup><i>i</i></sup></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg28.png" alt="$\displaystyle =$"></td>
<td align="left" nowrap>0</td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="19" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg29.png" alt="$\displaystyle X$"></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg30.png" alt="$\displaystyle \ge$"></td>
<td align="left" nowrap>0</td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="48" height="60" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg31.png" alt="$\displaystyle \sum_{i=1}^{n} X_i$"></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg28.png" alt="$\displaystyle =$"></td>
<td align="left" nowrap><img width="16" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg32.png" alt="$\displaystyle T$"></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="13" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg33.png" alt="$\displaystyle \tau$"></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg30.png" alt="$\displaystyle \ge$"></td>
<td align="left" nowrap>0</td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="38" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg34.png" alt="$\displaystyle \tau^{\top} X$"></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg28.png" alt="$\displaystyle =$"></td>
<td align="left" nowrap>0</td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"><br />
where <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> is the allocation vector we wish to solve for, <img width="13" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg17.png" alt="$ \lambda$"> is an unknown scalar, <img width="13" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg35.png" alt="$ \tau$"> is a new unknown vector and <img width="22" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg36.png" alt="$ E^i$"> is the vector with <img width="69" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg37.png" alt="$ (E^i)_i = 1$"> and zeros elsewhere. Using the Karush-Kuhn-Tucker conditions has again allowed us to almost linearize the problem. However, we now have sign constraints on <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> and <img width="13" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg35.png" alt="$ \tau$"> and what is called a complementarity constraint: <img width="67" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg38.png" alt="$ \tau^{\top} X = 0$"> . This sort of problem is called a &#8220;Linear Complementarity Problem&#8221; and is about as hard as solving a linear program (the typical solution method is a variation of the simplex method called &#8220;Lemke&#8217;s algorithm&#8221;). 
(Technically the <img width="13" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg17.png" alt="$ \lambda$"> prevents the problem from being in the right form, but <img width="13" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg17.png" alt="$ \lambda$"> can be eliminated by inspection.) The problem can still be solved; you just need a bit more software. If we cannot short assets (or at least simulate shorting assets), we not only eliminate many possible portfolios from consideration (so we likely end up with a less profitable portfolio than we would like), we also make the mathematics and computation a bit harder.</p>
<p>The goal of this writeup has been to show how to systematically convert investment advice like &#8220;this stock is going to really take off&#8221; into an allocation of assets (which in turn implies a pattern of trades). We take as unexamined premises where to get such advice and whether to use the Sharpe ratio or some other notion of risk and/or utility. The point is that, however complicated it may be, from here on it is just calculation, and calculation is easy to automate.</p>
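<p>To illustrate that last point, here is a minimal Python sketch of both calculations. <code>optimal_allocation</code> follows the closed form above: solve C y = &mu;, then re-scale so that the absolute allocations sum to T. <code>long_only_allocation</code> is a swapped-in brute-force search, not Lemke&#8217;s algorithm. On its support S, the best long-only portfolio is a positive multiple of the unconstrained solution restricted to the assets in S. Trying every support and keeping the all-positive candidate with the best Sharpe ratio therefore finds it, at a cost exponential in n. The function names, the small Gaussian-elimination helper (a stand-in for a real linear-algebra library) and the example numbers are our own assumptions.</p>

```python
import itertools
import math


def solve(A, b):
    """Solve the linear system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def sharpe_ratio(X, mu, C):
    """X^T mu / sqrt(X^T C X)."""
    n = len(X)
    ret = sum(X[i] * mu[i] for i in range(n))
    var = sum(X[i] * C[i][j] * X[j] for i in range(n) for j in range(n))
    return ret / math.sqrt(var)


def optimal_allocation(mu, C, T):
    """Shorting allowed: re-scale C^{-1} mu so that sum_i |X_i| = T."""
    y = solve(C, mu)
    scale = T / sum(abs(v) for v in y)
    return [scale * v for v in y]


def long_only_allocation(mu, C, T):
    """No shorting: brute-force search over supports (exponential in n; small n only)."""
    n = len(mu)
    best, best_ratio = None, -math.inf
    for size in range(1, n + 1):
        for S in itertools.combinations(range(n), size):
            # Unconstrained optimum using only the assets in S.
            y = solve([[C[i][j] for j in S] for i in S], [mu[i] for i in S])
            if not all(v > 0 for v in y):
                continue  # would need a short position: not a long-only candidate
            X = [0.0] * n
            for i, v in zip(S, y):
                X[i] = v
            ratio = sharpe_ratio(X, mu, C)
            if ratio > best_ratio:
                best, best_ratio = X, ratio
    if best is None:
        raise ValueError("no asset has a positive expected excess return")
    total = sum(best)  # all entries are non-negative, so this is sum_i |X_i|
    return [T * v / total for v in best]


# Hypothetical example: two assets with correlation 0.9 between their returns.
mu = [0.10, 0.05]
C = [[0.04, 0.036], [0.036, 0.04]]
print(optimal_allocation(mu, C, 100.0))    # about [57.89, -42.11]
print(long_only_allocation(mu, C, 100.0))  # [100.0, 0.0]
```

<p>With the two highly correlated assets in the example, the unconstrained solution goes long the first asset and short the second. The long-only solution puts everything in the first asset.</p>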


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/' rel='bookmark' title='Permanent Link: A Quick Appreciation of the Sharpe Ratio'>A Quick Appreciation of the Sharpe Ratio</a></li>
<li><a href='http://www.win-vector.com/blog/2009/09/a-discrete-model-gauging-market-efficiency/' rel='bookmark' title='Permanent Link: A Discrete Model Gauging Market Efficiency'>A Discrete Model Gauging Market Efficiency</a></li>
<li><a href='http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/' rel='bookmark' title='Permanent Link: What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?'>What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?</title>
		<link>http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=what-is-the-gamblers-equivalent-of-amdahls-law</link>
		<comments>http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 20:38:21 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Expository Writing]]></category>
		<category><![CDATA[Quantitative Finance]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Amdahl's Law]]></category>
		<category><![CDATA[Kelly Criterion]]></category>
		<category><![CDATA[Kraft Inequality]]></category>
		<category><![CDATA[Mathematical Bedside Reading]]></category>
		<category><![CDATA[Statistical Detective]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=878</guid>
		<description><![CDATA[While executing some statistical detective work for a client we had a major &#8220;aha!&#8221; moment and realized something like &#8220;Amdahl&#8217;s Law&#8221; rephrased in terms of probability would solve everything. We finished our work using direct methods and moved on. But it is an interesting question: what is the probabilist&#8217;s (or gambler&#8217;s) equivalent of Amdahl&#8217;s Law? [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/' rel='bookmark' title='Permanent Link: A Quick Appreciation of the Sharpe Ratio'>A Quick Appreciation of the Sharpe Ratio</a></li>
<li><a href='http://www.win-vector.com/blog/2008/05/betting-best-of-series/' rel='bookmark' title='Permanent Link: Betting Best-Of Series'>Betting Best-Of Series</a></li>
<li><a href='http://www.win-vector.com/blog/2008/06/how-market-designs-set-prices/' rel='bookmark' title='Permanent Link: How Market Designs Set Prices'>How Market Designs Set Prices</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>While executing some statistical detective work for a client we had a major &#8220;aha!&#8221; moment and realized  something like &#8220;Amdahl&#8217;s Law&#8221; rephrased in terms of probability would solve everything.  We finished our work using direct methods and moved on.  But it is an interesting question: what is the probabilist&#8217;s (or gambler&#8217;s) equivalent of Amdahl&#8217;s Law?<span id="more-878"></span></p>
<p>Amdahl&#8217;s Law is a famous idea due to computer architect Gene Amdahl.  It is a simple technique that computer scientists use to re-direct their work back to the important parts of problems.  Suppose you have a complicated system you wish to speed up.  Suppose this system is spending a p-fraction of its time in an important sub-process and that you have an idea that would speed up the sub-process by a factor of k.  Should you invest the effort?  </p>
<p>Amdahl&#8217;s Law says (by simple arithmetic): the speed-up (the ratio of the old run-time over the new run-time) the entire system would achieve if you implemented your improvement is not the factor of k you would hope for, but instead:</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/10/eq11.png" alt="eq1.png" border="0" width="141" height="56" /><br />
</center></p>
<p>For example, if p = 1/3 then you can speed up the overall system by at most a factor of 1.5 (cutting its run time by at most a third), even if your idea is so astoundingly good that you have k = 1000 (which yields a speed-up of about 1.499).</p>
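<p>The arithmetic is easy to check with a few lines of Python (a sketch; the function name <code>amdahl_speedup</code> is ours). It uses the usual form of Amdahl&#8217;s Law, speed-up = 1 / ((1 &minus; p) + p/k), which approaches 1/(1 &minus; p) as k grows.</p>

```python
def amdahl_speedup(p, k):
    """Overall speed-up when a p-fraction of the run time is made k times faster."""
    return 1.0 / ((1.0 - p) + p / k)


print(amdahl_speedup(1 / 3, 1000))          # about 1.499
print(amdahl_speedup(1 / 3, float("inf")))  # about 1.5: the limit as k grows without bound
```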
<p>Amdahl&#8217;s Law reminds us that speeding up a component that accounts for little of the total run-time is not an important accomplishment.  In fact Amdahl&#8217;s Law directly prescribes looking at your most expensive components as the largest opportunities for improvement.  Appealing to Amdahl&#8217;s Law is an important nerd-tool for ending &#8220;color of the bike shed&#8221; arguments (and concentrating only on the design of the parts of a system that actually have an impact on outcomes).</p>
<p>It is clear there are similar principles for managing expenses, revenue, effort and so on (such as the Pareto Principle).</p>
<p>But what is the equivalent statement in the harder and more complicated world of probabilities and gambling systems?  There are a lot of candidate statements and theorems (such as &#8220;look for horses, not zebras&#8221;, the Kraft Inequality, the Kullback-Leibler Distance, Cross Entropy and the Asymptotic Equipartition Property) but I think the most powerful and direct analogue is the Kelly Betting System.  The Kelly Betting System is a remarkable system that, like Amdahl&#8217;s Law, tells us exactly what to look at (and, surprisingly, some things to ignore).</p>
<p>Kelly&#8217;s original paper, &#8220;A New Interpretation of Information Rate&#8221; (J. L. Kelly Jr., Bell System Technical Journal (1956)), phrases the problem as betting at a horse race.  The technique applies more generally (other forms of gambling, portfolio management, even explaining the preferences of lab-mice) but the clearest example remains a horse race.</p>
<p>We follow the excellent discussion of the problem in Cover and Thomas, &#8220;Elements of Information Theory&#8221; Wiley (1991).  Consider a simplified horse race where there is only one payoff offered: picking the winning horse.  Suppose the (unknown) true probability of the i-th horse winning is p_i.  Further suppose the track publishes a set of payoffs for each horse such that if you bet a dollar on the i-th horse and it wins, you are given o_i dollars back.</p>
<p>Now a gambler who has no estimate of the p_i might put all of their money on &#8220;the highest paying horse.&#8221;  That is, picking the i such that o_i is maximal (&#8220;going for the big score&#8221;).  A somewhat more informed gambler might put all of their money on the &#8220;horse with the best expected return,&#8221; that is, a horse i that maximizes p_i * o_i.  But this betting strategy &#8220;invites ruin&#8221;: you have probability 1 &#8211; p_i of losing all of your money.  Kelly starts with the controversial idea of trying to maximize expected log-return (instead of maximizing expected return).  Maximizing expected log-return avoids ruin, maximizes the exponential rate at which your wealth grows and maximizes the median wealth over all outcomes (see: &#8220;The Kelly System Maximizes Median Fortune&#8221; S. N. Ethier, Journal of Applied Probability (2004) vol. 41 (4) pp. 1230-1236).  Even the observation that you don&#8217;t always want to put all of your money in a &#8220;favorable bet&#8221; (that is, one with expected return p_i * o_i > 1) is an important one.</p>
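<p>To see the difference between the two objectives, here is a small Python sketch comparing &#8220;everything on the best expected-return horse&#8221; against betting a p_i fraction on each horse. The probabilities and payoffs are made-up illustrative numbers, and the sketch assumes all money is bet on every race.</p>

```python
import math


def expected_wealth(p, o, b):
    """Expected wealth multiplier after one race when all money is bet,
    a fraction b[i] on horse i (horse i pays o[i] per dollar if it wins)."""
    return sum(pi * bi * oi for pi, oi, bi in zip(p, o, b))


def expected_log2_wealth(p, o, b):
    """Expected log2 of the wealth multiplier: Kelly's objective."""
    total = 0.0
    for pi, oi, bi in zip(p, o, b):
        if pi == 0:
            continue  # an outcome that never happens contributes nothing
        if bi == 0:
            return float("-inf")  # a possible outcome wipes out all wealth
        total += pi * math.log2(bi * oi)
    return total


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # hypothetical true win probabilities
    o = [2.0, 3.0, 6.0]  # hypothetical payoffs; p_i * o_i is largest for horse 3
    strategies = {
        "all-in on best p_i * o_i": [0.0, 0.0, 1.0],
        "proportional (b = p)": p,
    }
    for name, b in strategies.items():
        print(name, expected_wealth(p, o, b), expected_log2_wealth(p, o, b))
```

<p>With these hypothetical numbers the all-in bet has the higher expected wealth (1.2 versus 1.01 per race) but an expected log-return of minus infinity, because 80% of the time it loses everything.</p>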
<p>To get the next part of Kelly&#8217;s system consider the sum of reciprocals of track offered payoffs:</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/10/sum1.png" alt="sum1.png" border="0" width="82" height="68" /><br />
</center></p>
<p>At any real track this sum will be greater than 1 (i.e. the o_i will be small, making the sum large).   The larger the sum the more clearly unfair the track&#8217;s published payoff schedule is.  Let us assume we were at a fantastically generous track where this sum is exactly 1 (admittedly unrealistic, and both the paper and the book work beyond this limitation).  In this case we can write r_i = 1/o_i and we know r_i > 0 and the r_i sum to 1.  That is we can interpret the r_i as the track&#8217;s estimate of the probability of the i-th horse winning.   If o_i = 100 (the track is paying off 100:1) we then can infer they think the i-th horse has no more than a 1 in 100 chance of winning (else they could not afford to offer the bet).  Kelly&#8217;s system gives (and proves correct) the following remarkable advice: if the sum given above is 1 (i.e. the track is paying off at least a fair rate) then you can safely bet all of your money and you should bet a p_i fraction of your money on the i-th horse.  </p>
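<p>Kelly&#8217;s claim that the payoffs drop out of the betting decision can be checked by brute force. The sketch below uses hypothetical probabilities, two made-up fair payoff schedules, and a grid search chosen purely for illustration: it tries every three-horse split of all the money in steps of 1% and keeps the one with the largest expected log2 growth.</p>

```python
import math


def track_sum(o):
    """Sum of reciprocal payoffs: 1 means fair odds, above 1 means the track keeps a cut."""
    return sum(1.0 / oi for oi in o)


def doubling_rate(p, o, b):
    """Expected log2 growth of wealth when all money is bet, fraction b[i] on horse i."""
    return sum(pi * math.log2(bi * oi) for pi, oi, bi in zip(p, o, b))


def best_bet_on_grid(p, o, steps=100):
    """Brute-force search over all-money bets on three horses in 1/steps increments."""
    best, best_rate = None, float("-inf")
    for i in range(1, steps):
        for j in range(1, steps - i):
            b = (i / steps, j / steps, (steps - i - j) / steps)
            rate = doubling_rate(p, o, b)
            if rate > best_rate:
                best, best_rate = b, rate
    return best


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # hypothetical true win probabilities
    for o in ([2.0, 3.0, 6.0], [4.0, 4.0, 2.0]):  # two made-up fair payoff schedules
        print(o, round(track_sum(o), 12), best_bet_on_grid(p, o))
```

<p>For both payoff schedules the search should land on the bet (0.5, 0.3, 0.2), which is exactly the assumed p: the o_i change how much you win, not how you should split your bet.</p>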
<p>That is: if you decide the track is paying off so much that it is worth your while to gamble, then you should completely ignore the track&#8217;s payoff schedule in making your bet.  You might use the track&#8217;s published payoffs as some of your evidence when trying to estimate the p_i (the probability of each horse winning), but once you have estimated these probabilities you ignore the track&#8217;s payoff rates in designing your bets.  In fact your expected rate of winning (the expected log-growth of your wealth) is exactly equal to how much closer, in Kullback-Leibler distance, your estimate is to the true probabilities than the track&#8217;s estimate is (Cover and Thomas example 6.1.1; so unless you know something the track does not know, you should not bet).  Also you should bet even on unlikely and underpaying horses to help cover the possibilities (this is because you are making a series of bets, not just a single bet, so each bet&#8217;s value is computed under the assumption that your other bets have failed).  This (provably correct) advice is contrary to many obvious and traditional betting systems.</p>
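<p>The &#8220;how much closer&#8221; statement is the identity W(b) = D(p||r) - D(p||b), where W is the expected log2 growth when betting fractions b, r_i = 1/o_i is the track&#8217;s implied estimate (assuming the r_i sum to 1) and D is the Kullback-Leibler distance. It follows from writing log(b_i o_i) = log(p_i/r_i) - log(p_i/b_i). A hedged numeric check in Python, with made-up numbers:</p>

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler distance D(p || q) in bits (terms with p_i = 0 contribute 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi != 0)


def doubling_rate(p, o, b):
    """Expected log2 growth of wealth when all money is bet, fraction b[i] on horse i."""
    return sum(pi * math.log2(bi * oi) for pi, oi, bi in zip(p, o, b) if pi != 0)


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]         # hypothetical true probabilities
    o = [2.0, 3.0, 6.0]         # hypothetical fair payoffs: the 1/o_i sum to 1
    r = [1.0 / oi for oi in o]  # the track's implied probability estimate
    for b in ([0.45, 0.35, 0.2], r, p):  # a rough estimate, the track's, the truth
        w = doubling_rate(p, o, b)
        gap = kl_divergence(p, r) - kl_divergence(p, b)
        print([round(x, 3) for x in b], round(w, 6), round(gap, 6))
```

<p>Betting the track&#8217;s own estimate (b = r) gives a growth rate of zero, and betting the true probabilities gives the largest rate, D(p||r).</p>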
<p>The Kelly System is simultaneously very precise and broadly applicable.  For example, it has been extended to many other games and to the stock market (see: &#8220;The Kelly Criterion and the Stock Market&#8221; Louis M. Rotando, Edward O. Thorp, The American Mathematical Monthly (1992) vol. 99 (10) pp. 922-931).  The Kelly System gives actionable advice (exact amounts to bet or exact amounts of effort to invest) and is very specific in saying what to look at.</p>
<p>Just as Amdahl&#8217;s Law shows us that component speed-up is a distraction, the Kelly System shows us that published rates of return are siren songs.  Thus the Kelly System is the gambler&#8217;s equivalent of Amdahl&#8217;s Law.</p>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/' rel='bookmark' title='Permanent Link: A Quick Appreciation of the Sharpe Ratio'>A Quick Appreciation of the Sharpe Ratio</a></li>
<li><a href='http://www.win-vector.com/blog/2008/05/betting-best-of-series/' rel='bookmark' title='Permanent Link: Betting Best-Of Series'>Betting Best-Of Series</a></li>
<li><a href='http://www.win-vector.com/blog/2008/06/how-market-designs-set-prices/' rel='bookmark' title='Permanent Link: How Market Designs Set Prices'>How Market Designs Set Prices</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Good Graphs: Graphical Perception and Data Visualization</title>
		<link>http://www.win-vector.com/blog/2009/08/good-graphs-graphical-perception-and-data-visualization/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=good-graphs-graphical-perception-and-data-visualization</link>
		<comments>http://www.win-vector.com/blog/2009/08/good-graphs-graphical-perception-and-data-visualization/#comments</comments>
		<pubDate>Fri, 28 Aug 2009 15:40:41 +0000</pubDate>
		<dc:creator>Nina Zumel</dc:creator>
				<category><![CDATA[Exciting Techniques]]></category>
		<category><![CDATA[Expository Writing]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Cleveland]]></category>
		<category><![CDATA[data exploration]]></category>
		<category><![CDATA[graphical perception]]></category>
		<category><![CDATA[Lattice]]></category>
		<category><![CDATA[Mathematical Bedside Reading]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=296</guid>
		<description><![CDATA[What makes a good graph? When faced with a slew of numeric data, graphical visualization can be a more efficient way of getting a feel for the data than going through the rows of a spreadsheet. But do we know if we are getting an accurate or useful picture? How do we pick an effective [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/' rel='bookmark' title='Permanent Link: A Demonstration of Data Mining'>A Demonstration of Data Mining</a></li>
<li><a href='http://www.win-vector.com/blog/2010/02/living-in-a-lognormal-world/' rel='bookmark' title='Permanent Link: Living in A Lognormal World'>Living in A Lognormal World</a></li>
<li><a href='http://www.win-vector.com/blog/2009/04/the-data-enrichment-method/' rel='bookmark' title='Permanent Link: The Data Enrichment Method'>The Data Enrichment Method</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>What makes a good graph? When faced with a slew of numeric data, graphical visualization can be a more efficient way of getting a feel for the data than going through the rows of a spreadsheet. But do we know if we are getting an accurate or useful picture? How do we pick an effective visualization that neither obscures important details, or drowns us in confusing clutter? In 1968, William Cleveland published a text called <a href="http://www.stat.purdue.edu/~wsc/elements.html"><em>The Elements of Graphing Data,</em></a> inspired by Strunk and White&#8217;s classic writing handbook <a href="http://www.amazon.com/Elements-Style-50th-Anniversary/dp/0205632645"><em>The Elements of Style</em></a> . <em>The Elements of Graphing Data</em> puts forward Cleveland&#8217;s philosophy about how to produce good, clear graphs — not only for presenting one&#8217;s experimental results to peers, but also for the purposes of data analysis and exploration. Cleveland&#8217;s approach is based on a theory of graphical perception: how well the human perceptual system accomplishes certain tasks involved in reading a graph. For a given data analysis task, the goal is to align the information being presented with the perceptual tasks the viewer accomplishes the best. <span id="more-296"></span></p>
<blockquote><p>When a graph is made, quantitative and categorical information is encoded by a display method. Then the information is visually decoded. This visual perception is a vital link. No matter how clever the choice of the information, and no matter how technologically impressive the encoding, a visualization fails if the decoding fails. Some display methods lead to efficient, accurate decoding, and others lead to inefficient, inaccurate decoding. It is only through scientific study of visual perception that informed judgments can be made about display methods. The display methods of <em>Elements</em> rest on a foundation of scientific enquiry.</p></blockquote>
<p>— from the preface of <em>The Elements of Graphing Data</em></p>
<p>A revised edition of <em>The Elements of Graphing Data</em> was published in 1994, along with a companion volume, <a href="http://www.stat.purdue.edu/~wsc/visualizing.html"><em>Visualizing Data,</em></a> which is oriented towards the implementation and technical details of different graphing techniques. I highly recommend <em>The Elements of Graphing Data</em> as a guidebook for creating graphs, as well as for its excellent survey of several useful techniques. Cleveland, along with other colleagues at Bell Labs, developed the <a href="http://stat.bell-labs.com/project/trellis/s.html">Trellis display system,</a> a framework for the visualization of multivariable databases, using the ideas developed in his texts. Trellis, in turn, influenced Deepayan Sarkar&#8217;s Lattice graphics system for R. Lattice implements many of Cleveland&#8217;s ideas, and I also recommend Sarkar&#8217;s <a href="http://lmdvr.r-forge.r-project.org/figures/figures.html">Lattice manual</a> if you do data visualization in R.</p>
<p>It&#8217;s important to note here that Cleveland writes for researchers and decision-makers who use graphs to analyze data, or to convey scientific results to colleagues in an (ideally) objective manner. This distinguishes him from Darrell Huff, whose 1954 <a href="http://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728"><em>How to Lie with Statistics</em></a> considered the use of graphs (and statistics in general) as rhetorical devices for convincing others of one&#8217;s point of view. Hence, some of Cleveland&#8217;s recommendations and guidelines actually contradict Huff&#8217;s. <a id="refHuff" href="#Huff"><sup>1</sup></a></p>
<p>Edward Tufte also explored the idea that the choice of graphical display should be influenced by the viewer&#8217;s cognitive processes, in his 1990 book <a href="http://www.edwardtufte.com/tufte/books_ei"><em>Envisioning Information</em></a>. Tufte tends to be more broadly concerned with the gestalt of a graph, beyond its use as an analysis tool; he is also more concerned than Cleveland is with aesthetic considerations.</p>
<p>Cleveland&#8217;s philosophy might be summarized as: <em>minimize the mental gymnastics that the viewer must go through to understand the graph</em>. This leads to some obvious advice: avoid clutter and occlusion, make graphing symbols or color-coding unambiguous, use scale-lines on all four sides of the graph, and so on. It also leads to advice that perhaps should be as obvious, but isn&#8217;t: <em>make the aspect of the data that you want to analyze as clear as possible</em>. But what does this mean in practice?</p>
<p><strong>Make important differences large enough to perceive</strong></p>
<p>Weber&#8217;s Law is a well known observation from the psychophysics literature, which states that the &#8220;just noticeable&#8221; change in a stimulus is a constant ratio of the original stimulus. Put another way, people are only capable of detecting a change in a stimulus that is greater than a certain percentage <em>k</em> of the original stimulus. Here, &#8220;stimulus&#8221; can refer to any perceivable physical quantity: weight, intensity, length, orientation. The percentage <em>k</em> will vary with stimulus, and with observer.</p>
<table border="0" align="center">
<tbody>
<tr>
<td>
<div style="text-align:center;"><img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/weberslaw.jpg" border="0" alt="weberslaw.jpg" width="488" height="233" /></div>
</td>
</tr>
</tbody>
<caption>Figure 1: From Cleveland, <em>The Elements of Graphing Data</em></caption>
<tbody>
<tr>
<td></td>
</tr>
</tbody>
</table>
<p>Figure 1 shows the application of Weber&#8217;s law to lengths. The bars A and B are of different lengths, but the difference is such a small fraction of the &#8220;base&#8221; length (say, A&#8217;s length, to be specific) that it is difficult to tell whether or not they are different, or which is longer. On the right, the bars have been embedded in frames of identical length, and now it is easy to see that B is longer. Why? Because the difference in lengths of the <em>white</em> intervals is a much larger percentage of the white &#8220;base&#8221; length (say the white A interval). It is easy to see that the white B interval is shorter than the white A interval, and therefore, the black B interval is longer than the black A interval.</p>
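<p>A toy Python sketch of the Weber&#8217;s Law argument behind Figure 1. The Weber fraction k = 0.05 and the bar lengths are hypothetical numbers chosen for illustration, not measurements from Cleveland.</p>

```python
def noticeable(base, other, k):
    """Weber's Law sketch: a change is noticed only if it exceeds a fraction k of the base."""
    return abs(other - base) / base > k


if __name__ == "__main__":
    k = 0.05            # hypothetical Weber fraction for judging lengths
    a, b = 10.0, 10.3   # hypothetical black bar lengths; B is slightly longer
    frame = 11.0        # both bars drawn inside frames of identical length
    print("black bars:", noticeable(a, b, k))                  # a 3% difference
    print("white gaps:", noticeable(frame - a, frame - b, k))  # a 30% difference
```

<p>The 0.3 difference is only 3% of a bar&#8217;s length but 30% of a white gap&#8217;s length, which is why the framed version is easy to read.</p>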
<p>The moral is that you always want the viewer to be estimating changes or differences with respect to a short base length. You can do this with reference grids, as demonstrated below.</p>
<table border="0" align="center">
<caption>From Cleveland, <em>The Elements of Graphing Data</em></caption>
<tbody>
<tr>
<td><!-- original 319 by 601 --><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/noreferencegrids.jpg" border="0" alt="noreferencegrids.jpg" width="200" height="400" align="left" /></td>
<td><!-- original 319 by 601 --><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/referencegrids1.jpg" border="0" alt="referencegrids.jpg" width="200" height="400" align="right" /></td>
</tr>
<tr>
<td align="center">Figure 2</td>
<td align="center">Figure 3</td>
</tr>
</tbody>
</table>
<p>Figure 2 shows eight curves. Which one dips to the lowest minimum? Are the high curves approaching the same value, and which one is rising the fastest? Are the low curves dipping to the same minimum? Are they going to the same steady state? Figure 3 shows the same curves, graphed with identical reference grids. The grids shorten the base lengths that are being compared, and it is now much easier to compare highs, lows, and steady state behavior.</p>
<p>But wouldn&#8217;t it be better to compare the graphs by superposing them? For two or three curves, perhaps. But in this case, eight curves can clutter the graph, and use up the symbol or color space, making it difficult to distinguish the different datasets &#8212; increasing the mental gymnastics.</p>
<p>Reference grids are useful even for a single curve, especially one with slowly varying segments, such as these graphs have. The reference grid makes it easier to answer questions like: does the process return to the initial state, or to a different steady state? Has the process reached steady state, or is it still growing?</p>
<p><strong>Make important shape changes large enough to perceive: Banking to 45 degrees.</strong></p>
<p>The aspect ratio of a graph is important when trying to understand shape. Rate of change information is encoded in the slope of the curve, which the viewer estimates by changes in the orientation of the local tangents at each point of the graph. Weber&#8217;s Law tells us that very small changes in this orientation will be difficult to detect. For a given (physical) curve, the local orientation changes will be dependent on the aspect ratio of its graphical presentation, as shown (to an exaggerated degree) in Figure 4. Here, the same curve (two line segments) is plotted at three different aspect ratios, one that centers the graph at 45 degrees, one that forces the curve to be nearly vertical, and another that forces it to be nearly horizontal. In the last two cases, the change in orientation of the two line segments is so small as to be nearly undetectable.</p>
<table border="0" align="center">
<caption>Figure 4: From Cleveland</caption>
<tbody>
<tr>
<td><!-- original 670 by 630 -->
<div style="text-align:center;"><img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/angles.jpg" border="0" alt="angles.jpg" width="446" height="420" align="left" /></div>
</td>
</tr>
</tbody>
</table>
<p>For two line segments with positive, unequal slopes, a simple geometric argument shows that their absolute difference in orientation is maximized by the aspect ratio that sets their average orientation to 45 degrees (the first graph in Figure 4). Empirical studies by Cleveland and others have indeed verified that a viewer&#8217;s ability to judge the relative slopes of line segments on a graph is maximized when the absolute values of the orientations of the segments are centered on 45 degrees.</p>
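<p>The geometric argument can be sketched in a few lines of Python. Changing the aspect ratio multiplies every slope by some factor a; the orientation gap atan(a s2) - atan(a s1) is maximized when a^2 * s1 * s2 = 1, so the rescaled slopes satisfy tan(theta1) * tan(theta2) = 1 and theta1 + theta2 = 90 degrees. The slopes used below are hypothetical.</p>

```python
import math


def orientations(s1, s2, scale):
    """Orientations in degrees of two positive slopes after multiplying both by scale."""
    return math.degrees(math.atan(scale * s1)), math.degrees(math.atan(scale * s2))


def best_scale(s1, s2):
    """Scale maximizing the orientation gap between the two segments.

    Setting d/da [atan(a*s2) - atan(a*s1)] = 0 gives a*a*s1*s2 = 1, so the
    rescaled slopes satisfy tan(theta1) * tan(theta2) = 1, i.e. theta1 + theta2 = 90.
    """
    return 1.0 / math.sqrt(s1 * s2)


if __name__ == "__main__":
    s1, s2 = 0.5, 8.0  # hypothetical slopes of two line segments, in data units
    for a in (best_scale(s1, s2), 0.01, 100.0):  # banked, nearly flat, nearly vertical
        t1, t2 = orientations(s1, s2, a)
        print(f"scale {a:g}: angles {t1:.1f}, {t2:.1f}; mean {(t1 + t2) / 2:.1f}; gap {t2 - t1:.1f}")
```

<p>With these slopes the best factor is 0.5, giving orientations of about 14 and 76 degrees (mean 45), while the nearly flat and nearly vertical renderings shrink the gap to under five degrees.</p>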
<p>This result leads to a technique called <em>Banking to 45</em>, whereby the aspect ratio of the graph is chosen so that the average absolute orientation of the line segments making up the graph is 45 degrees. The details are discussed in Cleveland, and many of the plots in R&#8217;s Lattice package also have an option to bank the graph to 45 degrees.</p>
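<p>Here is one way banking could be sketched in Python: find, by bisection, the factor that multiplies every segment slope so that the mean absolute orientation is 45 degrees. This is our own simplified version (unweighted segments, flat segments ignored, strictly increasing x assumed), not Cleveland&#8217;s exact procedure or Lattice&#8217;s implementation; the function names are hypothetical.</p>

```python
import math


def mean_abs_orientation(slopes, scale):
    """Mean absolute orientation, in degrees, of segments whose slopes are multiplied by scale."""
    return sum(math.degrees(math.atan(abs(scale * s))) for s in slopes) / len(slopes)


def bank_to_45(xs, ys, iterations=100):
    """Factor that, multiplying every segment slope, banks the polyline (xs, ys) to 45 degrees.

    Simplified sketch: assumes strictly increasing xs, weights segments equally and
    ignores flat segments. The mean orientation increases with the factor, so
    bisection on log10(factor) converges.
    """
    slopes = [(y1 - y0) / (x1 - x0) for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]
    slopes = [s for s in slopes if s != 0]
    if not slopes:
        return 1.0  # nothing to bank
    lo, hi = -12.0, 12.0  # search factors between 1e-12 and 1e12
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mean_abs_orientation(slopes, 10 ** mid) > 45.0:
            hi = mid
        else:
            lo = mid
    return 10 ** ((lo + hi) / 2)


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    ys = [0, 10, 40, 90, 160]  # hypothetical convex data: slopes 10, 30, 50, 70
    factor = bank_to_45(xs, ys)
    print(factor, mean_abs_orientation([10, 30, 50, 70], factor))
```

<p>The factor applies to slopes measured in data units; the plot&#8217;s height-to-width ratio would then be the factor times (y-range / x-range).</p>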
<p>This deliberate exaggeration of slope is something that Darrell Huff deplores. In <em>How to Lie with Statistics</em>, Huff refers to these graphs as &#8220;gee-whiz&#8221; graphs — and in the context of his discussion of statistics as rhetoric, they are:</p>
<table border="0" align="center">
<caption>Figure 5: From Huff, <em>How to Lie With Statistics</em></caption>
<tbody>
<tr>
<td><!-- original 461 by 351 -->
<div style="text-align:center;"><img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/geewhiz.jpg" border="0" alt="geewhiz.jpg" width="461" height="351" /></div>
</td>
</tr>
</tbody>
</table>
<p>To insist that a graph should always include a zero line and that units be in proportion may be good advice from a rhetorical perspective, but it is poor advice if the purpose of the graph is data analysis. As Figure 6 below demonstrates, we can lose resolution if we always insist on including the zero. Does the trend line in the left graph increase linearly, superlinearly, or sublinearly? The convexity of the curve is more apparent when it is banked to 45, as on the right. Assuming that the scientist reads the axes and is cognizant of the actual magnitude changes involved, the graph on the right conveys more information.</p>
<table border="0" align="center">
<caption>Figure 6: From Cleveland</caption>
<tbody>
<tr>
<td><img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/bank451.jpg" border="0" alt="bank45.jpg" width="500"  /></td>
</tr>
</tbody>
</table>
<p><strong>Make sure all the data is equally well resolved.</strong></p>
<p>It is quite common for positive data —  word frequencies, populations, price distributions, just to name a few examples — to be skewed: most of the data is bunched towards low values, the rest of it is spread out on a very long tail. This long tail squashes the majority of the data into a tiny interval of a very narrow dynamic range, as in Figure 7, making it difficult to evaluate the data.</p>
<table border="0" align="center">
<tbody>
<tr>
<td>
<table border="0">
<tbody>
<tr>
<td><!-- original size: 990 by 860 --><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/skewed1.gif" border="0" alt="skewed.gif" width="250" /></td>
</tr>
</tbody>
<caption>Figure 7: Long-tailed distribution of purchase sizes</caption>
</table>
</td>
<td>
<table border="0">
<tbody>
<tr>
<td><!-- original = 499 by 675 --><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/logskewed1.gif" border="0" alt="logskewed.gif" width="250" /></td>
</tr>
</tbody>
<caption>Figure 8: Distribution of log(purchase size)</caption>
</table>
</td>
</tr>
</tbody>
</table>
<p>Imagine that Figure 7 represents the distribution of average purchase size across an online merchant&#8217;s customers: average purchase size is plotted on the x-axis, and the y-axis represents the fraction of the total customer population whose average purchase size is a given value (the area under the graph integrates to one). According to this graph, most customers make fairly small purchases on average, but there is a long tail of big spenders trailing out into the range of several thousand dollars. Obviously, one would like a little more resolution on the big spike of customers near zero. One could simply &#8220;zoom in&#8221; on this range, by chopping off some long chunk of the tail, but you may potentially lose sight of some global patterns in the data by doing so.</p>
<p>Graphing the distribution of log(purchase size) enables you to increase the resolution near zero, while preserving the global view. Figure 8 shows the distribution of log(purchase size), revealing two spending populations: a population of high spenders who tend to make purchases in the $3000 range (in log space), and another population whose purchases are centered (in log space) around $60. The existence of these two distinct populations is not apparent in the original graph.</p>
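<p>A rough numeric illustration of why the log transform helps, using made-up dollar amounts rather than the data behind Figures 7 and 8: how much of the horizontal axis does a cluster of small purchases occupy on a linear scale versus a log10 scale?</p>

```python
import math


def axis_fraction(values, lo, hi, transform=lambda v: v):
    """Fraction of a plotted axis running from lo to hi that the values span after transform."""
    span = transform(max(values)) - transform(min(values))
    return span / (transform(hi) - transform(lo))


if __name__ == "__main__":
    cluster = [20.0, 45.0, 80.0, 200.0]  # hypothetical small purchases, in dollars
    lo, hi = 1.0, 10000.0                 # axis range covering the long tail
    print("linear:", axis_fraction(cluster, lo, hi))
    print("log10: ", axis_fraction(cluster, lo, hi, math.log10))
```

<p>With these hypothetical numbers the $20 to $200 cluster spans under 2% of a linear axis but a quarter of a log10 axis.</p>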
<p>Notice that Figure 8 has two x-axis scales: the top axis is marked in log units, while the bottom axis is marked in absolute dollars, spaced on a log scale. This accords with the principle of minimizing mental gymnastics, since the viewer of the graph will typically be concerned about prices in dollars, not log dollars. In fact, it would have been better yet to have plotted the distribution of log<sub>2</sub> or log<sub>10</sub> of the data; the former would allow us to see at a glance the doubling of price ranges, the latter to see price changes in factors of ten.</p>
<table border="0" align="center">
<caption>Figure 9: The 14 most abundant elements in meteorites. From Cleveland</caption>
<tbody>
<tr>
<td><!-- original = 543 by 522 --><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/metals.jpg" border="0" alt="metals.jpg" width="250" /></td>
<td><!-- original = 550 by 600 --><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/logmetals.jpg" border="0" alt="logmetals.jpg" width="250" /></td>
</tr>
</tbody>
</table>
<p>Figure 9 shows another example: the fourteen most abundant elements in meteorites, specifically the average percent of each of the elements. If we graph the percentages directly, as on the left, we cannot easily distinguish the differences in the elements from aluminum on down. Graphing log<sub>2</sub> of the percentages, as on the right, improves the resolution. Again, we have two x-axes on the graph of the log data.</p>
<p><strong>If you want to analyze the difference between two processes, then graph the difference, not the processes (or graph both).</strong></p>
<p>Suppose that we are comparing the two processes f1 and f2 that are shown in Figure 10. As x increases, the two processes appear to be approaching each other; that is, the difference between the two seems to be decreasing. In reality, the difference between the two is constant: f2 = f1+1.</p>
<table border="0" align="center">
<tbody>
<tr>
<td>
<table border="0">
<tbody>
<tr>
<td><!-- original size: 990 by 860 --><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/difference1.gif" border="0" alt="difference.gif" width="250" /></td>
</tr>
</tbody>
<caption>Figure 10: The illusion of convergence</caption>
</table>
</td>
<td>
<table border="0">
<tbody>
<tr>
<td><!-- original = 499 by 675 --><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/imports.jpg" border="0" alt="imports.jpg" width="250" /></td>
</tr>
</tbody>
<caption>Figure 11: British Imports and Exports. From Cleveland</caption>
</table>
</td>
</tr>
</tbody>
</table>
<p>It turns out that people are good at perceiving the perpendicular difference between two curves, but not the differences in height, which is what we are actually interested in here. When we try to infer the differences from the process graph, we may not only miss key information, we may actually draw incorrect conclusions.</p>
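<p>A small Python sketch of the illusion in Figure 10, assuming (hypothetically) f1(x) = x^2 and f2 = f1 + 1. Treating the two curves as locally parallel straight lines with slope m, their perpendicular separation is the vertical gap divided by sqrt(1 + m^2), so it shrinks as the curves steepen even though the vertical gap stays at 1.</p>

```python
import math


def vertical_gap(f1, f2, x):
    """Height difference between the two curves at x (what we actually care about)."""
    return f2(x) - f1(x)


def approx_perpendicular_gap(f1, f2, x, h=1e-6):
    """Approximate perpendicular distance between the curves near x, treating them as
    parallel straight lines with f1's slope m at x: |vertical gap| / sqrt(1 + m^2)."""
    m = (f1(x + h) - f1(x - h)) / (2 * h)  # central-difference estimate of the slope
    return abs(vertical_gap(f1, f2, x)) / math.sqrt(1 + m * m)


if __name__ == "__main__":
    def f1(x):
        return x ** 2  # hypothetical stand-in for the lower curve in Figure 10

    def f2(x):
        return x ** 2 + 1  # f2 = f1 + 1: a constant vertical difference

    for x in (0.0, 1.0, 2.0, 3.0):
        print(x, vertical_gap(f1, f2, x), round(approx_perpendicular_gap(f1, f2, x), 3))
```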
<p>A less contrived example is given in Figure 11. Here the imports to and exports from England are graphed over the first 80 years of the 18th century. In the difference graph on the bottom, we can see a local peak in (imports-exports) just after 1760; this is not obvious from simply comparing the two processes (top graph).</p>
<p><strong>If you are interested in rate of change, then graph rate of change.</strong></p>
<p>In Figure 12, we see the population figures for a given community from 1990 to 2009. Obviously, the population is steadily increasing, but how quickly? Is the rate of population growth increasing over time, or is it decreasing? If we are interested in these questions, then simply graphing the population over time is not enough. We need to look at the rate of change directly.</p>
<table border="0" align="center">
<tbody>
<tr>
<td>
<table border="0">
<caption>Figure 12</caption>
<tbody>
<tr>
<td><!-- original 998 by 860 --><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/rateofchange1.gif" border="0" alt="rateofchange.gif" width="250" /></td>
</tr>
</tbody>
</table>
</td>
<td>
<table border="0">
<caption>Figure 13</caption>
<tbody>
<tr>
<td><!-- original 720 by 720 --><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/lograteofchange2.gif" border="0" alt="lograteofchange.gif" width="250" /></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p>The classic way to do this is by graphing the logarithm of the data. In Figure 13, we have graphed log<sub>2</sub> of the population over time, with the log scale printed on the right hand y-axis, and the actual population numbers printed at a log scale on the left hand axis. Now we can see that the population increased at a constant rate from 1990 to 2000, quadrupling approximately every four years, and then slowed down (to a lower constant rate) after 2000.</p>
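<p>Reading a rate off a log2 plot amounts to measuring a slope in doublings per year. A least-squares sketch in Python, run on a synthetic series (not the data behind Figure 12) that doubles every two years:</p>

```python
import math


def log2_growth_rate(years, values):
    """Least-squares slope of log2(values) against years, in doublings per year."""
    logs = [math.log2(v) for v in values]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


if __name__ == "__main__":
    years = list(range(1990, 2001))
    population = [1000 * 2 ** ((y - 1990) / 2) for y in years]  # synthetic: doubles every 2 years
    rate = log2_growth_rate(years, population)
    print(f"{rate:.3f} doublings per year; doubling time {1 / rate:.2f} years")
```

<p>A slope of 0.5 on the log2 scale means one doubling every two years, or quadrupling every four; fitting the periods before and after a suspected change separately is one way to check whether the rate has slowed.</p>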
<p><strong>Graphs as a research tool</strong></p>
<p>Throughout this discussion, we have considered graphs as a tool for data exploration and initial understanding. It is an iterative process: as questions arise, the data will be reprocessed and re-plotted to highlight the new issues to be examined. A good research graph must display this information directly, with a minimum of mental gymnastics, but, as with any research tool, there can be a learning curve. For example, density plots (such as those shown in Figures 7 and 8) are in my opinion more useful than histograms for understanding how numerical data is distributed, and I am constantly surprised at the amount of explanation that they require when I show them to people who are unfamiliar with them. A number of very useful graphs that are discussed in Cleveland&#8217;s texts meet with the same reaction from people who encounter that style of graph for the first time. This is a disadvantage, relative to using a more fashionable graph, when attempting to communicate results. But the insight into the data that these graphs provide often makes it worth spending the time to educate clients or peers on how to read the graph.</p>
<p>Even so, a good graph still may not be a quick read. As Cleveland writes:</p>
<blockquote><p>While there is a place for rapidly-understood graphs, it is too limiting to make speed a requirement in science and technology, where the use of graphs ranges from detailed in-depth data analysis to quick presentation.<br />
&#8230;</p>
<p>The important criterion for a graph is not simply how fast we can see a result; rather it is whether through the use of the graph we can see something that would have been harder to see otherwise or that could not have been seen at all.</p></blockquote>
<p>- <em>The Elements of Graphing Data</em>, Chapter 2</p>
<hr /><p><a id="Huff" href="#refHuff">[Back]</a><sup>1</sup><em>How to Lie with Statistics</em> is an entertaining (if a little dated) discussion of how to read statistical and quantitative claims critically, and is definitely worth a read.</p>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/' rel='bookmark' title='Permanent Link: A Demonstration of Data Mining'>A Demonstration of Data Mining</a></li>
<li><a href='http://www.win-vector.com/blog/2010/02/living-in-a-lognormal-world/' rel='bookmark' title='Permanent Link: Living in A Lognormal World'>Living in A Lognormal World</a></li>
<li><a href='http://www.win-vector.com/blog/2009/04/the-data-enrichment-method/' rel='bookmark' title='Permanent Link: The Data Enrichment Method'>The Data Enrichment Method</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2009/08/good-graphs-graphical-perception-and-data-visualization/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>The Data Enrichment Method</title>
		<link>http://www.win-vector.com/blog/2009/04/the-data-enrichment-method/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=the-data-enrichment-method</link>
		<comments>http://www.win-vector.com/blog/2009/04/the-data-enrichment-method/#comments</comments>
		<pubDate>Fri, 01 May 2009 01:03:06 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Expository Writing]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Data Enrichment]]></category>
		<category><![CDATA[Mathematical Bedside Reading]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=80</guid>
		<description><![CDATA[We explore some of the ideas from the seminal paper &#8220;The Data-Enrichment Method&#8221; ( Henry R Lewis, Operations Research (1957) vol. 5 (4) pp. 1-5). The paper explains a technique of improving the quality of statistical inference by increasing the effective size of the data-set. This is called &#8220;Data-Enrichment.&#8221; Now more than ever we must [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/' rel='bookmark' title='Permanent Link: A Demonstration of Data Mining'>A Demonstration of Data Mining</a></li>
<li><a href='http://www.win-vector.com/blog/2009/08/good-graphs-graphical-perception-and-data-visualization/' rel='bookmark' title='Permanent Link: Good Graphs: Graphical Perception and Data Visualization'>Good Graphs: Graphical Perception and Data Visualization</a></li>
<li><a href='http://www.win-vector.com/blog/2009/01/exciting-technique-1-the-r-language/' rel='bookmark' title='Permanent Link: Exciting Technique #1: The &#8220;R&#8221; language.'>Exciting Technique #1: The &#8220;R&#8221; language.</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>We explore some of the ideas from the seminal paper &#8220;The Data-Enrichment Method&#8221; (Henry R. Lewis, Operations Research (1957) vol. 5 (4) pp. 1-5).  The paper explains a technique for improving the quality of statistical inference by increasing the effective size of the data set.  This is called &#8220;Data-Enrichment.&#8221;</p>
<p>Now more than ever we must be familiar with the consequences of these important techniques, especially if we don&#8217;t know whether we might already be a victim of them.</p>
<p><span id="more-80"></span><br />
&#8220;The Data-Enrichment Method&#8221; is an absolutely wonderful 1957 tongue-in-cheek parody of a very tempting method of accidental data falsification.  The method presented is spookily plausible and actually anticipates some very important (and correct) ideas later used in the EM algorithm and in the jackknife, the bootstrap and other resampling techniques (for example see: &#8220;Bootstrap Methods: Another Look at the Jackknife&#8221;, Bradley Efron, Ann. Statist. (1979) vol. 7 (1) pp. 1-26).</p>
<p>The idea is innocently presented with an accompanying data set: perception of a sound presented at different decibel levels (loudnesses):</p>
<p><center></p>
<table>
<tr>
<th>Source.DB</th>
<th>Detections</th>
<th>Failures</th>
</tr>
<tr>
<td>62</td>
<td>5</td>
<td>40</td>
</tr>
<tr>
<td>65</td>
<td>10</td>
<td>30</td>
</tr>
<tr>
<td>68</td>
<td>15</td>
<td>20</td>
</tr>
<tr>
<td>71</td>
<td>20</td>
<td>10</td>
</tr>
<tr>
<td>74</td>
<td>25</td>
<td>5</td>
</tr>
<tr>
<td>77</td>
<td>30</td>
<td>3</td>
</tr>
</table>
<p></center></p>
<p>From this table it is obvious that the number of detections increases (and the number of failures decreases) as the sound is presented louder and louder.  This makes sense and puts a quantitative rate on our prior expectation that detection gets easier as loudness increases.  For this data the trend is quite obvious, and we can easily plot a regression line that accurately models the effect of Source.DB on detection rate:</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/04/sourcedbdetectionrate.gif" alt="SourceDBDetectionRate.gif" border="0" width="400" height="400" /><br />
</center></p>
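<p>To make the trend concrete, here is a minimal Python sketch that computes each level&#8217;s detection rate from the table and fits an ordinary (unweighted) least-squares line to those rates.  The unweighted fit on per-level rates is our assumption; the plotted line may have been fit differently.</p>

```python
# Sound-detection data from the table above: (Source.DB, detections, failures).
SOUND = [(62, 5, 40), (65, 10, 30), (68, 15, 20),
         (71, 20, 10), (74, 25, 5), (77, 30, 3)]


def least_squares_line(xs, ys):
    """Ordinary (unweighted) least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


levels = [db for db, _, _ in SOUND]
rates = [d / (d + f) for _, d, f in SOUND]  # fraction of trials that were detected
intercept, slope = least_squares_line(levels, rates)
print(f"detection rate ~= {intercept:.3f} + {slope:.4f} * Source.DB")
```

<p>The slope comes out positive, at roughly 0.057 (about 5.7 percentage points of detection rate per dB).</p>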
<p>But we want more.  Can we increase our model precision and confidence by incorporating our domain knowledge?  If we are only trying to accurately estimate the rate at which loudness increases detection, and we are willing to assume that it really does increase, could we not pre-process the data to use our domain knowledge?</p>
<p>The method suggested is to add in some counterfactuals that we feel confident about.  For example we could (using our assumption that loudness increases detection, just to an unknown degree) notice that the 30 failures at 65 dB certainly would not have been heard if they had been run at 62 dB (even quieter).  By the same reasoning we can assume that the 5 detections at 62 dB would have been heard had they been run at 65 dB, 68 dB, 71 dB, 74 dB or 77 dB.  In this way we have used our starting &#8220;seed data&#8221; and our domain knowledge to grow a much larger data set that shows the expected relation much more strongly.</p>
<p>The above paragraph is, of course, nonsense.  I am doing the original paper an injustice by summarizing, because in the original paper the procedure seems perfectly plausible (and useful).  It is not until the author works a second example, one with a weak initial relation that actually needs the enrichment, that the joke is revealed.</p>
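<p>As we read it, the enrichment rule pushes every detection up to all louder levels and every failure down to all quieter levels.  A minimal Python sketch of that rule (the function name <code>enrich</code> is ours, and the rows are assumed to be sorted by increasing loudness):</p>

```python
# Sound-detection data: (Source.DB, detections, failures), sorted by increasing loudness.
SOUND = [(62, 5, 40), (65, 10, 30), (68, 15, 20),
         (71, 20, 10), (74, 25, 5), (77, 30, 3)]


def enrich(rows):
    """The paper's (deliberately bogus) data-enrichment step.

    A detection at some level is also counted as a detection at every louder level,
    and a failure at some level is also counted as a failure at every quieter level.
    """
    detections = [d for _, d, _ in rows]
    failures = [f for _, _, f in rows]
    enriched = []
    for i, (level, _, _) in enumerate(rows):
        virtual_detections = sum(detections[: i + 1])  # this level plus all quieter ones
        virtual_failures = sum(failures[i:])  # this level plus all louder ones
        enriched.append((level, virtual_detections, virtual_failures))
    return enriched


for row in enrich(SOUND):
    print(row)
```

<p>On this table the 213 real trials become 523 &#8220;virtual&#8221; ones: for example, 62 dB ends up with 5 detections and 108 failures, and 77 dB with 105 detections and 3 failures.</p>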
<p>The second example is coin flipping.  The author applies an inductive bias that &#8220;clearly standing higher up on a staircase increases the chances of a coin flip coming up heads&#8221; and then uses the data enrichment method to enhance the data set.  The original data set is indeed too noisy to show the effect and the enhancement is in fact quite dramatic.  The original data:</p>
<p><center></p>
<table>
<tr>
<th>Stair.Step</th>
<th>Heads</th>
<th>Tails</th>
</tr>
<tr>
<td> 1</td>
<td>4</td>
<td>6</td>
</tr>
<tr>
<td> 2</td>
<td>5</td>
<td>5</td>
</tr>
<tr>
<td> 3</td>
<td>7</td>
<td>3</td>
</tr>
<tr>
<td> 4</td>
<td>4</td>
<td>6</td>
</tr>
<tr>
<td> 5</td>
<td>6</td>
<td>4</td>
</tr>
<tr>
<td> 6</td>
<td>5</td>
<td>5</td>
</tr>
<tr>
<td> 7</td>
<td>6</td>
<td>4</td>
</tr>
<tr>
<td> 8</td>
<td>6</td>
<td>4</td>
</tr>
<tr>
<td> 9</td>
<td>3</td>
<td>7</td>
</tr>
<tr>
<td> 10</td>
<td>4</td>
<td>6</td>
</tr>
</table>
<p></center></p>
<p>The enhanced data is much more interesting:</p>
<p><center></p>
<table>
<tr>
<th>Stair.Step</th>
<th>Virtual.Heads</th>
<th>Virtual.Tails</th>
</tr>
<tr>
<td>1</td>
<td>  4</td>
<td> 50</td>
</tr>
<tr>
<td>2</td>
<td>  9</td>
<td> 44</td>
</tr>
<tr>
<td>3</td>
<td> 16</td>
<td> 39</td>
</tr>
<tr>
<td>4</td>
<td> 20</td>
<td> 36</td>
</tr>
<tr>
<td>5</td>
<td> 26</td>
<td> 30</td>
</tr>
<tr>
<td>6</td>
<td> 31</td>
<td> 26</td>
</tr>
<tr>
<td>7</td>
<td> 37</td>
<td> 21</td>
</tr>
<tr>
<td>8</td>
<td> 43</td>
<td> 17</td>
</tr>
<tr>
<td>9</td>
<td> 46</td>
<td> 13</td>
</tr>
<tr>
<td>10</td>
<td> 50</td>
<td>  6</td>
</tr>
</table>
<p></center></p>
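<p>The same counterfactual rule, with heads propagated up the staircase and tails propagated down, reproduces the Virtual.Heads and Virtual.Tails columns above exactly.  A minimal Python check (the names are ours):</p>

```python
from itertools import accumulate

# Original coin data: (Stair.Step, heads, tails), sorted by increasing step.
COIN = [(1, 4, 6), (2, 5, 5), (3, 7, 3), (4, 4, 6), (5, 6, 4),
        (6, 5, 5), (7, 6, 4), (8, 6, 4), (9, 3, 7), (10, 4, 6)]


def enrich(rows):
    """Heads count at their own step and every higher step;
    tails count at their own step and every lower step."""
    steps = [s for s, _, _ in rows]
    heads = [h for _, h, _ in rows]
    tails = [t for _, _, t in rows]
    virtual_heads = list(accumulate(heads))  # running total going up the stairs
    virtual_tails = list(accumulate(reversed(tails)))[::-1]  # running total going down
    return list(zip(steps, virtual_heads, virtual_tails))


for row in enrich(COIN):
    print(row)
```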
<p>It is easier to see what is going on in the following plots (which show measured success rates as a function of the number of steps up the staircase, along with a smoothed fit of the relationship).  The original data is a noisy mess:</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/04/coinsmoothed.gif" alt="CoinSmoothed.gif" border="0" width="400" height="400" /><br />
</center></p>
<p>And the enriched data is more trend-like:</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/04/virtualsmoothed.gif" alt="VirtualSmoothed.gif" border="0" width="400" height="400" /><br />
</center></p>
<p>In fact the regression line fit onto the raw data even has the wrong sign (points down instead of up):</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/04/coinfit.gif" alt="CoinFit.gif" border="0" width="400" height="400" /><br />
</center></p>
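<p>The sign flip is easy to check numerically.  A minimal Python sketch, assuming an ordinary least-squares line fit to the per-step fraction of heads (with 10 flips at every step this gives the same slope as fitting the individual 0/1 flips):</p>

```python
# Original and "enriched" coin data from the tables above: (Stair.Step, heads, tails).
COIN = [(1, 4, 6), (2, 5, 5), (3, 7, 3), (4, 4, 6), (5, 6, 4),
        (6, 5, 5), (7, 6, 4), (8, 6, 4), (9, 3, 7), (10, 4, 6)]
VIRTUAL = [(1, 4, 50), (2, 9, 44), (3, 16, 39), (4, 20, 36), (5, 26, 30),
           (6, 31, 26), (7, 37, 21), (8, 43, 17), (9, 46, 13), (10, 50, 6)]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def heads_fraction(rows):
    """Fraction of (real or virtual) flips at each step that came up heads."""
    return [h / (h + t) for _, h, t in rows]


steps = [s for s, _, _ in COIN]
raw_slope = least_squares_slope(steps, heads_fraction(COIN))
virtual_slope = least_squares_slope(steps, heads_fraction(VIRTUAL))
print(f"raw slope: {raw_slope:+.4f} per step, enriched slope: {virtual_slope:+.4f} per step")
```

<p>The raw slope is slightly negative (about -0.008 per step), while the enriched slope is strongly positive (about +0.09 per step).</p>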
<p>Now, obviously this is a joke.  The enhancement procedure did not so much enhance the data as obliterate it.  The procedure makes no sense, and it would be treating the procedure with undue respect to point out any one feature as being &#8220;what is wrong with it.&#8221;  But the original desire is legitimate: can we use informed assumptions to gain a useful inductive bias?  If we do know something, should we not need less data?</p>
<p>The answer is yes, but we have to be careful.  We must read up on the differences between Bayesian, frequentist and empirical methods and decide which set of methods is best for us.  Up until now we have been fitting &#8220;by standard methods,&#8221; which is really just minimizing how far the data is from the model (by moving the model around).  That isn&#8217;t the only way to fit (see: &#8220;Controversies In The Foundation Of Statistics&#8221;, Bradley Efron, American Mathematical Monthly (1978) vol. 85 (4) pp. 231-246).</p>
<p>For example, a Bayesian might say that the goal of model fitting is not to pick the model that is closest to the data (maximizes the data&#8217;s plausibility with respect to the model) but to pick the model that maximizes the product of the data&#8217;s plausibility with respect to the model and the model&#8217;s acceptability.  For example, we could declare all models for coin-flips with negative slopes unacceptable and pick the best model with a non-negative slope.  However, assigning degrees of acceptability (or priors) to every possible model is laborious and may require more knowledge than we have from our &#8220;reasonable prior domain knowledge.&#8221;</p>
<p>Another method is to use more sophisticated notions.  One such method is Quantile Regression (Roger Koenker, Cambridge University Press 2005).  This methodology treats regression as a constrained optimization problem, so it is a simple matter to add in more constraints (such as requiring the slope to be positive) without having to assign arbitrary plausibilities to every possible model.  Another (huge) advantage is that Quantile Regression is much more stable and, even without any added constraints, recognizes that the coin-flip data is likely trend-free.  Here we plot the Quantile Regression analysis of the coin data (without having added any prior constraints):</p>
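<p>One way to see the &#8220;no negative slopes&#8221; idea in action: if we assume Gaussian noise (so maximizing plausibility means least squares) and an all-or-nothing acceptability that rules out negative slopes while treating every other line equally, the Bayesian recipe reduces to least squares with the slope constrained to be non-negative.  A minimal Python sketch under those assumptions (the function names are ours):</p>

```python
# Original coin data: (Stair.Step, heads, tails).
COIN = [(1, 4, 6), (2, 5, 5), (3, 7, 3), (4, 4, 6), (5, 6, 4),
        (6, 5, 5), (7, 6, 4), (8, 6, 4), (9, 3, 7), (10, 4, 6)]


def least_squares_line(xs, ys):
    """Unconstrained ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def least_squares_line_nonneg_slope(xs, ys):
    """Least-squares fit of y = a + b*x subject to b >= 0; returns (a, b).

    The squared-error objective is convex, so if the unconstrained optimum has a
    negative slope, the constrained optimum lies on the boundary b == 0, where the
    best intercept is simply the mean of ys.
    """
    a, b = least_squares_line(xs, ys)
    if b >= 0:
        return a, b
    return sum(ys) / len(ys), 0.0


steps = [s for s, _, _ in COIN]
rates = [h / (h + t) for _, h, t in COIN]  # fraction of heads at each step
print("unconstrained:", least_squares_line(steps, rates))
print("slope >= 0:   ", least_squares_line_nonneg_slope(steps, rates))
```

<p>On the coin data the constraint binds, so the constrained fit is a flat line at the overall fraction of heads (50 out of 100 flips, or 0.5).</p>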
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/04/quantileregression.gif" alt="QuantileRegression.gif" border="0" width="400" height="400" /><br />
</center></p>
<p>To be honest, the method got lucky: the fit is better than should be expected.  But Quantile Regression is the perfect framework for adding in domain constraints.</p>
<p>So: while The Data Enrichment Method is a fraud, there are ways to enhance analysis to incorporate domain knowledge into results.  Instead of saying &#8220;any bias (even useful bias) ruins fitting,&#8221; one should have a cookbook of methods ready to be applied.  These cookbooks hide under names like &#8220;Econometric Society Monographs&#8221; (in my opinion the econometricians really own the interface between theoretical statistics and hard-nosed applications).</p>
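<p>For a sense of how such a fit can be computed, here is a minimal Python sketch of linear quantile regression (median regression by default) for a single predictor.  It relies on the fact that the objective is convex and piecewise linear, so (given at least two distinct x values) some optimal line passes through two data points, or, with a slope bound, through one data point at the bound; for a handful of points we can simply try every candidate.  This brute-force search is our own illustration, not Koenker&#8217;s algorithm; practical implementations (for example the R <code>quantreg</code> package) solve a linear program.  We also do not know exactly what data or settings produced the plot above.</p>

```python
from itertools import combinations

# Original coin data: (Stair.Step, heads, tails).
COIN = [(1, 4, 6), (2, 5, 5), (3, 7, 3), (4, 4, 6), (5, 6, 4),
        (6, 5, 5), (7, 6, 4), (8, 6, 4), (9, 3, 7), (10, 4, 6)]


def check_loss(u, tau):
    """Koenker's check function: tau*u for u >= 0 and (tau - 1)*u for negative u."""
    return max(tau * u, (tau - 1) * u)


def quantile_line(xs, ys, tau=0.5, min_slope=None):
    """Brute-force tau-quantile regression line y = a + b*x; returns (a, b).

    Candidates are the lines through each pair of points with distinct x.
    With a slope bound, candidates with slope below the bound are dropped and
    lines with slope exactly at the bound through each single point are added.
    """
    points = list(zip(xs, ys))
    candidates = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 != x2:
            b = (y2 - y1) / (x2 - x1)
            candidates.append((y1 - b * x1, b))
    if min_slope is not None:
        candidates = [(a, b) for a, b in candidates if b >= min_slope]
        candidates += [(y - min_slope * x, min_slope) for x, y in points]

    def total_loss(line):
        a, b = line
        return sum(check_loss(y - a - b * x, tau) for x, y in points)

    return min(candidates, key=total_loss)


steps = [s for s, _, _ in COIN]
rates = [h / (h + t) for _, h, t in COIN]  # fraction of heads at each step
print(quantile_line(steps, rates))                 # median regression, no constraint
print(quantile_line(steps, rates, min_slope=0.0))  # domain constraint: slope >= 0
```

<p>On the per-step fractions of heads both calls return a flat median line, <code>(0.5, 0.0)</code>.  The <code>min_slope</code> argument shows how a domain constraint slots in without assigning a prior to every possible model.</p>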


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/' rel='bookmark' title='Permanent Link: A Demonstration of Data Mining'>A Demonstration of Data Mining</a></li>
<li><a href='http://www.win-vector.com/blog/2009/08/good-graphs-graphical-perception-and-data-visualization/' rel='bookmark' title='Permanent Link: Good Graphs: Graphical Perception and Data Visualization'>Good Graphs: Graphical Perception and Data Visualization</a></li>
<li><a href='http://www.win-vector.com/blog/2009/01/exciting-technique-1-the-r-language/' rel='bookmark' title='Permanent Link: Exciting Technique #1: The &#8220;R&#8221; language.'>Exciting Technique #1: The &#8220;R&#8221; language.</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2009/04/the-data-enrichment-method/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A Quick Appreciation of the Sharpe Ratio</title>
		<link>http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=a-quick-appreciation-of-the-sharpe-ratio</link>
		<comments>http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/#comments</comments>
		<pubDate>Wed, 01 Oct 2008 03:15:07 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Expository Writing]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Mathematical Bedside Reading]]></category>
		<category><![CDATA[Sharpe Ratio]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=22</guid>
		<description><![CDATA[The current state of the global financial markets has gotten more people than usual worrying about the technical aspects of finance. One method for reasoning about investment returns and risk is a tool called the Sharpe Ratio. It is well worth reviewing this measure and seeing how, if used properly, it doesn&#8217;t favor any of [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/' rel='bookmark' title='Permanent Link: &#8220;Easy&#8221; Portfolio Allocation'>&#8220;Easy&#8221; Portfolio Allocation</a></li>
<li><a href='http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/' rel='bookmark' title='Permanent Link: What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?'>What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?</a></li>
<li><a href='http://www.win-vector.com/blog/2009/03/what-does-the-market-think/' rel='bookmark' title='Permanent Link: What does the Market Think?'>What does the Market Think?</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>The current state of the global financial markets has gotten more people than usual worrying about the technical aspects of finance.  One method for reasoning about investment returns and risk is a tool called the Sharpe Ratio.  It is well worth reviewing this measure and seeing how, if used properly, it doesn&#8217;t favor any of the mistakes that underlie our current financial crisis.<span id="more-22"></span></p>
<p>The Sharpe ratio is a famous measure of &#8220;risk adjusted return&#8221; and is defined as the expected excess return of an investment divided by the standard deviation of the excess return.  It is most easily demonstrated by an example (which we work in pieces).</p>
<p>If an investment is expected to generate a profit of 15% in the next year and an insured bank account would generate a 10% profit, then the expected excess return is 15% &#8211; 10% = 5%.  A rational investor would never take a risky investment that did not have a positive excess return (otherwise they would expect to make more money at a bank).  &#8220;Expected&#8221; is a technical term meaning the average return of the investment over all possible outcomes, weighted by the probability of each outcome; we can explain this by working a couple of examples.</p>
<p>Consider investment &#8220;A&#8221; which is a generally good idea that returns a 20% profit in half the possible years and a 10% profit in the other half of the years.  Investment A has an expected return of 0.5*20% + 0.5*10% = 15%.  Investment &#8220;A&#8221; has 15% &#8211; 10% = 5% excess return.</p>
<p>Also consider another investment &#8220;B&#8221; which is a risky bet that returns 20% profit most years (around 95.8% of them) and goes bankrupt in the other years.  The expected return of investment &#8220;B&#8221; is 0.958*20% + 0.042*(-100%) = 14.96%, or essentially 15%.   Investment &#8220;B&#8221; has 15% &#8211; 10% = 5% excess return.</p>
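<p>The 95.8% figure is the probability of a good year that makes investment &#8220;B&#8221;&#8217;s expected return come out at 15%.  A small Python check of that arithmetic (the variable names are ours):</p>

```python
# Investment "A": +20% or +10%, each with probability 0.5.
# Investment "B": +20% with probability p, otherwise -100% (bankruptcy).
GOOD, BUST, TARGET, BANK = 20.0, -100.0, 15.0, 10.0

# Solve p*GOOD + (1 - p)*BUST == TARGET for p.
p = (TARGET - BUST) / (GOOD - BUST)
print(f"p = {p:.4f}")

expected_a = 0.5 * 20.0 + 0.5 * 10.0
expected_b = 0.958 * GOOD + 0.042 * BUST  # the rounded odds used in the text
print("excess returns:", expected_a - BANK, expected_b - BANK)
```

<p>This prints <code>p = 0.9583</code>; with the rounded odds, &#8220;B&#8221;&#8217;s expected return is 14.96%, as in the text.</p>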
<p>As we can see, &#8220;expectation&#8221; alone cannot really tell these two investments apart.  That is why the second component of the Sharpe ratio is something called the standard deviation.  The standard deviation is defined as the square root of the probability-weighted average of the squared deviations of the return from its expected value (here taken to be the target value of 15% for both investments).  What we do is measure, for each possible outcome, how far off the return is from the target of 15%, multiply this number by itself (called squaring it), weight each squared value by the probability of that outcome, and then take the square root of the sum of all such values.  Again, this is best explained by an example.</p>
<p>Investment &#8220;A&#8221; has a standard deviation of:<br />
square-root(  0.5 * (20% &#8211; 15%)*(20% &#8211; 15%) +  0.5 * (10% &#8211; 15%)*(10% &#8211; 15%)  ) = 5%</p>
<p>And investment &#8220;B&#8221; has a standard deviation of:<br />
square-root( 0.958 *( 20% &#8211; 15%)*( 20% &#8211; 15%) + 0.042*(-100% &#8211; 15%)*(-100% &#8211; 15%) ) = 24%</p>
<p>Just like in the calculation of expectation we are taking every possible situation and summing (weighted by the likelihood) our value of interest (in this case the squared variation).</p>
<p>The standard deviation&#8217;s opinion is that investment &#8220;B&#8221; is about five times riskier than investment &#8220;A.&#8221;  And this is the grace of the Sharpe ratio: it says that investment &#8220;A&#8221;&#8217;s value is (15% &#8211; 10%)/5% = 1 and &#8220;B&#8221;&#8217;s value is (15% &#8211; 10%)/24% &#8776; 0.2.</p>
<p>An interesting feature of the Sharpe ratio is that, unlike Wall Street, it does not believe that leveraging increases profitability.  A common desperation move is to take an investment that has a moderate return and borrow money to simulate larger returns by having larger exposure.  For instance, an investment that returns 15% can try to simulate a higher return by borrowing.  If for every $1,000 invested we borrow another $1,000 to invest (paying the risk-free rate of 10% for the money), one can show an apparent rate of return of ($2000*15% &#8211; $1000*10%)/$1000, or 20%.  However, this is not free money: the investor is taking on twice as much risk (the standard deviation doubles) while the return only rises from 15% to 20%.  In fact with sufficient leverage (three times, four times, thirty times) one can convert a safe investment into a risky investment that could even go bankrupt.  The Sharpe ratio (by design) is not fooled by this sort of manipulation.  Investing $1000 in investment A has exactly the same Sharpe ratio as investing $1000 plus $1000 more borrowed at the risk-free rate (this is part of the cleverness of using excess returns instead of unadjusted returns).</p>
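<p>A minimal Python sketch of the whole calculation for the two example investments (the outcome lists, helper names and percent units are our own choices).  One difference from the hand calculation: the sketch measures deviations from each investment&#8217;s actual expected return, so for &#8220;B&#8221; it uses 14.96% rather than 15%, giving a standard deviation of about 24.07% and a Sharpe ratio of about 0.21.</p>

```python
from math import sqrt

RISK_FREE = 10.0  # percent per year: the insured bank account in the text

# Each investment is a list of (probability, return in percent) outcomes.
A = [(0.5, 20.0), (0.5, 10.0)]
B = [(0.958, 20.0), (0.042, -100.0)]


def expectation(outcomes):
    """Probability-weighted average return."""
    return sum(p * r for p, r in outcomes)


def std_dev(outcomes):
    """Square root of the probability-weighted average squared deviation from the mean."""
    mean = expectation(outcomes)
    return sqrt(sum(p * (r - mean) ** 2 for p, r in outcomes))


def sharpe(outcomes, risk_free=RISK_FREE):
    """Expected excess return divided by its standard deviation.

    With a constant risk-free rate, the excess return has the same standard
    deviation as the raw return, so std_dev(outcomes) can be used directly.
    """
    return (expectation(outcomes) - risk_free) / std_dev(outcomes)


for name, investment in (("A", A), ("B", B)):
    print(f"{name}: expected {expectation(investment):.2f}%, "
          f"std dev {std_dev(investment):.2f}%, Sharpe {sharpe(investment):.2f}")
```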
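<p>To check the leverage claim, here is a minimal Python sketch that models borrowing: with leverage factor <code>k</code>, each unit of our own money buys <code>k</code> units of the investment, and the borrowed <code>k - 1</code> units cost the risk-free rate.  Borrowing at exactly the risk-free rate, with no fees or margin calls, is a simplifying assumption.</p>

```python
from math import sqrt

RISK_FREE = 10.0  # percent, the bank rate used in the text
A = [(0.5, 20.0), (0.5, 10.0)]  # (probability, return in percent)


def expectation(outcomes):
    return sum(p * r for p, r in outcomes)


def std_dev(outcomes):
    mean = expectation(outcomes)
    return sqrt(sum(p * (r - mean) ** 2 for p, r in outcomes))


def sharpe(outcomes, risk_free=RISK_FREE):
    return (expectation(outcomes) - risk_free) / std_dev(outcomes)


def lever(outcomes, k, risk_free=RISK_FREE):
    """Return on our own capital when k units are invested per unit of capital,
    the extra k - 1 units being borrowed at the risk-free rate."""
    return [(p, k * r - (k - 1) * risk_free) for p, r in outcomes]


for k in (1, 2, 4, 30):
    levered = lever(A, k)
    print(f"k={k:>2}: expected {expectation(levered):6.1f}%, "
          f"std dev {std_dev(levered):6.1f}%, Sharpe {sharpe(levered):.2f}")
```

<p>The expected return and the standard deviation both grow with <code>k</code> (<code>k = 2</code> gives the 20% quoted above), but the Sharpe ratio stays at 1 for every <code>k</code>, because the excess return and the standard deviation are both multiplied by exactly <code>k</code>.</p>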
<p>Unfortunately to use the Sharpe ratio you need good estimates of three things:</p>
<p>1) The expected return of the investment.</p>
<p>2) The risk-free rate available in the market (to compute excess return).</p>
<p>3) The standard deviation of the investment.</p>
<p>All three of these facts are about the future, so we don&#8217;t really know any of them.  The historic returns of an investment are not the same thing as the expected returns in the future, interest rates can change and the standard deviation is especially hard to estimate.  However, if you have a model (or at least a theory) of what your investments are supposed to do then you can plug in estimates for these three quantities and use the Sharpe ratio to determine which investments really are best.</p>
<p>If you knew how investment &#8220;A&#8221; worked and could estimate that it returned 20% about half the time and 10% the other times, you could estimate its Sharpe ratio as 1.  And if you knew investment &#8220;B&#8221; was a gamble that almost always paid off at 20% with a single rare event that causes bankruptcy, you could estimate its Sharpe ratio as 0.2.  Even if your estimates were inaccurate (say you estimate investment &#8220;A&#8221;&#8217;s Sharpe ratio as 0.7 and investment &#8220;B&#8221;&#8217;s Sharpe ratio as 0.3), the indication is still to stay away from investment &#8220;B.&#8221;</p>
<p>This is in stark contrast to the conclusion you would draw if you thought of these investments as a &#8220;black box&#8221; (like a fund of funds does) and looked only at their historic performance.  If you looked at around 5 years of historic performance of both investments you would (incorrectly) think the following:</p>
<p>Investment A looks kind of noisy: some years it returns 10% and some years it returns 20%.  You would estimate (correctly) the return as averaging 15%, and you can even get a historic estimate of its standard deviation that is about right (5%).</p>
<p>Investment B looks like easy money.  With about an 80% chance you would not have seen a bankruptcy, just 5 years of 20% returns.  You would mis-estimate the return as being 20% (all you have ever seen) and further mis-estimate the standard deviation as 0%.</p>
<p>Based on historic data alone you would fire the manager of investment &#8220;A&#8221;, give the manager of investment &#8220;B&#8221; a huge bonus and invest all of your money with that manager.  And a few years later you would go bankrupt.</p>
<p>What is going on is very well explained by Nassim Nicholas Taleb as &#8220;the turkey paradox.&#8221;  Domestic turkeys are all killed at about the same age (say 60 days).  For somebody who understands commercial poultry farming there is no mystery or uncertainty about it.  Sixty days before you want to sell a turkey carcass you buy a turkey chick.  There is an inevitability and a reverse causality: the desire for the turkey&#8217;s carcass funds and causes the start of the turkey&#8217;s life 60 days earlier.  Now if the turkey is a statistical empiricist (perhaps with a PhD in machine learning), things look good.  The turkey sets up a model of each day having an unknown chance of being good or bad.  The turkey figures that each day&#8217;s outcome is an independent trial drawn from this single unknown probability.  The turkey collects evidence: every day it gets fed.  Each day is more evidence that all days will be good.  And then on day 60 the turkey gets a nasty surprise.  The turkey&#8217;s life was a bad investment from day one; all of the &#8220;evidence&#8221; the turkey collected along the way was irrelevant because the model was wrong.  And the model was wrong because the turkey guessed at the model instead of investigating the nature of poultry farming.</p>
<p>Much the same is true of many investments.  There are investments that look like investment &#8220;B&#8221; when you open the hood.  Many of them involve writing &#8220;out-of-the-money options&#8221; and &#8220;default swaps.&#8221;  These essentially amount to selling insurance on events that nobody thinks are likely.  Selling insurance that usually is not used is profitable, until the insurance gets used.  This is why insurance companies (if they are ethical) don&#8217;t treat the entirety of collected payments as profit, but as a stockpile that must be kept to pay the claims that will inevitably some day be made.</p>
<p>It is important to point out that the Sharpe ratio will give you incorrect results if you plug bad estimates into it.  Overall the Sharpe ratio prefers good investments and diversification, but it can be led astray.  In fact that is the whole point: no amount of smart math will undo the inevitable consequences of wrong models that are used because &#8220;you need something you can solve&#8221; (like the turkey) or &#8220;everybody else is getting rich using them&#8221; (like investment &#8220;B&#8221;).</p>
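<p>The &#8220;about 80%&#8221; figure follows from the yearly odds if we assume the years are independent: 0.958<sup>5</sup> &#8776; 0.807.  A minimal Python sketch of that calculation and of the naive historic estimates (the names are ours):</p>

```python
from statistics import mean, pstdev

P_GOOD_YEAR = 0.958  # chance that investment "B" returns +20% in a given year
YEARS = 5

# Assuming independent years, the chance of never seeing the bankruptcy:
p_no_bust = P_GOOD_YEAR ** YEARS
print(f"P(no bankruptcy in {YEARS} years) = {p_no_bust:.3f}")

# What a lucky observer of "B" records, and the naive estimates drawn from it.
history = [20.0] * YEARS
print(f"historic mean {mean(history)}%, historic std dev {pstdev(history)}%")
# A zero standard deviation makes a naive historic Sharpe ratio blow up (division by zero).
```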
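<p>One concrete version of the turkey&#8217;s model (our assumption; the model is not named above) is Laplace&#8217;s rule of succession: a uniform prior on the unknown chance of a good day, with independent days.  After n good days out of n, the turkey&#8217;s estimate that tomorrow is good is (n + 1)/(n + 2), which only grows as day 60 approaches.  A minimal Python sketch:</p>

```python
from fractions import Fraction


def laplace_estimate(good_days, total_days):
    """Rule of succession: posterior mean of the daily 'good day' probability
    under a uniform prior, after good_days good days out of total_days."""
    return Fraction(good_days + 1, total_days + 2)


# Every day before day 60 is good, so the turkey's confidence only rises.
for day in (1, 10, 30, 59):
    estimate = laplace_estimate(day, day)
    print(f"after day {day:2d}: P(tomorrow is good) = {estimate} ~= {float(estimate):.3f}")
```

<p>On the eve of day 60 the estimate is 60/61, about 0.984: the model is most confident exactly when it is about to be most wrong.</p>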


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/' rel='bookmark' title='Permanent Link: &#8220;Easy&#8221; Portfolio Allocation'>&#8220;Easy&#8221; Portfolio Allocation</a></li>
<li><a href='http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/' rel='bookmark' title='Permanent Link: What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?'>What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?</a></li>
<li><a href='http://www.win-vector.com/blog/2009/03/what-does-the-market-think/' rel='bookmark' title='Permanent Link: What does the Market Think?'>What does the Market Think?</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
