<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Win-Vector Blog &#187; John Mount</title>
	<atom:link href="http://www.win-vector.com/blog/author/john-mount/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.win-vector.com/blog</link>
	<description>The Applied Theorist&#039;s Point of View</description>
	<lastBuildDate>Thu, 29 Jul 2010 17:09:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Gradients via Reverse Accumulation</title>
		<link>http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=gradients-via-reverse-accumulation</link>
		<comments>http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/#comments</comments>
		<pubDate>Thu, 15 Jul 2010 00:00:04 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Coding]]></category>
		<category><![CDATA[Exciting Techniques]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[Automatic Differentiation]]></category>
		<category><![CDATA[Conjugate Gradient]]></category>
		<category><![CDATA[Gradient]]></category>
		<category><![CDATA[Mathematical Bedside Reading]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Reverse Accumulation]]></category>
		<category><![CDATA[Scala]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1493</guid>
		<description><![CDATA[We extend the ideas from Automatic Differentiation with Scala to include reverse accumulation. Reverse accumulation is a non-obvious improvement to automatic differentiation that can in many cases vastly speed up calculations of gradients. As the tables, diagrams and equations do not translate well into HTML, our full article is available here in PDF: [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/' rel='bookmark' title='Permanent Link: Automatic Differentiation with Scala'>Automatic Differentiation with Scala</a></li>
<li><a href='http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/' rel='bookmark' title='Permanent Link: &#8220;Easy&#8221; Portfolio Allocation'>&#8220;Easy&#8221; Portfolio Allocation</a></li>
<li><a href='http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/' rel='bookmark' title='Permanent Link: A Quick Appreciation of the Sharpe Ratio'>A Quick Appreciation of the Sharpe Ratio</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>We extend the ideas from <a target="ext" href="http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/">Automatic Differentiation with Scala</a> to include <em>reverse accumulation</em>.  Reverse accumulation is a non-obvious improvement to automatic differentiation that can in many cases vastly speed up calculations of gradients.  Forward accumulation needs roughly one pass per input variable to assemble a full gradient, while reverse accumulation gets the whole gradient from one forward pass plus one backward pass.<span id="more-1493"></span><br />
As the tables, diagrams and equations do not translate well into HTML, our full article is available here in PDF: <a href="http://www.win-vector.com/dfiles/ReverseAccumulation.pdf">http://www.win-vector.com/dfiles/ReverseAccumulation.pdf</a>.</p>
<p>The purpose of our article is to explain reverse accumulation automatic differentiation clearly (and to release some sample code and timing results).  A side effect of the article is to make sense of the following two diagrams:</p>
<p>If the following is a picture of standard (forward) differentiation:</p>
<p><img src="http://www.win-vector.com/blog/wp-content/uploads/2010/07/cutFwd.png" alt="cutFwd.png" border="0" width="408" height="677" /></p>
<p>then the following is a picture of reverse accumulation:</p>
<p><img src="http://www.win-vector.com/blog/wp-content/uploads/2010/07/cutRev.png" alt="cutRev.png" border="0" width="487" height="739" /></p>
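<p>The PDF is the reference for how reverse accumulation works.  Purely as an illustration, here is our own minimal Scala sketch.  It is not the code released with the article, and the names Tape, Var, Node and ReverseSketchDemo are hypothetical.  During the ordinary forward evaluation, every arithmetic operation is recorded on a &#8220;tape&#8221; together with its local partial derivatives.  A single backward sweep over the tape then accumulates adjoints (derivatives of the output) from the output back to every input.  One sweep therefore yields the whole gradient, however many inputs there are.</p>

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical minimal reverse-mode tape (a sketch, not the article's code).
// Each node stores the indices of its parents and the local partial
// derivatives of its value with respect to those parents.
final case class Node(parents: Array[Int], partials: Array[Double])

final class Tape {
  val nodes = ArrayBuffer[Node]()

  def push(parents: Array[Int], partials: Array[Double]): Int = {
    nodes += Node(parents, partials)
    nodes.length - 1
  }

  // Leaves (inputs or constants) have no parents.
  def variable(v: Double): Var = new Var(this, push(Array.empty[Int], Array.empty[Double]), v)

  // One backward sweep: seed d(out)/d(out) = 1, then push adjoints to parents.
  // Parents always have smaller indices, so reverse order is a valid schedule.
  def gradient(out: Var): Array[Double] = {
    val adj = new Array[Double](nodes.length)
    adj(out.index) = 1.0
    for (i <- nodes.indices.reverse) {
      val n = nodes(i)
      for (j <- n.parents.indices) adj(n.parents(j)) += n.partials(j) * adj(i)
    }
    adj
  }
}

final class Var(val tape: Tape, val index: Int, val value: Double) {
  def +(o: Var): Var = new Var(tape, tape.push(Array(index, o.index), Array(1.0, 1.0)), value + o.value)
  def -(o: Var): Var = new Var(tape, tape.push(Array(index, o.index), Array(1.0, -1.0)), value - o.value)
  def *(o: Var): Var = new Var(tape, tape.push(Array(index, o.index), Array(o.value, value)), value * o.value)
  def sqrt: Var = {
    val s = math.sqrt(value)
    new Var(tape, tape.push(Array(index), Array(0.5 / s)), s) // d sqrt(u)/du = 1/(2 sqrt(u))
  }
}

object ReverseSketchDemo {
  // f(x, y) = sqrt((x - 3)^2 + (y - 4)^2) evaluated at (0, 0).
  def run(): (Double, Double, Double) = {
    val tape = new Tape
    val x = tape.variable(0.0)
    val y = tape.variable(0.0)
    val a = tape.variable(3.0) // constants are recorded as leaves too
    val b = tape.variable(4.0)
    val dx = x - a
    val dy = y - b
    val f = (dx * dx + dy * dy).sqrt
    val adj = tape.gradient(f)
    (f.value, adj(x.index), adj(y.index))
  }
}
```

<p>ReverseSketchDemo.run() should return the value 5 and the gradient (-0.6,-0.8), up to floating-point rounding.  Every node is appended after its parents, so walking the tape in reverse completes a node&#8217;s adjoint before that adjoint is pushed to its parents.</p>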


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/' rel='bookmark' title='Permanent Link: Automatic Differentiation with Scala'>Automatic Differentiation with Scala</a></li>
<li><a href='http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/' rel='bookmark' title='Permanent Link: &#8220;Easy&#8221; Portfolio Allocation'>&#8220;Easy&#8221; Portfolio Allocation</a></li>
<li><a href='http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/' rel='bookmark' title='Permanent Link: A Quick Appreciation of the Sharpe Ratio'>A Quick Appreciation of the Sharpe Ratio</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Automatic Differentiation with Scala</title>
		<link>http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=automatic-differentiation-with-scala</link>
		<comments>http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/#comments</comments>
		<pubDate>Tue, 15 Jun 2010 04:19:20 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Coding]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Exciting Techniques]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[Automatic Differentiation]]></category>
		<category><![CDATA[Conjugate Gradient]]></category>
		<category><![CDATA[Dual Numbers]]></category>
		<category><![CDATA[Geometric Median]]></category>
		<category><![CDATA[Numeric Methods]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Scala]]></category>
		<category><![CDATA[Steiner Tree]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1481</guid>
		<description><![CDATA[This article is a worked-out exercise in applying the Scala type system to solve a small-scale optimization problem. For this article we supply complete Scala source code (under a GPLv3 license) and some design discussion. Usually we work using a combination of databases, Java, optimization libraries and analysis suites (like R). The reason is [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/' rel='bookmark' title='Permanent Link: Gradients via Reverse Accumulation'>Gradients via Reverse Accumulation</a></li>
<li><a href='http://www.win-vector.com/blog/2009/11/r-examine-objects-tutorial/' rel='bookmark' title='Permanent Link: R examine objects tutorial'>R examine objects tutorial</a></li>
<li><a href='http://www.win-vector.com/blog/2009/09/survive-r/' rel='bookmark' title='Permanent Link: Survive R'>Survive R</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>This article is a worked-out exercise in applying the <a href="http://www.scala-lang.org/" target="ext">Scala</a> type system to solve a small-scale optimization problem.  For this article we supply <a href="http://www.win-vector.com/dfiles/ScalaDiff.jar">complete Scala source code</a> (under a GPLv3 license) and some design discussion.<span id="more-1481"></span><br />
Usually we work using a combination of databases, Java, optimization libraries and analysis suites (like R).  The reason is that, for our typical problems, Java hits a sweet spot in trading off runtime performance against ease of development and maintenance.  In the tens-of-gigabytes range (data sets larger than Wikipedia but smaller than the Web), Java outperforms the scripting languages (Ruby, Python &#8230;) and is much easier to develop in and document than C++.  This sweet spot is both subjective and situational.  If the tasks were smaller and in a services framework, Python would be a better choice.  If performance is paramount, then C or C++ (with the STL) and Hadoop are a better choice.  If pre-built statistical libraries are needed, then R becomes a better choice.  For the type of problem we present here, Scala is a very good choice.</p>
<style type="text/css">
td.linenos { background-color: #f0f0f0; padding-right: 10px; }
span.lineno { background-color: #f0f0f0; padding: 0 5px 0 5px; }
pre { line-height: 125%; }
body .hll { background-color: #ffffcc }
body  { background: #f8f8f8; }
body .c { color: #408080; font-style: italic } /* Comment */
body .err { border: 1px solid #FF0000 } /* Error */
body .k { color: #008000; font-weight: bold } /* Keyword */
body .o { color: #666666 } /* Operator */
body .cm { color: #408080; font-style: italic } /* Comment.Multiline */
body .cp { color: #BC7A00 } /* Comment.Preproc */
body .c1 { color: #408080; font-style: italic } /* Comment.Single */
body .cs { color: #408080; font-style: italic } /* Comment.Special */
body .gd { color: #A00000 } /* Generic.Deleted */
body .ge { font-style: italic } /* Generic.Emph */
body .gr { color: #FF0000 } /* Generic.Error */
body .gh { color: #000080; font-weight: bold } /* Generic.Heading */
body .gi { color: #00A000 } /* Generic.Inserted */
body .go { color: #808080 } /* Generic.Output */
body .gp { color: #000080; font-weight: bold } /* Generic.Prompt */
body .gs { font-weight: bold } /* Generic.Strong */
body .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
body .gt { color: #0040D0 } /* Generic.Traceback */
body .kc { color: #008000; font-weight: bold } /* Keyword.Constant */
body .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */
body .kn { color: #008000; font-weight: bold } /* Keyword.Namespace */
body .kp { color: #008000 } /* Keyword.Pseudo */
body .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */
body .kt { color: #B00040 } /* Keyword.Type */
body .m { color: #666666 } /* Literal.Number */
body .s { color: #BA2121 } /* Literal.String */
body .na { color: #7D9029 } /* Name.Attribute */
body .nb { color: #008000 } /* Name.Builtin */
body .nc { color: #0000FF; font-weight: bold } /* Name.Class */
body .no { color: #880000 } /* Name.Constant */
body .nd { color: #AA22FF } /* Name.Decorator */
body .ni { color: #999999; font-weight: bold } /* Name.Entity */
body .ne { color: #D2413A; font-weight: bold } /* Name.Exception */
body .nf { color: #0000FF } /* Name.Function */
body .nl { color: #A0A000 } /* Name.Label */
body .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */
body .nt { color: #008000; font-weight: bold } /* Name.Tag */
body .nv { color: #19177C } /* Name.Variable */
body .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */
body .w { color: #bbbbbb } /* Text.Whitespace */
body .mf { color: #666666 } /* Literal.Number.Float */
body .mh { color: #666666 } /* Literal.Number.Hex */
body .mi { color: #666666 } /* Literal.Number.Integer */
body .mo { color: #666666 } /* Literal.Number.Oct */
body .sb { color: #BA2121 } /* Literal.String.Backtick */
body .sc { color: #BA2121 } /* Literal.String.Char */
body .sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */
body .s2 { color: #BA2121 } /* Literal.String.Double */
body .se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */
body .sh { color: #BA2121 } /* Literal.String.Heredoc */
body .si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */
body .sx { color: #008000 } /* Literal.String.Other */
body .sr { color: #BB6688 } /* Literal.String.Regex */
body .s1 { color: #BA2121 } /* Literal.String.Single */
body .ss { color: #19177C } /* Literal.String.Symbol */
body .bp { color: #008000 } /* Name.Builtin.Pseudo */
body .vc { color: #19177C } /* Name.Variable.Class */
body .vg { color: #19177C } /* Name.Variable.Global */
body .vi { color: #19177C } /* Name.Variable.Instance */
body .il { color: #666666 } /* Literal.Number.Integer.Long */
 </style>
<h2>Our Example Problem</h2>
<p>Our small-scale problem is this: we have a number of target points on a map and we want to pick a central point to <em>directly</em> connect to all of these points with wire.  Our goal is to minimize the total amount of wire used.  This problem is called the <a href="http://en.wikipedia.org/wiki/Geometric_median" ref="ext">&#8220;Geometric Median&#8221;</a>.  So we are trying to find a point that minimizes the sum of distances from our chosen center to every target point.  If we were trying to minimize the sum of squared distances from our chosen center to every target point, the answer would be obvious: the average or mean (which by Hooke&#8217;s law is also the point where a set of identical springs would relax to).  The mean is in fact a fairly good guess, but you can do better.  That could be important if the &#8220;wire&#8221; is expensive, as when cutting irrigation or drainage ditches.  For example, given the three target points (20,0), (-1,-1) and (-1,1), the optimal point is (-0.42,0), not the mean (6,0).  Choosing the optimal point represents a savings of over 19% in total wiring distance (see figure).</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/points.png" alt="points.png" border="0" width="525" height="525" /><br />
</center></p>
<p>This is a substantial saving in cost.  </p>
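<p>The numbers in this example are easy to double-check.  The sketch below computes the total wiring cost at the mean and at the claimed optimum; the object name WiringCost is hypothetical.  It locates the optimum without an optimizer by using the classical Fermat-point property.  When every angle of the triangle is under 120 degrees, as here, the three wires at the optimal center meet at 120 degrees.  That places the center at x = -1 + 1/sqrt(3) on the x-axis.</p>

```scala
// Total wiring cost for the example: sum of Euclidean distances from a
// proposed center (cx, cy) to each target point.
object WiringCost {
  val targets: Array[Array[Double]] =
    Array(Array(20.0, 0.0), Array(-1.0, 1.0), Array(-1.0, -1.0))

  def cost(cx: Double, cy: Double): Double =
    targets.map(t => math.hypot(cx - t(0), cy - t(1))).sum

  // The mean minimizes the sum of *squared* distances, not this cost.
  val meanCenter: (Double, Double) =
    (targets.map(_(0)).sum / targets.length, targets.map(_(1)).sum / targets.length)

  // Fermat-point property: the three wires meet at 120 degrees, so the two
  // short wires each make 60 degrees with the negative x-axis:
  // tan(60 deg) = 1 / (x + 1), giving x = -1 + 1/sqrt(3), about -0.42.
  val optimalCenter: (Double, Double) = (-1.0 + 1.0 / math.sqrt(3.0), 0.0)

  // Fractional savings of the optimal center over the mean.
  def savings: Double = {
    val atMean = cost(meanCenter._1, meanCenter._2)
    val atOpt = cost(optimalCenter._1, optimalCenter._2)
    (atMean - atOpt) / atMean
  }
}
```

<p>The cost at the mean is 14 + 2 sqrt(50), about 28.14.  The cost at the optimum works out to exactly 21 + sqrt(3), about 22.73.  The savings is therefore about 19.2%.</p>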
<p>The problem changes as we consider variations.  Suppose indirect connections (such as routing one point through another, which may or may not be possible for reasons of capacity or safety) and multiple new centers are allowed.  We then have an instance of the <a href="http://en.wikipedia.org/wiki/Steiner_tree_problem" ref="ext">Steiner Tree Problem</a>, which is harder to solve (it is known to be NP-hard).  If no new centers are allowed (all routing must be between pre-existing target points), then we have a Minimum Spanning Tree Problem, which admits very quick solutions.</p>
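<p>To make &#8220;very quick&#8221; concrete: a minimum spanning tree over n points can be found with Prim&#8217;s algorithm in O(n<sup>2</sup>) time by treating every pair of targets as a candidate wire.  What follows is a hypothetical sketch; SpanningTree and mstLength are our own names.  Prim&#8217;s algorithm is one standard choice among several MST algorithms.</p>

```scala
// Prim's algorithm, O(n^2): connect existing targets to one another only,
// adding no new centers. Returns the total wire length of the tree.
object SpanningTree {
  def dist(a: Array[Double], b: Array[Double]): Double = math.hypot(a(0) - b(0), a(1) - b(1))

  def mstLength(pts: Array[Array[Double]]): Double = {
    val n = pts.length
    val inTree = new Array[Boolean](n)
    val best = Array.fill(n)(Double.PositiveInfinity) // cheapest wire into the tree so far
    best(0) = 0.0
    var total = 0.0
    for (_ <- 0 until n) {
      // Pick the cheapest vertex not yet in the tree.
      var u = -1
      for (v <- 0 until n) {
        if (!inTree(v) && (u < 0 || best(v) < best(u))) u = v
      }
      inTree(u) = true
      total += best(u)
      // Relax the remaining vertices through the newly added vertex u.
      for (v <- 0 until n) {
        if (!inTree(v)) best(v) = math.min(best(v), dist(pts(u), pts(v)))
      }
    }
    total
  }
}
```

<p>For the three example targets it returns 2 + sqrt(442), about 23.02.  That tree is a short wire between (-1,1) and (-1,-1) plus one long wire out to (20,0).</p>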
<p>We bring up the geometric median merely as an example.  We don&#8217;t intend for our code to solve only the geometric median problem, and we don&#8217;t intend to touch on the literature of specialized methods for solving it.  Instead we are trying to demonstrate how quickly you can develop prototype solutions if you have a few good tools (like various optimizers) available in your toolkit.  Numeric optimizers may sound exotic, but they are often exactly the kind of thing you want to experiment with and link directly into your code.</p>
<h2>Optimization as a General Tool</h2>
<p>Now that we have the example problem we can describe a solution strategy.  In this case the solution uses code &#8220;we wished we had lying around&#8221; before we started on the problem.  We will pretend we have the tools we want ready to solve our problem, and then we will pay our debt and build the required tools.  The issue is that there is no obvious closed-form solution to the geometric median problem, so we are forced to work a bit harder.  In this case harder means we need to solve an optimization problem.  Consider the contour plot of the total wiring cost as a function of where we choose to place our center.  Our optimal point (-0.42,0) has a wiring cost of 22.73, and the contour plot given here shows concentric regions of center positions with progressively higher cost.</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/contour.png" alt="contour.png" border="0" width="525" height="525" /><br />
</center></p>
<p>In general it is unwise to throw an optimizer at an arbitrary problem and hope to find the globally best solution.  But in this case (and in many similar situations) we can prove that a simple local optimizer will in fact find the unique best solution.  This is a property of the problem, not of the optimizer.  The concentric regions shown in the contour plot have a very nice shape: they are <a href="http://en.wikipedia.org/wiki/Convex_set" ref="ext">convex</a>.  That is, they have no dents: for any two points drawn from one of these shapes, the straight line segment between these points stays inside the given shape.  We don&#8217;t have to depend on observation; we can prove this is always the case for this problem.  The wiring cost from a proposed center to any single target point is a <a href="http://en.wikipedia.org/wiki/Convex_function" ref="ext">convex function</a> of where we choose to place our center.  (A convex function is a function whose graph never reaches above the secant line drawn between any two points on its graph.)  The total wiring cost is just the sum of the wiring costs to each target point, and the sum of a collection of convex functions is itself a convex function.  Finally, every region enclosed by a contour of a convex function (the set of points with cost at most a given level) is a convex set, which proves the statement.</p>
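<p>The convexity argument above is a proof.  The sketch below is only a numeric spot check of the same inequality, f((1-t)p + tq) &lt;= (1-t)f(p) + tf(q), on randomly drawn pairs of centers for the example targets.  The object name ConvexityCheck, the sampling box [-30,30) and the seed are arbitrary choices of ours.</p>

```scala
import scala.util.Random

// Numeric spot check (not a proof) of the convexity inequality
//   f((1-t)p + t q) <= (1-t) f(p) + t f(q)
// for the total wiring cost of the example targets.
object ConvexityCheck {
  val targets = Array((20.0, 0.0), (-1.0, 1.0), (-1.0, -1.0))

  def cost(x: Double, y: Double): Double =
    targets.map { case (tx, ty) => math.hypot(x - tx, y - ty) }.sum

  // Counts sampled (p, q, t) triples that violate the inequality.
  def violations(trials: Int, seed: Long): Int = {
    val rng = new Random(seed)
    def coord(): Double = rng.nextDouble() * 60.0 - 30.0 // uniform in [-30, 30)
    var bad = 0
    for (_ <- 0 until trials) {
      val (px, py, qx, qy) = (coord(), coord(), coord(), coord())
      val t = rng.nextDouble()
      val lhs = cost((1 - t) * px + t * qx, (1 - t) * py + t * qy)
      val rhs = (1 - t) * cost(px, py) + t * cost(qx, qy)
      if (lhs > rhs + 1e-9) bad += 1 // small tolerance for rounding error
    }
    bad
  }
}
```

<p>ConvexityCheck.violations(10000, 42L) should return 0.  A single violation would point to either a bug in the cost function or a non-convex objective.</p>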
<p>But how does this help us?  There is a standard technique to find &#8220;local minima&#8221; of a function by inspecting it for places where the gradient is zero (points where there is no downhill direction on the contour plot).  This technique usually can only be guaranteed to find local minima (places where no small change improves your situation).  There is no general guarantee that the local minimum you find is in fact the global minimum (the best possible solution).  The exception is when you are dealing with a convex function.  For a convex function every local minimum is also a global minimum, and all of the minima are grouped together into a single convex connected set.  (If they were not, the line drawn between two remote minima would violate the convexity definition.)  And if the function is strictly convex (never flat along any line segment), then this set is a single point: the unique best solution.  Our inspection technique will be a gradient-driven optimizer.  That is an optimizer that, while the gradient is non-zero, improves its objective by running downhill, and halts when the gradient is zero.</p>
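<p>A gradient-driven optimizer can be very simple.  The sketch below is plain gradient descent with a backtracking (Armijo) line search.  It is a deliberately simpler stand-in for the conjugate gradient code used later, and DescentSketch, DescentDemo and their parameters are hypothetical names.  The demo supplies the analytic gradient of the wiring cost: the sum of unit vectors pointing from each target toward the center.  That gradient is undefined when the center sits exactly on a target.</p>

```scala
// Plain gradient descent with Armijo backtracking: a simple stand-in for the
// conjugate gradient optimizer, NOT the article's CG code.
object DescentSketch {
  type Vec = Array[Double]

  def minimize(f: Vec => Double, grad: Vec => Vec, p0: Vec,
               tol: Double = 1e-8, maxIter: Int = 10000): Vec = {
    var p = p0.clone()
    var iter = 0
    var done = false
    while (!done && iter < maxIter) {
      val g = grad(p)
      val gNorm2 = g.map(x => x * x).sum
      if (math.sqrt(gNorm2) < tol) done = true // gradient (nearly) zero: halt
      else {
        val fp = f(p)
        var step = 1.0
        def moved(s: Double): Vec = p.zip(g).map { case (pi, gi) => pi - s * gi }
        // Halve the step until it gives a sufficient decrease (Armijo rule).
        while (f(moved(step)) > fp - 1e-4 * step * gNorm2 && step > 1e-16) step *= 0.5
        if (step <= 1e-16) done = true // no usable downhill step (rounding floor)
        else p = moved(step)
      }
      iter += 1
    }
    p
  }
}

object DescentDemo {
  val targets = Array(Array(20.0, 0.0), Array(-1.0, 1.0), Array(-1.0, -1.0))

  def f(p: Array[Double]): Double =
    targets.map(t => math.hypot(p(0) - t(0), p(1) - t(1))).sum

  // Analytic gradient: sum over targets of (p - t) / |p - t|.
  def grad(p: Array[Double]): Array[Double] = {
    val g = new Array[Double](2)
    for (t <- targets) {
      val r = math.hypot(p(0) - t(0), p(1) - t(1))
      g(0) += (p(0) - t(0)) / r
      g(1) += (p(1) - t(1)) / r
    }
    g
  }

  // Start from the mean (6, 0), as the article does.
  def solve(): Array[Double] = DescentSketch.minimize(f, grad, Array(6.0, 0.0))
}
```

<p>Starting from the mean (6,0), DescentDemo.solve() should land near (-0.42,0).  On poorly conditioned problems conjugate gradient usually needs far fewer iterations than this plain descent.</p>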
<p>The function to minimize is the sum of the distances from our proposed center to each target point, which we can write as:</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/dist1.png" alt="dist1.png" border="0" width="309" height="81" /><br />
</center></p>
<p>(where <img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/euclid1.png" alt="euclid1.png" border="0" width="119" height="37" /> is the traditional Euclidean, or L2, distance).  This function actually has one subtle flaw that we will deal with in the appendix (see: Fixing Smoothness).</p>
<h2>Using Scala to Apply the Optimization Solution</h2>
<p>To find our optimal center placement using Scala we first write our cost or objective as a Scala function:</p>
<div class="highlight">
<pre>    <span class="k">val</span> <span class="n">dat</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">]]</span> <span class="o">=</span> <span class="nc">Array</span><span class="o">(</span>
      <span class="nc">Array</span><span class="o">(</span> <span class="mi">20</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
      <span class="nc">Array</span><span class="o">(</span> <span class="o">-</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
      <span class="nc">Array</span><span class="o">(</span> <span class="o">-</span><span class="mf">1.0</span><span class="o">,</span> <span class="o">-</span><span class="mf">1.0</span><span class="o">)</span>
    <span class="o">)</span>

    <span class="k">def</span> <span class="n">fx</span><span class="o">(</span><span class="n">p</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">])</span><span class="k">:</span><span class="kt">Double</span> <span class="o">=</span> <span class="o">{</span>
      <span class="k">val</span> <span class="n">dim</span> <span class="k">=</span> <span class="n">p</span><span class="o">.</span><span class="n">length</span>
      <span class="k">val</span> <span class="n">npoint</span> <span class="k">=</span> <span class="n">dat</span><span class="o">.</span><span class="n">length</span>
      <span class="k">var</span> <span class="n">total</span> <span class="k">=</span> <span class="mf">0.0</span>
      <span class="k">for</span><span class="o">(</span><span class="n">k</span> <span class="k">&lt;-</span> <span class="mi">0</span> <span class="n">to</span> <span class="o">(</span><span class="n">npoint</span><span class="o">-</span><span class="mi">1</span><span class="o">))</span> <span class="o">{</span>
        <span class="k">var</span> <span class="n">term</span> <span class="k">=</span> <span class="mf">0.0</span>
        <span class="k">for</span><span class="o">(</span><span class="n">i</span> <span class="k">&lt;-</span> <span class="mi">0</span> <span class="n">to</span> <span class="o">(</span><span class="n">dim</span><span class="o">-</span><span class="mi">1</span><span class="o">))</span> <span class="o">{</span>
          <span class="k">val</span> <span class="n">diff</span> <span class="k">=</span> <span class="n">p</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="o">-</span> <span class="n">dat</span><span class="o">(</span><span class="n">k</span><span class="o">)(</span><span class="n">i</span><span class="o">)</span>
          <span class="n">term</span> <span class="k">=</span> <span class="n">term</span> <span class="o">+</span> <span class="n">diff</span><span class="o">*</span><span class="n">diff</span>
        <span class="o">}</span>
        <span class="n">total</span> <span class="k">=</span> <span class="n">total</span> <span class="o">+</span> <span class="n">scala</span><span class="o">.</span><span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="o">(</span><span class="n">term</span><span class="o">)</span>
      <span class="o">}</span>
      <span class="n">total</span>
    <span class="o">}</span>
</pre>
</div>
<p>Scala is succinct, and it is a great convenience to have a function definition capture data from its environment (here fx() reads dat).  What we would like to do is generate an initial guess at the solution (we use the mean as our initial guess) and then call an optimizer (in this case a conjugate gradient optimizer) to do all the work:</p>
<div class="highlight">
<pre> <span class="k">val</span> <span class="n">p0</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">]</span> <span class="o">=</span> <span class="n">mean</span><span class="o">(</span><span class="n">dat</span><span class="o">)</span>
 <span class="k">val</span> <span class="o">(</span><span class="n">pF</span><span class="o">,</span><span class="n">fpF</span><span class="o">)</span> <span class="k">=</span> <span class="nc">CG</span><span class="o">.</span><span class="n">minimize</span><span class="o">(</span><span class="n">fx</span><span class="o">,</span><span class="n">p0</span><span class="o">)</span>
</pre>
</div>
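<p>The mean(dat) helper is part of the supplied source and is not shown here.  A hypothetical stand-in (MeanSketch is our name) is just the coordinate-wise average of the target points.  Summing first and dividing once keeps the example&#8217;s mean exactly (6,0).</p>

```scala
// Hypothetical stand-in for the mean(dat) helper: coordinate-wise average.
object MeanSketch {
  def mean(dat: Array[Array[Double]]): Array[Double] = {
    val dim = dat(0).length
    val sums = new Array[Double](dim)
    for (row <- dat; i <- 0 until dim) sums(i) += row(i) // accumulate per coordinate
    sums.map(_ / dat.length) // divide once at the end
  }
}
```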
<p>At this point we would be done, except that the conjugate gradient method (which typically converges much faster than gradient descent and many of the non-gradient methods) requires a gradient.<br />
We could provide a numeric estimate of the gradient by the following divided difference method:</p>
<div class="highlight">
<pre>  <span class="k">def</span> <span class="n">gradientD</span><span class="o">(</span><span class="n">f</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">]</span><span class="k">=&gt;</span><span class="kt">Double</span><span class="o">,</span><span class="n">p</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">])</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">]</span> <span class="o">=</span> <span class="o">{</span>
    <span class="k">val</span> <span class="n">xdim</span> <span class="k">=</span> <span class="n">p</span><span class="o">.</span><span class="n">length</span>
    <span class="k">val</span> <span class="n">p2</span> <span class="k">=</span> <span class="n">copy</span><span class="o">(</span><span class="n">p</span><span class="o">)</span>
    <span class="k">val</span> <span class="n">base</span> <span class="k">=</span> <span class="n">f</span><span class="o">(</span><span class="n">p2</span><span class="o">)</span>
    <span class="k">val</span> <span class="n">ret</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">](</span><span class="n">xdim</span><span class="o">)</span>
    <span class="k">val</span> <span class="n">delta</span> <span class="k">=</span> <span class="mf">1.0e-6</span>
    <span class="k">for</span><span class="o">(</span><span class="n">i</span> <span class="k">&lt;-</span> <span class="mi">0</span> <span class="n">to</span> <span class="o">(</span><span class="n">xdim</span><span class="o">-</span><span class="mi">1</span><span class="o">))</span> <span class="o">{</span>
      <span class="n">p2</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="k">=</span> <span class="n">p</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="o">+</span> <span class="n">delta</span>
      <span class="k">val</span> <span class="n">fplus</span> <span class="k">=</span> <span class="n">f</span><span class="o">(</span><span class="n">p2</span><span class="o">)</span>
      <span class="n">p2</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="k">=</span> <span class="n">p</span><span class="o">(</span><span class="n">i</span><span class="o">)</span>
      <span class="k">val</span> <span class="n">diff</span> <span class="k">=</span> <span class="o">(</span><span class="n">fplus</span><span class="o">-</span><span class="n">base</span><span class="o">)/</span><span class="n">delta</span>
      <span class="n">ret</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="k">=</span> <span class="n">diff</span>
    <span class="o">}</span>
    <span class="n">ret</span>
  <span class="o">}</span>
</pre>
</div>
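<p>To make the numeric difficulty concrete, the sketch below compares the forward difference used by gradientD against the exact derivative of one distance term; DifferenceError is a hypothetical name.  It uses the same delta of 1.0e-6.  The distance term is d(x) = sqrt((x-a)<sup>2</sup> + b<sup>2</sup>), whose exact derivative is (x-a)/d(x).  A central difference, a common alternative not used in the article, is included for comparison.  The forward difference&#8217;s error grows with the curvature b<sup>2</sup>/d(x)<sup>3</sup>, which becomes large when the center passes close to a target (small b).</p>

```scala
// Forward and central differences versus the exact derivative of one
// distance term d(x) = sqrt((x - a)^2 + b^2), whose derivative is (x - a) / d(x).
object DifferenceError {
  def d(x: Double, a: Double, b: Double): Double = math.sqrt((x - a) * (x - a) + b * b)

  def exact(x: Double, a: Double, b: Double): Double = (x - a) / d(x, a, b)

  // Same one-sided scheme as gradientD: (f(x + h) - f(x)) / h.
  def forward(x: Double, a: Double, b: Double, h: Double): Double =
    (d(x + h, a, b) - d(x, a, b)) / h

  // Central difference: (f(x + h) - f(x - h)) / (2h); truncation error O(h^2) for smooth f.
  def central(x: Double, a: Double, b: Double, h: Double): Double =
    (d(x + h, a, b) - d(x - h, a, b)) / (2 * h)
}
```

<p>Take a = 0 and b = 1.0e-4 at x = 0, where the true slope is 0.  The forward difference is off by about 0.005 there, while far from the target (b = 1) it is accurate to better than 1.0e-6.</p>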
<p>This numeric divided difference method often outperforms non-derivative optimization methods (like Powell&#8217;s Method and the Nelder-Mead Amoeba method).  But the technique can run into numeric difficulties.  Too large a delta biases the slope estimate, while too small a delta loses precision to floating-point cancellation in fplus-base.  We can remedy this if we are willing to write our function in a slightly more general way.  If we re-encode our function in a generic manner, we can use <a href="http://en.wikipedia.org/wiki/Automatic_differentiation" target="ext">automatic differentiation</a> (not to be confused with numeric differentiation or with symbolic differentiation) to produce a reliable gradient for optimization.  What we need to do is rewrite our function to work over an abstract field of numbers instead of only the machine-supplied doubles.  In fact what we need to do is specify a generic function that will work over any field, with the field to be determined later.  The code to do this in Scala is very similar to the non-generic code:</p>
<div class="highlight">
<pre>   <span class="k">val</span> <span class="n">genericFx</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">VectorFN</span> <span class="o">{</span>
      <span class="k">def</span> <span class="n">apply</span><span class="o">[</span><span class="kt">Y</span> <span class="k">&lt;:</span> <span class="kt">NumberBase</span><span class="o">[</span><span class="kt">Y</span><span class="o">]](</span><span class="n">p</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Y</span><span class="o">])</span><span class="k">:</span><span class="kt">Y</span> <span class="o">=</span> <span class="o">{</span>
        <span class="k">val</span> <span class="n">field</span> <span class="k">=</span> <span class="n">p</span><span class="o">(</span><span class="mi">0</span><span class="o">).</span><span class="n">field</span>
        <span class="k">val</span> <span class="n">dim</span> <span class="k">=</span> <span class="n">p</span><span class="o">.</span><span class="n">length</span>
        <span class="k">val</span> <span class="n">npoint</span> <span class="k">=</span> <span class="n">dat</span><span class="o">.</span><span class="n">length</span>
        <span class="k">var</span> <span class="n">total</span> <span class="k">=</span> <span class="n">field</span><span class="o">.</span><span class="n">zero</span>
        <span class="k">for</span><span class="o">(</span><span class="n">k</span> <span class="k">&lt;-</span> <span class="mi">0</span> <span class="n">to</span> <span class="o">(</span><span class="n">npoint</span><span class="o">-</span><span class="mi">1</span><span class="o">))</span> <span class="o">{</span>
          <span class="k">var</span> <span class="n">term</span> <span class="k">=</span> <span class="n">field</span><span class="o">.</span><span class="n">zero</span>
          <span class="k">for</span><span class="o">(</span><span class="n">i</span> <span class="k">&lt;-</span> <span class="mi">0</span> <span class="n">to</span> <span class="o">(</span><span class="n">dim</span><span class="o">-</span><span class="mi">1</span><span class="o">))</span> <span class="o">{</span>
            <span class="k">val</span> <span class="n">diff</span> <span class="k">=</span> <span class="n">p</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="o">-</span> <span class="n">field</span><span class="o">.</span><span class="n">inject</span><span class="o">(</span><span class="n">dat</span><span class="o">(</span><span class="n">k</span><span class="o">)(</span><span class="n">i</span><span class="o">))</span>
            <span class="n">term</span> <span class="k">=</span> <span class="n">term</span> <span class="o">+</span> <span class="n">diff</span><span class="o">*</span><span class="n">diff</span>
          <span class="o">}</span>
          <span class="n">total</span> <span class="k">=</span> <span class="n">total</span> <span class="o">+</span> <span class="n">smoothSQRT</span><span class="o">(</span><span class="n">term</span><span class="o">)</span>
        <span class="o">}</span>
        <span class="n">total</span>
      <span class="o">}</span>
    <span class="o">}</span>
</pre>
</div>
<p>Notice that this code is very similar to the &#8220;def fx()&#8221; code.  The key differences are that we had to define genericFx as extending a trait (a type of Scala interface) called VectorFN, and inside this trait extension we defined a parameterized function named apply().  apply() is a generic function that is willing to work over any type Y where Y is at least of type NumberBase[Y] (we will get more into what that means in a moment).  The change in notation is needed because the Scala function <em>syntax</em> (in the Scala version used here) cannot specify a generic function with free type parameters (the incompletely specified Y).  The Scala <em>semantics</em>, however, are strong enough to implement one.  In fact ordinary Scala function values (such as fx when it is passed to CG.minimize()) are just instances of the Scala built-in <a href="http://www.scala-lang.org/docu/files/api/scala/Function1.html" target="ext">Function1 trait</a>.  That makes a trait such as VectorFN a natural way to package a generic function.  With a generic objective function in hand, we need two more pieces.  One is conjugate gradient code that expects a VectorFN (and is willing to call apply() instead of just using naked function parentheses).  The other is some type NumberBase[Y] that can compute gradients for us.  The same genericFx code can then be run over one number type for quick evaluation and over another to get gradients.  How this is done is what we will discuss next.  From our point of view our problem is solved with the following one line of code:</p>
<div class="highlight">
<pre><span class="k">val</span> <span class="o">(</span><span class="n">pF</span><span class="o">,</span><span class="n">fpF</span><span class="o">)</span> <span class="k">=</span> <span class="nc">CG</span><span class="o">.</span><span class="n">minimize</span><span class="o">(</span><span class="n">genericFx</span><span class="o">,</span><span class="n">p0</span><span class="o">)</span>
</pre>
</div>
<p>This should always be your goal: build sufficient preparation so your last step is an &#8220;obvious one-liner.&#8221;</p>
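<p>To make the difference between a fixed-type function value and a function with a free type parameter concrete, here is a small sketch.  It is not the VectorFN trait from the ScalaDiff jar: the names GenericFn, SumSq and sumSqDouble are hypothetical, and as an assumption it uses Scala&#8217;s standard Numeric type class in place of NumberBase[Y] so that it runs on its own.  The point is only that one object with a type-parameterized apply() can be evaluated at whatever number type the caller chooses, while a plain function value cannot.</p>
<div class="highlight">
<pre>// A plain function value is a Function1 with fixed types: it only ever sees Doubles.
val sumSqDouble: Array[Double] => Double = x => x.map(xi => xi * xi).sum

// Hypothetical VectorFN-like trait: apply() carries its own free type parameter Y.
// Assumption: Scala's standard Numeric[Y] type class stands in for NumberBase[Y].
trait GenericFn {
  def apply[Y](x: Array[Y])(implicit num: Numeric[Y]): Y
}

// The same sum of squares, written once against the generic interface.
object SumSq extends GenericFn {
  def apply[Y](x: Array[Y])(implicit num: Numeric[Y]): Y =
    x.foldLeft(num.zero)((acc, xi) => num.plus(acc, num.times(xi, xi)))
}

// The caller picks the number type at each call site.
val asDouble: Double = SumSq(Array(1.0, 2.0, 3.0))
val asBig: BigDecimal = SumSq(Array(BigDecimal("0.1"), BigDecimal("0.2")))
// sumSqDouble(Array(BigDecimal("0.1")))  // does not compile: a Function1's types are fixed
println(s"$asDouble $asBig")
</pre>
</div>
<p>The type class approach and the F-bounded NumberBase approach are two ways of expressing the same idea; this sketch uses the former only because it needs no extra class definitions.</p>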
<h2>What Tools we Wish we Had Lying Around</h2>
<p>We supply in our example some workable conjugate gradient code, but that is standard so we will not discuss it.  What is of interest (and facilitated by Scala&#8217;s parameterized type system) is the implementation of <a href="http://en.wikipedia.org/wiki/Dual_number" target="ext">dual numbers</a> as a framework to supply automatic differentiation.  An implementation of dual numbers as a NumberBase[DualNumber] type is the core of our demonstration.</p>
<p>Dual numbers are an algebraic structure written as pairs of real numbers &#8220;(a,b)&#8221;.  The arithmetic table for dual numbers is given below:</p>
<table>
<tr>
<td>(a,b) + (c,d)</td>
<td>=</td>
<td>((a+c) , (b+d))</td>
</tr>
<tr>
<td>(a,b) &#8211; (c,d)</td>
<td>=</td>
<td>((a-c) , (b-d))</td>
</tr>
<tr>
<td>(a,b) * (c,d)</td>
<td>=</td>
<td>((a*c) , (a*d+b*c))</td>
</tr>
<tr>
<td>(a,b) / (c,d)</td>
<td>=</td>
<td>((a/c) , ((b*c-a*d)/(c*c)))</td>
</tr>
</table>
<p>In a dual number (a,b), &#8220;a&#8221; is the &#8220;large&#8221; or &#8220;standard&#8221; part of the number.  You can check from the arithmetic table that dual numbers of the form (a,0) and (c,0) behave just as we would expect the real numbers a and c to behave.  In the dual number (a,b), &#8220;b&#8221; is the &#8220;small&#8221; or &#8220;ideal&#8221; portion of the number.  From the multiplication rule above we can observe two rules: (0,b) * (c,0) = (0,b*c) (something small times anything else is small) and (0,b)*(0,d) = (0,0) (two small things become zero when multiplied).  Essentially the dual numbers carry around the first two terms of a Taylor series: we get as a result both the function value and the function derivative.  For a function f() over the real numbers we extend f() to work over the dual numbers by defining f((a,b)) = (f(a),b f&#8217;(a)) (which is consistent with the previously defined arithmetic).  We can check that the dual numbers obey the usual laws of arithmetic (associativity, commutativity, distributivity, identities and additive inverses); multiplicative inverses exist only for dual numbers with a nonzero standard part, which is why division by (c,d) requires c &ne; 0.  The punchline is that over the dual numbers the divided difference estimate of f&#8217;(x) (the derivative of f() evaluated at x) is in fact exact in the sense that f((x,1)) = (f(x),f&#8217;(x)) (or f((x,0)+(0,1)) &#8211; f((x,0)) = (0, f&#8217;(x))).  Implementing the DualNumber class is little more than transcribing the above arithmetic table into Scala.</p>
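<p>Transcribing the table into Scala takes only a few lines.  The following is a minimal sketch rather than the DualNumber class from the jar: the case class Dual and the functions exp and f are hypothetical, only the four table operations plus one extended real function are implemented, and division assumes the divisor&#8217;s standard part is nonzero.  Seeding the small part with 1 returns the pair (f(x), f&#8217;(x)).</p>
<div class="highlight">
<pre>// Hypothetical minimal dual number: a is the standard part, b the small (ideal) part.
final case class Dual(a: Double, b: Double) {
  def +(o: Dual): Dual = Dual(a + o.a, b + o.b)
  def -(o: Dual): Dual = Dual(a - o.a, b - o.b)
  def *(o: Dual): Dual = Dual(a * o.a, a * o.b + b * o.a)
  // Requires o.a != 0; the quotient rule puts o.a squared in the denominator.
  def /(o: Dual): Dual = Dual(a / o.a, (b * o.a - a * o.b) / (o.a * o.a))
}

// Extending a real function to duals: f((a,b)) = (f(a), b * f'(a)); here f = exp.
def exp(x: Dual): Dual = Dual(math.exp(x.a), x.b * math.exp(x.a))

// f(x) = x * x / (x + 1), written only with dual-number operations.
def f(x: Dual): Dual = x * x / (x + Dual(1.0, 0.0))

// Seeding the small part with 1 yields (f(2), f'(2)).
val r = f(Dual(2.0, 1.0))
val e = exp(Dual(0.0, 1.0)) // (e^0, 1 * e^0) = (1.0, 1.0)
println(s"$r $e")
</pre>
</div>
<p>For f(x) = x*x/(x+1) at x = 2 this should give approximately (4/3, 8/9), matching the quotient rule f&#8217;(x) = (x*x + 2x)/(x+1)^2.</p>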
<p>We have already seen how to write code that uses NumberBase[Y] types (genericFx() itself is an example).  A more complicated example is the CG.minimize() code which not only accepts a generic function (in the form of VectorFN) but then specializes it to NumberBase[DualNumber] to compute gradients and also specializes to NumberBase[MDouble] for quick calculation during line searches (MDouble is just an adapter for machine Doubles, used for speed).  The ability to re-specialize a function is one of the advantages of a parameterized type system.  The DualNumbers are an example of forward automatic differentiation.  We could also use the same object framework to capture a representation of the computation path and apply more sophisticated methods such as reverse automatic differentiation. </p>
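<p>The re-specialization described above can be sketched as follows, under stated assumptions: Num, MD, Dual, VecFn, F and gradient are hypothetical, cut-down stand-ins for NumberBase, MDouble, DualNumber and VectorFN (the jar&#8217;s actual classes are richer).  One generic function is evaluated once over plain doubles for its value, and then once per coordinate over dual numbers, with the small part seeded to 1 on that coordinate, to collect the gradient.</p>
<div class="highlight">
<pre>// Hypothetical F-bounded number interface, a cut-down analogue of NumberBase[Y].
trait Num[T <: Num[T]] {
  def +(o: T): T
  def *(o: T): T
}

// Fast path: a thin adapter over machine Doubles (in the spirit of MDouble).
final case class MD(v: Double) extends Num[MD] {
  def +(o: MD): MD = MD(v + o.v)
  def *(o: MD): MD = MD(v * o.v)
}

// Gradient path: dual numbers (a = standard part, b = small part).
final case class Dual(a: Double, b: Double) extends Num[Dual] {
  def +(o: Dual): Dual = Dual(a + o.a, b + o.b)
  def *(o: Dual): Dual = Dual(a * o.a, a * o.b + b * o.a)
}

// A function with a free type parameter, in the spirit of VectorFN.
trait VecFn {
  def apply[Y <: Num[Y]](x: Array[Y]): Y
}

// f(x0, x1) = x0*x0*x1 + x1*x1, written once for every number type.
object F extends VecFn {
  def apply[Y <: Num[Y]](x: Array[Y]): Y = x(0) * x(0) * x(1) + x(1) * x(1)
}

// Forward-mode gradient: one dual-number pass per coordinate, seeding b = 1 on coordinate i.
def gradient(fn: VecFn, x: Array[Double]): Array[Double] =
  Array.tabulate(x.length) { i =>
    val seeded = Array.tabulate(x.length)(j => Dual(x(j), if (j == i) 1.0 else 0.0))
    fn(seeded).b
  }

val x0 = Array(3.0, 2.0)
val value = F(x0.map(MD(_))).v // quick evaluation, F specialized to MD
val grad = gradient(F, x0)     // the same F specialized to Dual
println(s"f = $value, grad = ${grad.mkString(", ")}") // f = 22.0, grad = 12.0, 13.0
</pre>
</div>
<p>Because each dual-number pass yields only one partial derivative, this forward approach costs one function evaluation per coordinate; that cost is what reverse accumulation is designed to avoid.</p>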
<p>We give a link to a jar containing <a href="http://www.win-vector.com/dfiles/ScalaDiff.jar">complete Scala source code</a> including this example, the DualNumber implementation, a conjugate gradient implementation and some JUnit tests (all under a GPLv3 license) and will go on to describe some of the design decisions.  The code is the bulky part of this work, so we will move on to discuss something more compact: types.</p>
<h2>Types</h2>
<p>If code is ever beautiful it is only when it is succinct.  Among the most succinct forms of code are individual type signatures and interfaces (though the indiscriminate repetition of type signatures is rightly considered ugly bloat, which Scala works to avoid).  Since we are distributing complete source, we will describe only types and method signatures.  The entry points to the code are the JUnit tests (organized in the ScalaDiff/test source directory; they depend on JUnit, which is not included) and the demo program ScalaDiff/src/demo/Demo.scala.</p>
<p>To be a usable arithmetic type (like DualNumber or MDouble) you must extend the following parameterized abstract class:</p>
<div class="highlight">
<pre><span class="k">abstract</span> <span class="k">class</span> <span class="nc">NumberBase</span><span class="o">[</span><span class="kt">NUMBERTYPE</span> <span class="k">&lt;:</span> <span class="kt">NumberBase</span><span class="o">[</span><span class="kt">NUMBERTYPE</span><span class="o">]]</span> <span class="o">{</span>
  <span class="c">// basic arithmetic</span>
  <span class="k">def</span> <span class="o">+</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>
  <span class="k">def</span> <span class="o">-</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>
  <span class="k">def</span> <span class="n">unary_-</span><span class="o">()</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>
  <span class="k">def</span> <span class="o">*</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>
  <span class="k">def</span> <span class="o">/</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>  <span class="c">// that must not be zero</span>
  <span class="c">// more complicated</span>
  <span class="k">def</span> <span class="n">pow</span><span class="o">(</span><span class="n">that</span><span class="k">:</span><span class="kt">Double</span><span class="o">)</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>
  <span class="k">def</span> <span class="n">exp</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>
  <span class="k">def</span> <span class="n">log</span><span class="k">:</span><span class="kt">NUMBERTYPE</span> <span class="c">// this must be positive</span>
  <span class="c">// comparison functions</span>
  <span class="k">def</span> <span class="o">&gt;</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Boolean</span>
  <span class="k">def</span> <span class="o">&gt;=</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Boolean</span>
  <span class="k">def</span> <span class="o">==</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Boolean</span>
  <span class="k">def</span> <span class="o">!=</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Boolean</span>
  <span class="k">def</span> <span class="o">&lt;</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Boolean</span>
  <span class="k">def</span> <span class="o">&lt;=</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Boolean</span>
  <span class="c">// utility</span>
  <span class="k">def</span> <span class="n">field</span><span class="k">:</span><span class="kt">Field</span><span class="o">[</span><span class="kt">NUMBERTYPE</span><span class="o">]</span>
<span class="o">}</span>
</pre>
</div>
<p>In particular DualNumber extends NumberBase[DualNumber].  This deliberate circular reference (an F-bounded type parameter) has a big purpose: it allows publicly visible covariant return types (returning nearly the exact type we really are instead of a base type).  This allows us to have strict type arguments so that trying to add an MDouble to a DualNumber is a type error (even though they both extend the same base class).  The automatic differentiation technique encapsulated in the DualNumber class only works if all of the calculation is in the DualNumber type, and this strict type enforcement allows the compiler to help prevent results from sneaking in and out through other types.  All of the methods on NumberBase are obviously related to arithmetic except the field() method.  This method gives us access to a Field object which is responsible for carrying around the runtime type information (a common problem in Java and Scala is that some type information known at compile time, such as the choice of type parameters, is not easily accessed at runtime).  The Field class is as follows:</p>
<div class="highlight">
<pre><span class="k">abstract</span> <span class="k">class</span> <span class="nc">Field</span> <span class="o">[</span><span class="kt">NUMBERTYPE</span> <span class="k">&lt;:</span> <span class="kt">NumberBase</span><span class="o">[</span><span class="kt">NUMBERTYPE</span><span class="o">]]</span> <span class="o">{</span>
  <span class="k">def</span> <span class="n">zero</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>            <span class="c">// return canonical zero in field</span>
  <span class="k">def</span> <span class="n">one</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>             <span class="c">// return canonical one in field</span>
  <span class="k">def</span> <span class="n">inject</span><span class="o">(</span><span class="n">v</span><span class="k">:</span><span class="kt">Double</span><span class="o">)</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>  <span class="c">// return canonical representation of number in field</span>
  <span class="k">def</span> <span class="n">project</span><span class="o">(</span><span class="n">v</span><span class="k">:</span><span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Double</span> <span class="c">// return the standard Double represented by v</span>
  <span class="k">def</span> <span class="n">array</span><span class="o">(</span><span class="n">n</span><span class="k">:</span><span class="kt">Int</span><span class="o">)</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">NUMBERTYPE</span><span class="o">]</span> <span class="c">// return an array of this type</span>
<span class="o">}</span>
</pre>
</div>
<p>The Field class is where we keep factories for numbers (zero, one, arrays, injection from standard Doubles) and casting (projection back to standard Doubles).</p>
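<p>The F-bounded pattern and the role of the Field object can be seen in a cut-down form.  The sketch below is hypothetical (MiniNumber, MiniField, MD, MDField and poly are illustrative names, and only addition and multiplication are included).  It shows generic code obtaining constants through the number&#8217;s own field instead of mixing in raw Doubles, and the commented-out line shows the kind of mixed-type expression the compiler rejects.</p>
<div class="highlight">
<pre>// Cut-down, hypothetical analogues of NumberBase and Field (not the ScalaDiff classes).
abstract class MiniNumber[T <: MiniNumber[T]] {
  def +(that: T): T
  def *(that: T): T
  def field: MiniField[T] // runtime access to factories for this number type
}

abstract class MiniField[T <: MiniNumber[T]] {
  def zero: T
  def inject(v: Double): T  // Double -> number type
  def project(x: T): Double // number type -> Double
}

// Machine-double adapter in the spirit of MDouble.
final case class MD(v: Double) extends MiniNumber[MD] {
  def +(that: MD): MD = MD(v + that.v)
  def *(that: MD): MD = MD(v * that.v)
  def field: MiniField[MD] = MDField
}

object MDField extends MiniField[MD] {
  def zero: MD = MD(0.0)
  def inject(v: Double): MD = MD(v)
  def project(x: MD): Double = x.v
}

// Generic Horner evaluation of c0 + c1*x + c2*x^2 + ...; constants enter only via x.field.
def poly[Y <: MiniNumber[Y]](coeffs: Seq[Double], x: Y): Y = {
  val field = x.field
  coeffs.foldRight(field.zero)((c, acc) => field.inject(c) + acc * x)
}

val y = MDField.project(poly(Seq(1.0, 2.0, 3.0), MD(2.0))) // 1 + 2*2 + 3*4 = 17.0
// MD(1.0) + 1.0  // does not compile: + on an MD accepts only another MD
println(y)
</pre>
</div>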
<p>With these types defined we can actually read intent off some of the method signatures.  </p>
<p>For example our conjugate gradient optimizer is accessed through the following method signature:</p>
<div class="highlight">
<pre> <span class="k">def</span> <span class="n">minimize</span><span class="o">(</span><span class="n">fn</span><span class="k">:</span><span class="kt">VectorFN</span><span class="o">,</span><span class="n">x0</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">])</span><span class="k">:</span><span class="o">(</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">],</span><span class="kt">Double</span><span class="o">)</span> <span class="c">// return x,f(x)</span>
</pre>
</div>
<p>The above can be read as: CG.minimize() requires a VectorFN (our trait representing single-argument functions with a free type parameter) and an initial point (in standard Doubles).  The code will then return a pair of the optimum point and the function evaluated at the optimum point.  From the type signature we can see that CG.minimize() expects to re-specialize the function &#8220;fn&#8221; to types of its own choosing (else it could have accepted a parameterized argument instead of our custom trait) and will handle all up-conversion and down-conversion between machine Doubles and NumberBase[Y] types itself.  This sort of type information is hard to express (let alone enforce) in a dynamically typed language.</p>
<p>A slightly more complicated example is the lineMinD() method:</p>
<div class="highlight">
<pre><span class="k">def</span> <span class="n">lineMinD</span><span class="o">[</span><span class="kt">Y&lt;:NumberBase</span><span class="o">[</span><span class="kt">Y</span><span class="o">]](</span><span class="n">field</span><span class="k">:</span><span class="kt">Field</span><span class="o">[</span><span class="kt">Y</span><span class="o">],
 </span><span class="n">f</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Y</span><span class="o">]</span><span class="k">=&gt;</span><span class="kt">Y</span><span class="o">,
 </span><span class="n">xm</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">],
 </span><span class="n">di</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">])</span><span class="k">:</span><span class="o">(</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">],</span><span class="kt">Double</span><span class="o">)</span>
</pre>
</div>
<p>Notice it is willing to work with any type-parameterized function (which means it lets the caller pick the actual type NumberBase[Y] and works with that).  Most callers will call with Y=MDouble (the wrapper for machine Doubles) and lineMinD() will then work with that (without ever really knowing the actual underlying type).</p>
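<p>To illustrate a caller-chosen Y, here is a hypothetical line minimizer with the same shape as lineMinD().  The jar&#8217;s line search is not reproduced here, so this is a sketch under assumptions: it uses golden-section search on a fixed interval [0, tMax], assumes the function is unimodal along the search direction, and replaces the Field argument with a simple lift function.  Note the ClassTag context bound: Scala needs it to build an Array[Y] for a generic Y, which is presumably one reason the Field class offers an array() factory.</p>
<div class="highlight">
<pre>import scala.reflect.ClassTag

// Hypothetical minimal number interface (a stand-in for NumberBase; not the ScalaDiff code).
trait Num[T <: Num[T]] {
  def +(o: T): T
  def *(o: T): T
  def value: Double // projection back to a machine Double
}

// Machine-double adapter, in the spirit of MDouble.
final case class MD(v: Double) extends Num[MD] {
  def +(o: MD): MD = MD(v + o.v)
  def *(o: MD): MD = MD(v * o.v)
  def value: Double = v
}

// Hypothetical line minimizer shaped like lineMinD(): it never learns what Y really is.
// Golden-section search over t in [0, tMax] along xm + t * di; assumes f is unimodal there.
def lineMinSketch[Y <: Num[Y] : ClassTag](
    lift: Double => Y,
    f: Array[Y] => Y,
    xm: Array[Double],
    di: Array[Double],
    tMax: Double = 1.0,
    tol: Double = 1e-10): (Array[Double], Double) = {
  val phi = (math.sqrt(5.0) - 1.0) / 2.0 // about 0.618
  def point(t: Double): Array[Double] = Array.tabulate(xm.length)(i => xm(i) + t * di(i))
  def eval(t: Double): Double = f(point(t).map(lift)).value
  var lo = 0.0
  var hi = tMax
  var c = hi - phi * (hi - lo)
  var d = lo + phi * (hi - lo)
  var fc = eval(c)
  var fd = eval(d)
  while (hi - lo > tol) {
    if (fc < fd) { // minimum lies in [lo, d]
      hi = d; d = c; fd = fc
      c = hi - phi * (hi - lo); fc = eval(c)
    } else {       // minimum lies in [c, hi]
      lo = c; c = d; fc = fd
      d = lo + phi * (hi - lo); fd = eval(d)
    }
  }
  val t = (lo + hi) / 2.0
  (point(t), eval(t))
}

// Usage with Y = MD: f(x) = (x0 - 1)^2 + (x1 + 2)^2, searched from the origin along (1, -2).
val f: Array[MD] => MD = x => {
  val u = x(0) + MD(-1.0)
  val w = x(1) + MD(2.0)
  u * u + w * w
}
val (xBest, fBest) = lineMinSketch[MD](MD(_), f, Array(0.0, 0.0), Array(1.0, -2.0), tMax = 2.0)
println(s"${xBest.mkString(", ")} -> $fBest") // expect values close to 1.0, -2.0 and 0.0
</pre>
</div>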
<p>A lot of fans of dynamic languages consider type systems to be mere hairshirt penance.  But that is not so.  Broken type systems (like Java&#8217;s collections before generics were introduced in Java 1.5) are indeed more trouble than they are worth.  Working type systems (like C++ templates/STL, Java 1.5+ and Scala) allow you to solve problems (and enforce decisions) during the design phase (which is much, much cheaper than during the deployment phase).  You can&#8217;t set your types in stone (you are likely going to have them subtly wrong for the first few iterations).  You must be willing to think like a &#8220;language lawyer&#8221; to find out what parts of your work can be specified and enforced in the language&#8217;s type system.  To use an analogy: static types are your blueprint or your underpainting.</p>
<h2>Tests</h2>
<p>One argument against static types is that you can get much of their benefit from unit tests.  My opinion is you never have enough unit tests, so putting more pressure on your test suite is not wise.   Static types plus tests are strictly more powerful than static types alone or tests alone. </p>
<p>Even for this example toy-scale project we have included a JUnit test set to pursue a number of goals:</p>
<ul>
<li>Confirm our number implementations (DualNumber and MDouble) correctly model machine Doubles (perform parallel calculations and compare).</li>
<li>Confirm DualNumber obeys the expected algebraic laws of composition and cancellation, <em>including the portions that cannot be modeled in machine Doubles</em>.</li>
<li>Confirm DualNumbers compute gradients.</li>
<li>Confirm operations of optimizers and optimizer components.</li>
</ul>
<p>Many of these tests are related, but they don&#8217;t all imply each other, and they give different perspectives on the errors they catch.  For example, no amount of parallel computation between DualNumbers and machine Doubles is going to confirm that the infinitesimal portion of the DualNumber is propagating correctly (since this is not a property of machine Doubles).  So we add extra tests that expect DualNumber to obey algebraic relations like a*(b+c) = a*b + a*c.  It is then another step to confirm that whatever the DualNumbers calculate is not only self-consistent, but also models a truncated Taylor series, or differentiation.</p>
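<p>As an illustration of the second and third kinds of test, here is a small self-contained property check.  It is not taken from the jar&#8217;s JUnit suite: it uses plain boolean checks instead of JUnit, and the names Dual, close and cube, the random seed and the tolerances are all assumptions.  The first check exercises distributivity on the infinitesimal part, which no comparison against machine Doubles can reach; the second compares the dual-number derivative of x^3 against a central divided difference.</p>
<div class="highlight">
<pre>import scala.util.Random

// Hypothetical minimal dual number (only the operations the checks need).
final case class Dual(a: Double, b: Double) {
  def +(o: Dual): Dual = Dual(a + o.a, b + o.b)
  def *(o: Dual): Dual = Dual(a * o.a, a * o.b + b * o.a)
}

// Compare both parts up to floating-point rounding.
def close(x: Dual, y: Dual, eps: Double = 1e-9): Boolean =
  math.abs(x.a - y.a) <= eps && math.abs(x.b - y.b) <= eps

val rng = new Random(42L)
def randomDual(): Dual = Dual(rng.nextDouble() * 2 - 1, rng.nextDouble() * 2 - 1)

// Check 1: a*(b+c) = a*b + a*c, including the small part machine Doubles cannot model.
val distributes = (1 to 1000).forall { _ =>
  val (p, q, r) = (randomDual(), randomDual(), randomDual())
  close(p * (q + r), p * q + p * r)
}

// Check 2: seeding (x, 1) reproduces d/dx x^3, compared with a central difference.
def cube(x: Dual): Dual = x * x * x
val derivativeOk = (1 to 100).forall { _ =>
  val x = rng.nextDouble() * 4 - 2
  val h = 1e-5
  val fd = (math.pow(x + h, 3) - math.pow(x - h, 3)) / (2 * h)
  math.abs(cube(Dual(x, 1.0)).b - fd) <= 1e-6
}
println(s"distributive: $distributes, derivative: $derivativeOk")
</pre>
</div>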
<h2>Conclusion</h2>
<p>We hope we have demonstrated how the complexity of a mathematical programming problem can be managed by breaking the problem into an objective function that is separate from the optimizer (allowing the optimizer to be both good and hidden) and by using a static type system (such as Scala&#8217;s) to help enforce required properties of a calculation (such as all numbers being routed through a required representation).  With these sorts of tools available, many formerly hard problems (that are often, unfortunately, solved by over-specifying direct inefficient iterative improvement techniques) become &#8220;if I can write a reasonable objective function, this may already be solved by an optimizer in my library.&#8221;  The more of these tools you have (either in your code or in your reference library), the more of these problems become easy (this is the topic of my earlier paper: <a href="http://www.win-vector.com/blog/2009/11/the-local-to-global-principle/">The Local to Global Principle</a>).</p>
<h2>Appendix: Fixing Smoothness</h2>
<p>Our chosen example objective function is very nice (i.e. convex), but it has a small (but correctable) problem.  The gradient has some jump discontinuities that could cause an optimizer to exit prematurely (not at the global optimum).  Consider the simplest form of this: wiring a center to a single point at the origin (even in 1 dimension).  The wiring cost function sqrt(x*x) has the cost graph shown here.</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/abs.png" alt="abs.png" border="0" width="525" height="525" /><br />
</center></p>
<p>This is convex, but its derivative is not continuous, as we see in the included graph of the derivative of sqrt(x*x).</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/dabs.png" alt="dabs.png" border="0" width="525" height="525" /><br />
</center></p>
<p>So: in this case if the optimizer stops at one of the target points we can&#8217;t be sure that it stopped at the global optimum (it may have stopped due to the discontinuity in the gradient).  For some simple problems the optimum is necessarily at a target point.  For example on the number line take the target points 0,1 and x.  As long as x&ge;0 and x&le;1 the optimum placement will be x itself.</p>
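<p>The one-dimensional example can be checked by brute force.  This sketch (the function names, the grid and its spacing are assumptions chosen for illustration) scans candidate centers and confirms that the cheapest placement for targets 0, 1 and x sits at x, which is exactly where the slope of the total cost jumps from -1 to +1.</p>
<div class="highlight">
<pre>// Total wiring cost of a center c on the number line: each target contributes sqrt(d*d) = |d|.
def cost(c: Double, targets: Seq[Double]): Double =
  targets.map(t => math.sqrt((c - t) * (c - t))).sum

// Brute-force scan of candidate centers on a grid over [-1, 2] with spacing 0.001.
def bestCenter(targets: Seq[Double]): Double =
  (0 to 3000).map(i => -1.0 + i * 0.001).minBy(c => cost(c, targets))

val x = 0.3 // any value with 0 <= x <= 1
val c = bestCenter(Seq(0.0, 1.0, x))
println(f"best center = $c%.3f, cost = ${cost(c, Seq(0.0, 1.0, x))}%.3f")
</pre>
</div>
<p>Between the targets the cost is 1 + |c - x|, so the minimum value 1 is reached only at c = x, right on the kink.</p>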
<p>One way to defend against this is to use a smoothed version of sqrt() whose composition with x*x has a continuous derivative at the origin.  Our cost function becomes:<br />
<center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/cost2.png" alt="cost2.png" border="0" width="237" height="55" /><br />
</center><br />
where s() is our suitable approximation of the sqrt() function.  Two candidates are s(x) = (x+tau)^(1/2) and s(x) = x^(1/2 + tau); where tau is a small constant.  As long as tau is greater than zero we have no derivative discontinuity in s(x^2) and convexity is preserved (even made a bit stricter).  Other ways to deal with this include adding additional coordinates to the problem and small perturbations on these coordinates.  Finally, a point found by optimizing with respect to s(x) can be &#8220;polished&#8221; by re-starting the optimization at the first found solution and using sqrt(x) as the new objective (if the original point is not near any of the target points).</p>
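<p>A quick numerical comparison shows the effect of the first candidate, s(x) = (x+tau)^(1/2).  In this sketch the value tau = 1e-4 and the function names are assumptions; it prints the slope of the original cost sqrt(x*x), which jumps from -1 to +1 at the origin, next to the slope of the smoothed cost sqrt(x*x + tau), which passes continuously through 0, and it checks that the smoothed slope never decreases (so the smoothed cost stays convex).</p>
<div class="highlight">
<pre>// Smoothing constant (hypothetical value chosen for illustration).
val tau = 1e-4

// Slope of the original cost sqrt(x*x) = |x|: -1 for x below 0, +1 above (0 reported at 0).
def dRaw(x: Double): Double = math.signum(x)

// Slope of the smoothed cost sqrt(x*x + tau), i.e. s(u) = (u + tau)^(1/2) applied to u = x*x.
def dSmooth(x: Double): Double = x / math.sqrt(x * x + tau)

val probes = Seq(-1.0, -1e-6, 0.0, 1e-6, 1.0)
for (x <- probes)
  println(f"x = $x%9.1e   raw slope = ${dRaw(x)}%5.1f   smoothed slope = ${dSmooth(x)}%9.6f")

// Convexity check: the smoothed slope should be non-decreasing along a grid.
val grid = (-1000 to 1000).map(_ * 1e-3)
val slopes = grid.map(dSmooth)
val nonDecreasing = slopes.zip(slopes.tail).forall { case (s0, s1) => s0 <= s1 }
println(s"smoothed slope non-decreasing: $nonDecreasing")
</pre>
</div>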


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/' rel='bookmark' title='Permanent Link: Gradients via Reverse Accumulation'>Gradients via Reverse Accumulation</a></li>
<li><a href='http://www.win-vector.com/blog/2009/11/r-examine-objects-tutorial/' rel='bookmark' title='Permanent Link: R examine objects tutorial'>R examine objects tutorial</a></li>
<li><a href='http://www.win-vector.com/blog/2009/09/survive-r/' rel='bookmark' title='Permanent Link: Survive R'>Survive R</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Must Have Software</title>
		<link>http://www.win-vector.com/blog/2010/05/must-have-software/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=must-have-software</link>
		<comments>http://www.win-vector.com/blog/2010/05/must-have-software/#comments</comments>
		<pubDate>Fri, 28 May 2010 17:26:07 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[Opinion]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[GnuPG]]></category>
		<category><![CDATA[Keynote]]></category>
		<category><![CDATA[Latex]]></category>
		<category><![CDATA[Must Have Software]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[TrueCrypt]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1461</guid>
		<description><![CDATA[Having worked with Unix (BSD, HPUX, IRIX, Linux and OSX) and Windows (NT4, 2000, XP, Vista and 7) for quite a while I have seen a lot of different software tools. I would like to quickly exhibit my &#8220;must have&#8221; list. These are the packages that I find to be the single &#8220;must have offerings&#8221; in [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/07/microsoft-store-again/' rel='bookmark' title='Permanent Link: Microsoft Store Again'>Microsoft Store Again</a></li>
<li><a href='http://www.win-vector.com/blog/2009/01/exciting-technique-1-the-r-language/' rel='bookmark' title='Permanent Link: Exciting Technique #1: The &#8220;R&#8221; language.'>Exciting Technique #1: The &#8220;R&#8221; language.</a></li>
<li><a href='http://www.win-vector.com/blog/2009/06/public-service-article-jstor-and-other-useful-research-archives/' rel='bookmark' title='Permanent Link: Public Service Article: JSTOR and other Useful Research Archives'>Public Service Article: JSTOR and other Useful Research Archives</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Having worked with Unix (BSD, HPUX, IRIX, Linux and OSX) and Windows (NT4, 2000, XP, Vista and 7) for quite a while, I have seen a lot of different software tools.  I would like to quickly exhibit my &#8220;must have&#8221; list.  These are the packages that I find to be the single &#8220;must have offerings&#8221; in a number of categories.  I have avoided some categories (such as editors, email programs, programming languages, IDEs, photo editors, backup solutions, databases, database tools and web tools) where I have no feeling of having seen a single absolute best offering.</p>
<p>The spirit of the list is to pick items such that: if you disagree with an item in this list then either you are wrong or you know something I would really like to hear about.</p>
<p><span id="more-1461"></span></p>
<dl>
<dt><strong>Encryption, disk images: <a href="http://www.truecrypt.org/" target="ext">TrueCrypt</a> (open source: Linux, Windows, OSX)</strong></dt>
<dd>TrueCrypt can create portable encrypted virtual disks (files that can be mounted as a disk on any operating system).</dd>
<dt><strong>Encryption, files: <a href="http://www.gnupg.org/" target="ext">GnuPG</a> (open source: Linux, Windows, OSX)</strong></dt>
<dd>GnuPG is the tool to use to encrypt files for email.</dd>
<dt><strong>Presentation: <a href="http://www.apple.com/iwork/keynote/" target="ext">Apple Keynote</a> (commercial: OSX)</strong></dt>
<dd>Keynote is not quite as friendly as Microsoft PowerPoint, but it quickly produces beautiful presentations.</dd>
<dt><strong>Reference Library: <a href="http://mekentosj.com/papers/" target="ext">Papers</a> (commercial: OSX)</strong></dt>
<dd>&#8220;iTunes for PDF.&#8221;  Manage thousands of PDFs and references, annotate with meta-data, place papers into multiple project folders.  An interesting runner-up is <a href="http://bibdesk.sourceforge.net/" target="ext">BibDesk</a> (open source: OSX).</dd>
<dt><strong>Spreadsheet: <a href="http://office.microsoft.com/en-gb/excel/default.aspx" target="ext">Microsoft Excel</a> (commercial: Windows, OSX)</strong></dt>
<dd>Open Office and Google Docs are getting better every day, but neither comes close to Microsoft Excel in functionality and versatility of user interface.  If you are on a platform that supports Excel, work regularly with spreadsheets and use something other than Excel, it really means that you do not value your time.</dd>
<dt><strong>Statistics Software: <a href="http://www.r-project.org/" target="ext">R</a> (open source: Linux, Windows, OSX)</strong></dt>
<dd>R is rapidly becoming the platform of choice for statisticians and is (with the addition of lattice and ggplot2) the best way to produce graphs.  R has a fairly nasty programming language, but it has so many statistical operations available that it cannot be avoided.</dd>
<dt><strong>Technical Documentation: <a href="http://www.tug.org/" target="ext">LaTeX</a> (open source: Linux, Windows, OSX)</strong></dt>
<dd>It may seem antiquated, but TeX/LaTeX is still far more powerful than the &#8220;WYSIWYG&#8221; pretenders.  The separation of presentation from specification, the automatic management of references and tables of contents, and the ability to include PDFs from external files (which get refreshed when you rebuild the document) are all lifesavers.</dd>
<dt><strong>Version Control: <a href="http://git-scm.com/" target="ext">git</a> (open source: Linux, Windows, OSX)</strong></dt>
<dd>Just about the only version control system that doesn&#8217;t damage the data you are trying to manage by adding dot-files into all of the directories, can routinely handle large files, and can work productively without a network connection.  <a href="http://www.perforce.com/" target="ext">Perforce</a> is a powerful commercial option built around a central server (with the ability to have central policies, control and review).
</dd>
</dl>
<p>I look forward to learning which of my choices are considered poor and what your must-haves are.</p>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/07/microsoft-store-again/' rel='bookmark' title='Permanent Link: Microsoft Store Again'>Microsoft Store Again</a></li>
<li><a href='http://www.win-vector.com/blog/2009/01/exciting-technique-1-the-r-language/' rel='bookmark' title='Permanent Link: Exciting Technique #1: The &#8220;R&#8221; language.'>Exciting Technique #1: The &#8220;R&#8221; language.</a></li>
<li><a href='http://www.win-vector.com/blog/2009/06/public-service-article-jstor-and-other-useful-research-archives/' rel='bookmark' title='Permanent Link: Public Service Article: JSTOR and other Useful Research Archives'>Public Service Article: JSTOR and other Useful Research Archives</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/05/must-have-software/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Algorithmic Movie (with texture)</title>
		<link>http://www.win-vector.com/blog/2010/04/algorithmic-movie-with-texture/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=algorithmic-movie-with-texture</link>
		<comments>http://www.win-vector.com/blog/2010/04/algorithmic-movie-with-texture/#comments</comments>
		<pubDate>Tue, 27 Apr 2010 16:44:52 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Algorithmic Art]]></category>
		<category><![CDATA[genetic art]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1457</guid>
		<description><![CDATA[We would like to share a new algorithmic movie we have created. Since the mid-1990s we have been dabbling off and on with a combination of algorithmic and genetic art (see: What is “Genetic Art?” or try running the Java code directly in your browser). Every once in a while we return to the [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/06/what-is-genetic-art/' rel='bookmark' title='Permanent Link: What is &#8220;Genetic Art?&#8221;'>What is &#8220;Genetic Art?&#8221;</a></li>
<li><a href='http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/' rel='bookmark' title='Permanent Link: Gradients via Reverse Accumulation'>Gradients via Reverse Accumulation</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>We would like to share a new algorithmic movie we have created.</p>
<p>Since the mid-1990s we have been dabbling off and on with a combination of algorithmic and genetic art (see: <a href="http://www.win-vector.com/blog/2009/06/what-is-genetic-art/" target="other">What is “Genetic Art?”</a> or try <a href="http://www.mzlabs.com/MZLabsJM/page4/page22/page22.html" target="other">running the Java code directly in your browser</a>).  Every once in a while we return to the project and generate something we would like to share.</p>
<p><span id="more-1457"></span><br />
For this project we have used formulas over the variables &#8220;x&#8221; and &#8220;y&#8221; to describe how color varies as a function of position on our canvas.</p>
<p>This has allowed formulas like:</p>
<blockquote><p>
( + ( mod ( iexp k ) ( isin ( / j ( / ( x + i y + j x + k y ) k ) ) ) ) ( mod ( iexp k ) ( isin ( / j ( / ( x + i y + j x + k y ) k ) ) ) ) )
</p></blockquote>
<p>To generate pictures like this:</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/gartPicture2010_04_27_09.20.21.7941.jpg" alt="gartPicture2010_04_27_09.20.21.794.jpg" border="0" width="500" height="333" /><br />
</center></p>
<p>We then add a source texture from C. Estrade&#8217;s &#8220;Full-Color Japanese Textile Designs CD-ROM and Book&#8221; (<a href="http://store.doverpublications.com/0486996956.html" target="ext">Dover</a>, unrestricted use):<br />
<center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/023.jpg" alt="023.jpg" border="0" width="500" height="325" /><br />
</center></p>
<p>Using this texture in a slightly modified formula yields a picture like this:</p>
<blockquote><p>
( + ( subst ( mod ( iexp k ) ( isin ( / j ( / ( x + i y + j x + k y ) k ) ) ) ) Img23 ) ( mod ( iexp k ) ( isin ( / j ( / ( x + i y + j x + k y ) k ) ) ) ) )
</p></blockquote>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/gartPicture2010_04_18_09.12.24.2121.jpg" alt="gartPicture2010_04_18_09.12.24.212.jpg" border="0" width="500" height="333" /><br />
</center></p>
<p>We can further modify the formula to depend on time (represented by the new variable &#8220;z&#8221;):</p>
<blockquote><p>
( + ( subst ( mod ( iexp k ) ( isin ( / j ( / ( x + i y + j (x +z) + k (y + z) ) k ) ) ) ) Img23 ) ( mod ( iexp k ) ( isin ( / j ( / ( x + i y + j (x +z) + k (y + z) ) k ) ) ) ) )
</p></blockquote>
<p>And get a <a href="http://www.youtube.com/watch?v=hs_glOeEV7c" target="ext">movie</a> like this:</p>
<p><center><br />
<object width="500" height="405"><param name="movie" value="http://www.youtube.com/v/hs_glOeEV7c&#038;hl=en_US&#038;fs=1&#038;rel=0&#038;border=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/hs_glOeEV7c&#038;hl=en_US&#038;fs=1&#038;rel=0&#038;border=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="500" height="405"></embed></object><br />
</center></p>
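<p>To make the idea of a formula over &#8220;x&#8221;, &#8220;y&#8221; and time &#8220;z&#8221; concrete, here is a minimal Python sketch of evaluating a prefix formula at every pixel. It is not our actual genetic art code: the operator table (+, *, sin, mod) is a hypothetical stand-in for operators like iexp, isin and subst, the quaternion-style &#8220;x + i y + ...&#8221; literal is not handled, values are plain real numbers rather than colors, and the names parse, evaluate and render are our own.</p>

```python
import math

# Hypothetical stand-ins for the real operator set (iexp, isin, mod, subst, ...).
OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "sin": math.sin,
    "mod": lambda a, b: math.fmod(a, b) if b != 0 else 0.0,
}

def parse(text):
    """Parse a fully parenthesized prefix formula like '( + x ( sin y ) )' into nested lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        tok = tokens[pos]
        if tok == "(":
            expr, pos = [], pos + 1
            while tokens[pos] != ")":
                sub, pos = read(pos)
                expr.append(sub)
            return expr, pos + 1
        try:
            return float(tok), pos + 1   # numeric constant
        except ValueError:
            return tok, pos + 1          # operator or variable name

    tree, _ = read(0)
    return tree

def evaluate(expr, env):
    """Evaluate a parsed formula with variable bindings env (x, y and time z)."""
    if isinstance(expr, float):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    return OPS[op](*(evaluate(a, env) for a in args))

def render(formula, width, height, z=0.0):
    """Formula value at each pixel: rows are y, columns are x, both scaled into [0, 1)."""
    tree = parse(formula)
    return [[evaluate(tree, {"x": i / width, "y": j / height, "z": z})
             for i in range(width)]
            for j in range(height)]

frame = render("( + ( sin ( * 6 x ) ) ( mod ( + y z ) 0.3 ) )", 8, 6)
movie = [render("( sin ( * 6 ( + x z ) ) )", 8, 6, z=t / 30) for t in range(90)]
```

<p>A still picture is one call to render; a movie is the same formula rendered at a sequence of z values. The mapping from these numbers to pixel colors is deliberately left out of the sketch.</p>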
<p>What we have previously called &#8220;genetic art&#8221; was a system for automatically combining and re-combining fragments of formulas using user votes and preferences (so nobody would have to see or understand these ugly formulas to produce art).  What we now present is a larger &#8220;algebra&#8221; of &#8220;simple picture plus pattern = complicated picture&#8221; and &#8220;picture plus time transformations = movie.&#8221;</p>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/06/what-is-genetic-art/' rel='bookmark' title='Permanent Link: What is &#8220;Genetic Art?&#8221;'>What is &#8220;Genetic Art?&#8221;</a></li>
<li><a href='http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/' rel='bookmark' title='Permanent Link: Gradients via Reverse Accumulation'>Gradients via Reverse Accumulation</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/04/algorithmic-movie-with-texture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SIGACT Review of: Combinatorics the Rota Way</title>
		<link>http://www.win-vector.com/blog/2010/04/sigact-review-of-combinatorics-the-rota-way/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=sigact-review-of-combinatorics-the-rota-way</link>
		<comments>http://www.win-vector.com/blog/2010/04/sigact-review-of-combinatorics-the-rota-way/#comments</comments>
		<pubDate>Wed, 21 Apr 2010 03:51:56 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Opinion]]></category>
		<category><![CDATA[Book Reviews]]></category>
		<category><![CDATA[Combinatorics]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1450</guid>
		<description><![CDATA[SIGACT News review of: Combinatorics the Rota Way. Also found on Professor Gasarch&#8217;s page and ACM SIGACT News Volume 41, Issue 2 (paywall) Review of Combinatorics The Rota Way by Joseph P.S. Kung, Gian-Carlo Rota and Catherine H. Yan Cambridge, 2009 396 pages, Trade Paperback Review by John Mount, jmount@win-vector.com April 20, 2010 Introduction Combinatorics, [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2008/08/what-is-mathematics-really/' rel='bookmark' title='Permanent Link: What is Mathematics, Really?'>What is Mathematics, Really?</a></li>
<li><a href='http://www.win-vector.com/blog/2009/05/the-joy-of-calculation/' rel='bookmark' title='Permanent Link: The Joy of Calculation'>The Joy of Calculation</a></li>
<li><a href='http://www.win-vector.com/blog/2008/04/sorting-in-anger/' rel='bookmark' title='Permanent Link: Sorting Used in Anger'>Sorting Used in Anger</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>SIGACT News review of: Combinatorics the Rota Way.  Also found on <a href="http://www.cs.umd.edu/~gasarch/bookrev/41-2.pdf" target="ext">Professor Gasarch&#8217;s page</a> and  <a href="http://portal.acm.org/browse_dl.cfm?idx=J697" target="ext">ACM SIGACT News Volume 41, Issue 2 (paywall)</a></p>
<p><span id="more-1450"></span></p>
<div align="center"><b>Review of<br />
Combinatorics The Rota Way<br />
by Joseph P.S. Kung, Gian-Carlo Rota and Catherine H. Yan<br />
Cambridge, 2009<br />
396 pages, Trade Paperback</b></div>
<div align="center"><b>Review by<br />
John Mount, jmount@win-vector.com<br />
April 20, 2010</b></div>
<h1><a name="SECTION00010000000000000000">Introduction</a></h1>
<p>Combinatorics, as it matures, becomes harder to succinctly describe. The field has progressed from the basic study of finite sets and counting techniques to being the discipline where questions involving counting, graphs, connectivity, mappings and partial orders all naturally reside. But the objects that combinatorics studies turn out not to be the correct foundation to support modern combinatorial methods. Many combinatorial methods were dismissed as mere technique until combinatorics expanded to include the natural domains of these methods: lattices, formal power series, valuation rings, matroids and many diverse algebras. One person who pushed hard for this coherence and unity was Gian-Carlo Rota.</p>
<p>An example of a high-school level combinatorial trick is proving the equation</p>
<div align="center"><img width="101" height="60" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg1.png" alt="$\displaystyle \sum_{i=0}^{n} \binom{n}{i} = 2^n $"></div>
<p>by applying the binomial theorem to <img width="61" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg2.png" alt="$ (1+1)^n$"> . This trick is transformed into a method when you recognize that you really should be working in the ring of formal power series and invent the Umbral Calculus. With the Umbral Calculus you can use the equivalence of the following two equations:</p>
<div align="center">
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="20" height="30" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg3.png" alt="$\displaystyle b^n$"></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg4.png" alt="$\displaystyle =$"></td>
<td align="left" nowrap><img width="155" height="60" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg5.png" alt="$\displaystyle (a+1)^n = \sum_{i=0}^{n} \binom{n}{i} a^i$"></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="21" height="30" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg6.png" alt="$\displaystyle a^n$"></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg4.png" alt="$\displaystyle =$"></td>
<td align="left" nowrap><img width="205" height="60" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg7.png" alt="$\displaystyle (b-1)^n = \sum_{i=0}^{n} (-1)^{n-i} \binom{n}{i} b^i$"></td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"><br />
(i.e. <img width="68" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg8.png" alt="$ b = a+1$"> is equivalent to <img width="68" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg9.png" alt="$ a=b-1$"> ) to prove that for any two arbitrary infinite sequences <img width="37" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg10.png" alt="$ a_i,b_i$"> the following two statements are also equivalent:</p>
<p></p>
<div align="center"><a name="eq1"></a><a name="eq2"></a><br />
<table cellpadding="0" align="center" width="100%">
<tr valign="middle">
<td nowrap align="right"><img width="20" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg11.png" alt="$\displaystyle b_n$"></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg4.png" alt="$\displaystyle =$"></td>
<td align="left" nowrap><img width="81" height="60" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg12.png" alt="$\displaystyle \sum_{i=0}^{n} \binom{n}{i} a_i \;$">for all<img width="18" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg13.png" alt="$\displaystyle \; n$"></td>
<td width="10" align="right">(1)</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="21" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg14.png" alt="$\displaystyle a_n$"></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg4.png" alt="$\displaystyle =$"></td>
<td align="left" nowrap><img width="133" height="60" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg15.png" alt="$\displaystyle \sum_{i=0}^{n} (-1)^{n-i} \binom{n}{i} b_i \;$">for all<img width="22" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg16.png" alt="$\displaystyle \; n.$"></td>
<td width="10" align="right">(2)</td>
</tr>
</table>
</div>
<p><br clear="all"><br />
For example, we could pick <img width="45" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg17.png" alt="$ a_i = i$"> and substitute it into Equation&nbsp;<a href="#eq1">1</a>. With some work we see this implies <img width="73" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg18.png" alt="$ b_i= 2^{i-1} i$"> .<a name="tex2html1" href="#foot43"><sup>1</sup></a> Then by the Umbral result we know Equation&nbsp;<a href="#eq2">2</a> must also be true, so we get a new identity: <img width="186" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg22.png" alt="$ n = \sum_{i=0}^{n} (-1)^{n-i} \binom{n}{i} 2^{i-1} i$"> . This algebraic production of a new identity is very different from the classical method of &#8220;counting two ways&#8221; (or being lucky enough to come up with a clever bijection to prove the identity).</p>
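<p>The equivalence of Equations 1 and 2 is easy to spot-check numerically. The following Python sketch (the function names are ours, not the book&#8217;s) applies Equation 1 to a sequence and Equation 2 to the result. Taking a_i = 1 recovers the high-school identity, and taking a_i = i reproduces b_i = 2^(i-1) i and the round trip back to a_i = i. A check on finitely many terms illustrates the identities; it does not prove them.</p>

```python
from math import comb

def binomial_transform(a):
    """Equation (1): b_n = sum_{i=0}^{n} C(n, i) a_i."""
    return [sum(comb(n, i) * a[i] for i in range(n + 1)) for n in range(len(a))]

def inverse_binomial_transform(b):
    """Equation (2): a_n = sum_{i=0}^{n} (-1)^(n-i) C(n, i) b_i."""
    return [sum((-1) ** (n - i) * comb(n, i) * b[i] for i in range(n + 1))
            for n in range(len(b))]

N = 12
# a_i = 1 gives sum_i C(n, i) = 2^n
print(binomial_transform([1] * N) == [2 ** n for n in range(N)])  # True
# a_i = i gives b_i = 2^(i-1) i, written as i * 2^i // 2 to stay in integers
a = list(range(N))
b = binomial_transform(a)
print(b == [i * 2 ** i // 2 for i in range(N)])  # True
# Equation (2) inverts Equation (1), so n = sum (-1)^(n-i) C(n, i) 2^(i-1) i
print(inverse_binomial_transform(b) == a)  # True
```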
<h1><a name="SECTION00020000000000000000">Summary</a></h1>
<p>The book &#8220;Combinatorics the Rota Way&#8221; is itself hard to succinctly describe. The first and third authors tell of writing this book using notes from the Massachusetts Institute of Technology&#8217;s course 18.315 collected over a span of more than 30 years. Gian-Carlo Rota himself was added as a posthumous author. The book itself contains more than a single course-year&#8217;s worth of material and is packed very densely.</p>
<p>The book&#8217;s emphasis is abstract and algebraic. The exercises are not meant to teach, but instead to identify applications of combinatorics in other mathematical disciplines. The book is the product of a strong push to demonstrate many combinatorial methods in their most powerful, but not most obvious, forms. This work is clearly a labor of love and contains some remarkable material. However, due to the great breadth of the work, little time is spent on motivation or on concrete examples.</p>
<h2><a name="SECTION00021000000000000000">Chapter 1: Sets, Functions and Relations</a></h2>
<p>The first chapter covers the definitional foundations of combinatorics: sets, lattices, partial orders, functions and relations. These are the discrete objects that the book will reason about by later building more complicated algebraic objects. This section is very dense and reads like a compressed Bourbaki treatment of discrete mathematics.</p>
<p>One problematic portion of this chapter is the section on entropy, which seems to serve no purpose other than preparing the reader for exercise 1.4.10, an abstraction of entropy. Also, exercises 1.2.5(j,k) are needlessly cruel in asking the reader to recreate the Robertson-Seymour graph minor theorem. There have been books where the reader is successfully guided through a major result by exercises, such as the Weak Perfect Graph Theorem in Lov&aacute;sz&#8217;s &#8220;Combinatorial Problems and Exercises&#8221;, but this book is not structured in that manner.</p>
<h2><a name="SECTION00022000000000000000">Chapter 2: Matching Theory</a></h2>
<p>The second chapter is a welcome change in tone. It opens with a quote from Harper and Rota describing matching theory, and a clever 1979 Putnam exam problem is worked into the exercises and solutions. Central to the chapter is the &#8220;marriage theorem&#8221;, which characterizes when a bipartite graph has a matching covering one side. Also discussed is Birkhoff&#8217;s Theorem, which states that every doubly stochastic matrix is a convex combination of permutation matrices, and so relates matchings to matrices. The text is lively and includes a number of well-researched asides, such as the origin of the name &#8220;The Hungarian Method.&#8221; However, there are some problems with forward references: for example, the reader is asked to work a couple of exercises (2.4.5 and 2.4.6) using the Binet-Cauchy formula, which isn&#8217;t discussed at length until Chapter 6.</p>
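<p>For readers who want to see the marriage theorem in action, here is a small Python sketch (ours, not from the book) that checks Hall&#8217;s condition by brute force over subsets and compares it with the size of a maximum matching found by simple augmenting paths (Kuhn&#8217;s algorithm). The subset check is exponential and only meant for tiny examples.</p>

```python
from itertools import combinations

def hall_condition(adj):
    """Hall's condition: every set S of left vertices has at least |S| neighbors."""
    left = list(adj)
    for k in range(1, len(left) + 1):
        for group in combinations(left, k):
            neighbors = set().union(*(adj[v] for v in group))
            if len(neighbors) < k:
                return False
    return True

def max_matching(adj):
    """Size of a maximum matching, via augmenting paths (Kuhn's algorithm)."""
    partner = {}  # right vertex -> left vertex currently matched to it

    def augment(u, visited):
        for w in adj[u]:
            if w not in visited:
                visited.add(w)
                # take w if it is free, or if its current partner can move elsewhere
                if w not in partner or augment(partner[w], visited):
                    partner[w] = u
                    return True
        return False

    return sum(augment(u, set()) for u in adj)

people = {"a": {1, 2}, "b": {1}, "c": {2, 3}}
print(hall_condition(people), max_matching(people))  # True 3
stuck = {"a": {1}, "b": {1}, "c": {2, 3}}
print(hall_condition(stuck), max_matching(stuck))    # False 2
```

<p>In the second example Hall&#8217;s condition fails because a and b both list only partner 1, so no matching can cover all three.</p>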
<h2><a name="SECTION00023000000000000000">Chapter 3: Partially Ordered Sets and Lattices</a></h2>
<p>This chapter begins with a very exciting presentation of the M&ouml;bius Function (the convolutional inverse of what is essentially the indicator function of a partial order). It is a real pleasure to see this material well presented in a general lattice setting, instead of the more common and specialized number-theoretic setting. The chapter moves on to chains (totally ordered subsets) and anti-chains (sets of pairwise incomparable elements) in partial orders. The authors present Dilworth&#8217;s theorem, which states that every finite partial order can be covered by a number of chains no larger than the size of its largest anti-chain.<a name="tex2html2" href="#foot57"><sup>2</sup></a> The chapter continues with Sperner Theory, which relates the sizes of anti-chains to binomial coefficients. Chapter 3 concludes with valuation rings and M&ouml;bius Algebras: a transition to the more algebraic style found in Chapter 4.</p>
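<p>The general definition is easy to compute directly. In the incidence algebra of a finite partial order the M&ouml;bius function is the inverse of the zeta function (zeta(x, y) = 1 when x &le; y, else 0), which gives the recursion mu(x, x) = 1 and mu(x, y) = -sum of mu(x, z) over x &le; z &lt; y. The Python sketch below (function names are ours) applies that recursion to any finite poset. On the divisibility order it reproduces the classical number-theoretic M&ouml;bius function, and on subsets ordered by inclusion it gives (-1)^|T - S|.</p>

```python
from functools import lru_cache

def mobius(elements, leq):
    """Return mu(x, y) for the finite poset (elements, leq), using
    mu(x, x) = 1 and mu(x, y) = -sum_{x <= z < y} mu(x, z)."""
    elements = list(elements)

    @lru_cache(maxsize=None)
    def mu(x, y):
        if x == y:
            return 1
        if not leq(x, y):
            return 0
        return -sum(mu(x, z) for z in elements
                    if z != y and leq(x, z) and leq(z, y))
    return mu

# Divisibility order on 1..30: mu(1, n) is the number-theoretic Moebius function
div_mu = mobius(range(1, 31), lambda a, b: b % a == 0)
print([div_mu(1, n) for n in range(1, 13)])  # [1, -1, -1, 0, -1, 1, -1, 0, 0, 1, -1, 0]

# Boolean lattice of subsets of {0, 1, 2} ordered by inclusion
subsets = [frozenset(s) for s in ([], [0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2])]
bool_mu = mobius(subsets, lambda s, t: s <= t)
print(bool_mu(frozenset(), frozenset({0, 1, 2})))  # -1
```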
<h2><a name="SECTION00024000000000000000">Chapter 4: Generating Functions and the Umbral Calculus</a></h2>
<p>This is a key chapter. The book introduces the Umbral Calculus, a transform space that automates the manipulation of generating functions. The algebra of delta operators, which provides an abstraction of differentiation, is then introduced. Finally, co-algebras, which abstract the process of factoring, are explored.</p>
<p>A rare (and unfortunate) typo on page 190 mis-defines a basic sequence <img width="42" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg23.png" alt="$ p_n(x)$"> for the delta operator <img width="17" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg24.png" alt="$ Q$"> as obeying <img width="131" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg25.png" alt="$ Q p_n(x) = p_{n-1}(x)$"> instead of the correct equation: <img width="140" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg26.png" alt="$ Q p_n(x) = n p_{n-1}(x)$"> . A careful reader can spot the mistake, as it is inconsistent with the subsequent demonstrations and uses.</p>
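<p>The corrected equation can be checked on the standard example of a delta operator other than differentiation: the forward difference (Delta f)(x) = f(x+1) - f(x), whose basic sequence is the falling factorial (x)_n = x(x-1)...(x-n+1). The Python sketch below (names are ours) evaluates both sides with exact rational arithmetic at several points. Delta (x)_n equals n (x)_(n-1), while the mistyped version without the factor n fails as soon as n = 2. Both sides are polynomials of degree n - 1, so agreement at n or more distinct points is enough to conclude they are equal.</p>

```python
from fractions import Fraction

def falling(x, n):
    """Falling factorial (x)_n = x (x - 1) ... (x - n + 1), with (x)_0 = 1."""
    result = Fraction(1)
    for k in range(n):
        result *= x - k
    return result

def forward_difference(f):
    """The delta operator (Delta f)(x) = f(x + 1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

def satisfies(n, points, factor):
    """Check (Delta p_n)(x) == factor(n) * p_{n-1}(x) at each point, with p_n = (x)_n."""
    dp = forward_difference(lambda x: falling(x, n))
    return all(dp(x) == factor(n) * falling(x, n - 1) for x in points)

points = [Fraction(k, 3) for k in range(-5, 5)]  # ten distinct rational points
print(all(satisfies(n, points, lambda n: n) for n in range(1, 9)))  # True: Q p_n = n p_{n-1}
print(satisfies(2, points, lambda n: 1))                            # False: Q p_n = p_{n-1}
```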
<h2><a name="SECTION00025000000000000000">Chapter 5: Symmetric Functions and Baxter Algebras</a></h2>
<p>This chapter treats a number of important algebraic topics. Symmetric functions are studied and identified as being the obvious class of functions that contains all of the well-known generating functions already studied. P&oacute;lya&#8217;s Enumeration Theory, the method of counting equivalence classes of arrangements under a group of symmetries, is given a very interesting exposition. But the book skips the classic examples and exercises, such as counting the distinct necklaces that can be made from colored beads, that would be needed for the topic to be fully approachable. Baxter Algebras, which abstract both summation and integration by parts, are introduced via a study of the sequence shift operator. By this point the book has abstract versions of both differentiation and integration, providing a combinatorial groundwork to prove theorems on &#8220;the calculus&#8221; that are more general than is possible in any one theory of differentiation or integration.</p>
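<p>The necklace example is short enough to sketch here. By Burnside&#8217;s lemma (the unweighted core of P&oacute;lya&#8217;s theory), the number of n-bead necklaces in k colors, counting two necklaces as the same when one is a rotation of the other, is the average over the n rotations of the number of colorings each rotation leaves fixed. Rotation by r positions fixes k^gcd(r, n) colorings. The Python sketch below (function names are ours) compares that count with brute-force enumeration. It considers rotations only, not reflections, which would count bracelets instead.</p>

```python
from itertools import product
from math import gcd

def necklaces_burnside(n, k):
    """Average number of colorings fixed by each rotation; rotation by r fixes k**gcd(r, n)."""
    return sum(k ** gcd(r, n) for r in range(n)) // n

def necklaces_brute(n, k):
    """Enumerate all colorings and keep one canonical (least) rotation of each."""
    canonical = set()
    for c in product(range(k), repeat=n):
        canonical.add(min(c[r:] + c[:r] for r in range(n)))
    return len(canonical)

print(necklaces_burnside(4, 2), necklaces_brute(4, 2))  # 6 6
print(necklaces_burnside(6, 3), necklaces_brute(6, 3))  # 130 130
```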
<h2><a name="SECTION00026000000000000000">Chapter 6: Determinants, Matrices and Polynomials</a></h2>
<p>This chapter is most similar to classical polynomial invariant theory, the study of symmetric functions of the roots of polynomials such as the discriminant. A major theme of this chapter is the study of the relations between properties of polynomial coefficients and the locations of roots of the polynomials. The study of matrices brings us to the remarkable Binet-Cauchy Formula for the determinant of a product of matrices. The results are deep, but it is a shame that more time isn&#8217;t spent on simple concrete applications such as using the Binet-Cauchy formula to count the number of spanning trees in a graph. This chapter reveals the parts of combinatorics that come from analysis and the study of locations of roots of polynomials (via group theory), in contrast to the parts that come from enumerating finite sets, linear algebra and abstract algebra. This is also the chapter where the exterior algebra, a favorite tool of Rota&#8217;s, is most discussed.</p>
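<p>As an example of the kind of concrete application we have in mind, here is a Python sketch (ours, not the book&#8217;s) of the matrix-tree theorem. Delete one vertex&#8217;s row from an oriented incidence matrix B of a graph on n vertices; then B B^T is the reduced Laplacian, and its determinant counts spanning trees. Expanding det(B B^T) with the Binet-Cauchy formula gives a sum over (n-1)-edge subsets S of det(B_S)^2, and each term is 1 exactly when S is a spanning tree and 0 otherwise. The sketch computes both sides with exact rational arithmetic.</p>

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in m]
    n, sign, result = len(m), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            sign = -sign
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for j in range(c, n):
                m[r][j] -= factor * m[c][j]
    return sign * result

def reduced_incidence(n, edges):
    """Oriented incidence matrix with the row for vertex 0 deleted: (n-1) x |E|."""
    B = [[0] * len(edges) for _ in range(n)]
    for j, (u, v) in enumerate(edges):
        B[u][j], B[v][j] = 1, -1
    return B[1:]

def spanning_trees_laplacian(n, edges):
    """Matrix-tree theorem: determinant of the reduced Laplacian B B^T."""
    B = reduced_incidence(n, edges)
    L = [[sum(a * b for a, b in zip(r1, r2)) for r2 in B] for r1 in B]
    return det(L)

def spanning_trees_binet_cauchy(n, edges):
    """Binet-Cauchy expansion of det(B B^T): sum of det(B_S)^2 over (n-1)-edge subsets S."""
    B = reduced_incidence(n, edges)
    total = 0
    for S in combinations(range(len(edges)), n - 1):
        total += det([[row[j] for j in S] for row in B]) ** 2
    return total

k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]  # complete graph on 4 vertices
c5 = [(i, (i + 1) % 5) for i in range(5)]                  # cycle on 5 vertices
print(spanning_trees_laplacian(4, k4), spanning_trees_binet_cauchy(4, k4))  # 16 16
print(spanning_trees_laplacian(5, c5), spanning_trees_binet_cauchy(5, c5))  # 5 5
```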
<p>A typo on page 275 (a potentially confusing comma in the definition of the <img width="46" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg27.png" alt="$ eval()$"> operation) can be recovered from because the authors have the nice habit of explicitly calling out the domain and range of functions.</p>
<h1><a name="SECTION00030000000000000000">Opinion</a></h1>
<p>Some important questions about this book are: is Gian-Carlo Rota a co-author, what is the purpose of the book and who is the best audience?</p>
<p>Gian-Carlo Rota seems appropriately labeled as a co-author, as clearly a lot of his work went into the book. The book is not suitable as an introductory textbook or as a reference. It is a book meant to be read. The ideal reader is capable of graduate-level mathematics, is comfortable with a high degree of abstraction and algebra, and is already familiar with many of the structures and techniques of combinatorics: sets, graphs, matrices, alternating sequences and generating functions. A mathematician or computer scientist wanting to learn more about the science of combinatorics will find a good read here.</p>
<p>The book works best as a second read of the topics covered. If you already know of a combinatorial method, like P&oacute;lya&#8217;s Enumeration Theory, this book is a good place to find the starting point for an alternate and powerful treatment of the topic. The book admits to not being self-contained and has a few forward-reference problems. However, this is forgivable when you realize the goal of this book is not to teach some easy discrete mathematics before you move on to analysis, but to extract the important combinatorial methods and themes from all of mathematics.</p>
<p>The content is well written, very accurate and well edited. The index is good, but not quite up to the job. The bibliography is very good and divided into three useful sections: papers by Gian-Carlo Rota and coworkers, books for further reading and a section of references.</p>
<p>We close with an extract from the book at hand. Many mathematicians have used the phrase &#8220;merely combinatorial proof&#8221; as a dismissal. However, when properly founded, combinatorial proofs are in fact more general than proofs that depend on additional specific details from the original problem domain. The authors take some justifiable pleasure in including points like: &#8220;Hilbert&#8217;s basis theorem is equivalent to the &#8216;trivial combinatorial fact&#8217; given in Gordan&#8217;s lemma.&#8221; This is certainly a taste of combinatorics the Rota way.</p>
<p></p>
<hr />
<h4>Footnotes</h4>
<dl>
<dt><a name="foot43">&#8230;.</a><a href="#tex2html1"><sup>1</sup></a></dt>
<dd>For this use the binomial theorem to expand <img width="62" height="31" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg19.png" alt="$ (1+x)^n$"> , differentiate with respect to <img width="13" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg20.png" alt="$ x$"> and then substitute in <img width="42" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/CTRimg21.png" alt="$ x=1$"> .</dd>
<dt><a name="foot57">&#8230; anti-chain.</a><a href="#tex2html2"><sup>2</sup></a></dt>
<dd>From this they derive just about the only Ramsey-theoretic style result in the book: any large partial order must have a large chain or large anti-chain.</dd>
</dl>
<p></p>
<hr />


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2008/08/what-is-mathematics-really/' rel='bookmark' title='Permanent Link: What is Mathematics, Really?'>What is Mathematics, Really?</a></li>
<li><a href='http://www.win-vector.com/blog/2009/05/the-joy-of-calculation/' rel='bookmark' title='Permanent Link: The Joy of Calculation'>The Joy of Calculation</a></li>
<li><a href='http://www.win-vector.com/blog/2008/04/sorting-in-anger/' rel='bookmark' title='Permanent Link: Sorting Used in Anger'>Sorting Used in Anger</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/04/sigact-review-of-combinatorics-the-rota-way/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deming, Wald and Boyd: cutting through the fog of analytics</title>
		<link>http://www.win-vector.com/blog/2010/04/deming-wald-and-boyd-cutting-through-the-fog-of-analytics/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=deming-wald-and-boyd-cutting-through-the-fog-of-analytics</link>
		<comments>http://www.win-vector.com/blog/2010/04/deming-wald-and-boyd-cutting-through-the-fog-of-analytics/#comments</comments>
		<pubDate>Tue, 20 Apr 2010 22:53:03 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[History]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[A-10]]></category>
		<category><![CDATA[Boyd]]></category>
		<category><![CDATA[Deming]]></category>
		<category><![CDATA[Novum Organum]]></category>
		<category><![CDATA[OODA]]></category>
		<category><![CDATA[PDCA]]></category>
		<category><![CDATA[Wald]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1421</guid>
		<description><![CDATA[This article is a quick appreciation of some of the statistical, analytic and philosophic techniques of Deming, Wald and Boyd. Many of these techniques have become pillars of modern industry through the sciences of statistics and operations research. We start with W. Edwards Deming. Deming was a statistician who designed many of the production methods [...]


No related posts.]]></description>
			<content:encoded><![CDATA[<p>This article is a quick appreciation of some of the statistical, analytic and philosphic techniques of Deming, Wald and Boyd.  Many of these techniques have become pillars of modern industry through the sciences of statistics and operations research.<br />
<span id="more-1421"></span></p>
<p>We start with <a href="http://en.wikipedia.org/wiki/W._Edwards_Deming" target="wp">W. Edwards Deming</a>.  Deming was a statistician who designed many of the production methods of post-war occupied Japan.  Deming&#8217;s work on quality quantification, measurement and continuous improvement formed the fundamental basis of Japan&#8217;s later rise as a respected manufacturing superpower.  Many of the further improved techniques were later imported into the United States as &#8220;eastern wisdom.&#8221;  However, some of the lesser ideas were perverted by eager followers into destructive cargo-cult rituals like &#8220;six sigma&#8221; (we must remember that it was the depth and power of Deming&#8217;s ideas that attracted the imitators).</p>
<p>One of Deming&#8217;s most fundamental ideas was the &#8220;PDCA loop.&#8221;</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/PDCA.png" alt="PDCA.png" border="0" width="300" /><br />
</center></p>
<p>The PDCA loop is a cycle of conceptual and analytic effort that repeatedly sequences through the stages Plan, Do, Check and Act.  The cycle starts with a plan, and each cycle&#8217;s plan is influenced by the results of the previous cycle.  The explicit Check and Act steps show the presumption that the Do step will always need measurement and correction.  This cycle is designed to help mitigate Helmuth von Moltke&#8217;s observation that &#8220;no campaign plan survives first contact with the enemy.&#8221;  Deming&#8217;s idea is essentially the systematic application of the scientific method (&#8220;propose/test,&#8221; as in Francis Bacon&#8217;s Novum Organum of 1620) to the adaptation and implementation of plans.</p>
<p>While Deming was teaching planning and &#8220;statistical process control&#8221; to boost US wartime production, a number of other statisticians were having great success in developing reactive strategies.  One of the best stories is that of Abraham Wald.  Wald became interested in Allied aircraft mortality during World War II.  He prepared a number of studies and charts of surviving aircraft, tabulating where bullet and shrapnel damage was most extensive.  He could, for example, combine inspections of many returning bombers to determine where they had the most damage (say the bulk of the fuselage and the leading edges of the wings):</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/b25b.png" alt="b25b.png" border="0" width="300" height="325" /><br />
</center></p>
<p>Wald then had the genius idea of proposing additional armor on the parts of the aircraft that never showed any hits on <em>surviving</em> aircraft.  His reasoning was that aircraft routinely took damage everywhere, so the areas left undamaged on the survivors must be the areas where hits brought down the unobserved, non-returning aircraft.  From the above diagram we might propose adding more armor near the pilots, engines and trailing control surfaces.  Wald later published sophisticated statistical techniques for imputing the distribution of hits (and therefore the distribution of vulnerabilities) on the unobserved aircraft: &#8220;A Method of Estimating Plane Vulnerability Based on Damage of Survivors,&#8221; Abraham Wald, Center for Naval Analyses (1943).</p>
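<p>The survivor effect can be illustrated with a toy simulation. The model below is entirely hypothetical (made-up section names and loss probabilities, not Wald&#8217;s data and not his actual estimator): every sortie takes one hit, equally likely to land in any of four sections, and the chance that the hit downs the plane depends on where it lands. Counting damage only on returning planes makes the engine and cockpit look like the safest places to be hit. If we also assume we know the number of sorties and that hits are uniform, the shortfall in survivor counts recovers the hidden loss rates.</p>

```python
import random

# Hypothetical model parameters (illustration only, not historical data).
SECTIONS = ["fuselage", "wings", "engine", "cockpit"]
LOSS_PROB = {"fuselage": 0.1, "wings": 0.1, "engine": 0.8, "cockpit": 0.6}

def fly_sorties(n, rng):
    """Simulate n sorties; return hit-location counts on the planes that return."""
    survivors = dict.fromkeys(SECTIONS, 0)
    for _ in range(n):
        hit = rng.choice(SECTIONS)
        if rng.random() >= LOSS_PROB[hit]:  # plane survives with probability 1 - p
            survivors[hit] += 1
    return survivors

def estimate_loss_prob(survivors, n):
    """Assuming uniform hits over n sorties, each section expects n / 4 hits, of which
    a fraction (1 - p) return; solve for p from the observed survivor count."""
    expected_hits = n / len(SECTIONS)
    return {s: 1 - survivors[s] / expected_hits for s in SECTIONS}

n = 20000
survivors = fly_sorties(n, random.Random(1943))
print(survivors)  # engine and cockpit hits are rare among returning planes
print({s: round(p, 2) for s, p in estimate_loss_prob(survivors, n).items()})
```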
<p>This  art of reactive observation was later systematized by <a href="http://en.wikipedia.org/wiki/John_Boyd_(military_strategist)" target="wp">Colonel John Boyd</a>.   Boyd invented what he called the &#8220;OODA loop.&#8221;</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/OODA.png" alt="OODA.png" border="0" width="300" /><br />
</center></p>
<p>This loop cycles, like Deming&#8217;s, through a pattern of Observe, Orient, Decide and Act.  The OODA loop differs from the PDCA loop in that it assumes a world that looks back and adapts against your actions.  Boyd added ideas of tempo and pace such as &#8220;short-cutting the loop&#8221; (skipping from act to orient or even from act to decide) to adapt faster than nature or your enemy.</p>
<p>Boyd is also famous for applying his and Wald&#8217;s ideas in the design of the A-10 Warthog.  The A-10 is a unique non-stealth, subsonic close air support plane.  It is considered one of the ugliest things ever to fly.  The A-10 was not state of the art when it was introduced, but it was scientifically designed for survival in the style of Wald: the engine intakes are partially protected by the wings, there is extra titanium armor around the pilot, and there is a primitive direct lever control system in addition to the traditional hydraulics.  The A-10 is known for its &#8220;lingering ability&#8221;: its ability to stay near troops under fire to deliver support.  It has also allowed pilots like <a href="http://en.wikipedia.org/wiki/Kim_Campbell_(pilot)"  target="wp">Major Kim Reed-Campbell</a> to fly for an hour and return to base after losing pieces of a wing and all hydraulics.  Here is a picture of Reed-Campbell inspecting her damaged A-10 in 2003 after safely landing:</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/04/Reed-Campbell.jpg" alt="Reed-Campbell.jpg" border="0" width="500" height="368" /><br />
</center></p>
<p>Deming, Wald and Boyd were able to move statistics and analytics beyond description and use mathematics for prescription.  The techniques they developed for planning, measurement and reasoning remain relevant to this day.</p>


<p>No related posts.</p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/04/deming-wald-and-boyd-cutting-through-the-fog-of-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R annoyances</title>
		<link>http://www.win-vector.com/blog/2010/03/r-annoyances/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=r-annoyances</link>
		<comments>http://www.win-vector.com/blog/2010/03/r-annoyances/#comments</comments>
		<pubDate>Sat, 20 Mar 2010 18:49:42 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Rants]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Principle of Least Astonishment]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[R is not your friend]]></category>
		<category><![CDATA[R programming annoyances]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1407</guid>
		<description><![CDATA[Readers returning to our blog will know that Win-Vector LLC is fairly &#8220;pro-R.&#8221; You can take that to mean &#8220;in favor of R&#8221; or &#8220;professionally using R&#8221; (both statements are true). Some days we really don&#8217;t feel that way. Consider the following snippet of R code where we create a list with a single element [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/11/r-examine-objects-tutorial/' rel='bookmark' title='Permanent Link: R examine objects tutorial'>R examine objects tutorial</a></li>
<li><a href='http://www.win-vector.com/blog/2010/01/relative-returns-a-banker-versus-trader-paradox/' rel='bookmark' title='Permanent Link: Relative returns: a banker versus trader paradox'>Relative returns: a banker versus trader paradox</a></li>
<li><a href='http://www.win-vector.com/blog/2008/04/sorting-in-anger/' rel='bookmark' title='Permanent Link: Sorting Used in Anger'>Sorting Used in Anger</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Readers returning to our blog will know that Win-Vector LLC is fairly &#8220;pro-<a href="http://www.win-vector.com/blog/tag/r/" target="x">R</a>.&#8221;  You can take that to mean &#8220;in favor of R&#8221; or &#8220;professionally using R&#8221; (both statements are true).  Some days we really don&#8217;t feel that way.  <span id="more-1407"></span><br />
Consider the following snippet of R code where we create a list with a single element named &#8220;x&#8221; that refers to a numeric vector.  We start with a demonstration of the hard-coded method of pulling the x-value back out using the &#8220;$&#8221; operator.</p>
<pre>
&gt; l &lt;- list(x=c(1,2,3))
&gt; l$x
[1] 1 2 3
</pre>
<p>But suppose we wanted to automate this; that is, pass in the name of the value we want in a variable.  We are, after all, using a computer, so automating a step seems like a reasonable desire.  R supplies a notation for this using the &#8220;[]&#8221; operator.  But something slightly different comes out under the &#8220;[]&#8221; operator than under the &#8220;$&#8221; operator:</p>
<pre>
&gt; varName &lt;- 'x'
&gt; l[varName]
$x
[1] 1 2 3
</pre>
<p>Notice that the printed outputs are slightly different (one echoes "$x" and one does not).  Let's use the "class()" method to see what is actually being returned in each case.</p>
<pre>
&gt; class(l$x)
[1] "numeric"
&gt; class(l['x'])
[1] "list"
</pre>
<p>The two expressions return completely different types: in one case a numeric vector, in the other a general list.  These types are not interchangeable.</p>
<p>At this point you may think it is time to turn in our "pro" label and call ourselves "newb" (Internet slang for "newbie" or "idiot").  But let's slow down for a bit.  When two views of the same situation disagree (such as the difference in opinion between the authors of R and myself on whether the "[]" and "$" operators should return the same type), you know at most that at least one of those views is wrong.  You don't know whether either view is right, or, if one is, which one it is.  I can, however, bring in an additional argument to try to show that the design of R is in fact wrong.  The additional argument is <a href="http://en.wikipedia.org/wiki/Principle_of_least_astonishment" target="o">"The Principle of Least Astonishment."</a>  This principle roughly says that it is a mistake to introduce unnecessary differences in outcomes (which, to the unprepared user, are unpleasant surprises).  There may be some deep (yet obscure) reasons the two operators return different results.  But the very fact that you would have to document and explain these differences should make one suspect that this situation is a mis-design and the "explanation" is an attempt at a work-around.  Or to put it more rudely: there may be an explanation, but there is no excuse.</p>
<p>For another example consider creating a 3 by 3 matrix:</p>
<pre>
&gt; m &lt;- matrix(c(1,2,3,1,1,1,0,0,1),nrow=3,ncol=3)
&gt; m
     [,1] [,2] [,3]
[1,]    1    1    0
[2,]    2    1    0
[3,]    3    1    1
</pre>
<p>Now select the last two rows of the matrix.</p>
<pre>
&gt; m[c(FALSE,TRUE,TRUE),]
     [,1] [,2] [,3]
[1,]    2    1    0
[2,]    3    1    1
&gt;
</pre>
<p>Now (for the punchline) try to select just the middle row of the matrix.</p>
<pre>
&gt; m[c(FALSE,TRUE,FALSE),]
[1] 2 1 0
</pre>
<p>Notice that once again (and without warning) the result is subtly different.  I admit that it seems paranoid to worry about such small differences, but when you are debugging a system that should work, these are exactly the killing mistakes you are looking for.  In this case the problem is pretty bad.  See what happens if you ask for the dimensions of each of these differing returns:</p>
<pre>
&gt; dim(m[c(FALSE,TRUE,TRUE),])
[1] 2 3
&gt; dim(m[c(FALSE,TRUE,FALSE),])
NULL
</pre>
<p>The first case works fine (it reports 2 rows and 3 columns).  The second case returns "NULL" (instead of 1 row and 3 columns).  In R, NULL is sometimes used as an error value (instead of throwing an exception), and this value will poison any further conditions or calculations it is involved in.  The main way to deal with the arbitrary introduction of such NULLs is the incredibly tedious and uncertain defensive coding that we argue against in <a href="http://www.win-vector.com/blog/2010/02/postels-law-not-sure-who-to-be-angry-with/">Postel’s Law: Not Sure Who To Be Angry With</a>.  Such code weakens both programs and programmers.</p>
<p>But what is going on in this example?  Once again we use the "class()" method to inspect the subtly different results.</p>
<pre>
&gt; class(m[c(FALSE,TRUE,TRUE),])
[1] "matrix"
&gt; class(m[c(FALSE,TRUE,FALSE),])
[1] "numeric"
</pre>
<p>The result is disappointing.  For a two-row select R returns a matrix (what we would expect).  For a single-row select R does us the "favor" of converting the result into a vector.  This is a disaster.  A single-row matrix is similar to a vector, but even R itself does not support the same set of operations and outcomes on vectors as it does on matrices (for example, the failure of the "dim()" function above).  It is not safe to calculate further with these results without converting them back by hand into a single-row matrix (which R can in fact represent).  In my case this created crashing bugs deep in a long-running analysis, and they were hard to diagnose because the bug was in an "innocent operation," not in a "risky calculation."</p>
<p>All of this surely violates John Chambers' "Prime Directive" for data: "an obligation on all creators of software to program in such a way that the computations can be understood and trusted."  Chambers' opinion is relevant because he is the author of the S language (of which R is an open source re-implementation).  We continue to recommend R, but we also recommend being exceptionally careful when using it (which unfortunately adds time to projects).</p>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/11/r-examine-objects-tutorial/' rel='bookmark' title='Permanent Link: R examine objects tutorial'>R examine objects tutorial</a></li>
<li><a href='http://www.win-vector.com/blog/2010/01/relative-returns-a-banker-versus-trader-paradox/' rel='bookmark' title='Permanent Link: Relative returns: a banker versus trader paradox'>Relative returns: a banker versus trader paradox</a></li>
<li><a href='http://www.win-vector.com/blog/2008/04/sorting-in-anger/' rel='bookmark' title='Permanent Link: Sorting Used in Anger'>Sorting Used in Anger</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/03/r-annoyances/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Postel&#8217;s Law: Not Sure Who To Be Angry With</title>
		<link>http://www.win-vector.com/blog/2010/02/postels-law-not-sure-who-to-be-angry-with/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=postels-law-not-sure-who-to-be-angry-with</link>
		<comments>http://www.win-vector.com/blog/2010/02/postels-law-not-sure-who-to-be-angry-with/#comments</comments>
		<pubDate>Fri, 26 Feb 2010 00:38:55 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Rants]]></category>
		<category><![CDATA[Postel's Law]]></category>
		<category><![CDATA[Unit Testing]]></category>
		<category><![CDATA[Worse is Better]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1394</guid>
		<description><![CDATA[One of my research interests is finding the principles that underlie the management of information, complexity and uncertainty. When something as simple as a web-form is called &#8220;technology&#8221; it is time to step back and examine your principles. One principle I am not sure about is Postel&#8217;s law. It doesn&#8217;t hold often enough to be relied [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/01/map-reduce-a-good-idea/' rel='bookmark' title='Permanent Link: Map Reduce: A Good Idea'>Map Reduce: A Good Idea</a></li>
<li><a href='http://www.win-vector.com/blog/2010/03/r-annoyances/' rel='bookmark' title='Permanent Link: R annoyances'>R annoyances</a></li>
<li><a href='http://www.win-vector.com/blog/2008/10/something-i-dont-get-about-business-and-bailouts/' rel='bookmark' title='Permanent Link: Something I don&#8217;t get about business and bailouts'>Something I don&#8217;t get about business and bailouts</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>One of my research interests is finding the principles that underlie the management of information, complexity and uncertainty.  When something as simple as a web-form is called &#8220;technology,&#8221; it is time to step back and examine your principles.  One principle I am not sure about is Postel&#8217;s law.  It doesn&#8217;t hold often enough to be relied on, and when it fails I am not sure who to be angry with.<span id="more-1394"></span></p>
<p>Postel&#8217;s Law (also called The Robustness Principle) comes from RFC 761, &#8220;Transmission Control Protocol&#8221; (1980), and reads: &#8220;Be conservative in what you do; be liberal in what you accept from others.&#8221; (Side note: RFC is the now-ironic acronym used to name Internet standards; the letters stand for &#8220;request for comments.&#8221;)</p>
<p>This idea probably worked best where it started: in the TCP/IP world, which is where many of the hairy details of computer networking are handled.  When your goal is the basic establishment of transient communications, success is measured by getting information through and not unnecessarily triggering a failure.  It may be okay to tolerate mistakes here, because they don&#8217;t live long anyway.</p>
<p>Unfortunately, the law works less well in other places where it is applied.  The law is a downright hazard when archiving meaningful data (as opposed to managing transient signaling protocols).  Sometimes the cost of obeying the law far outweighs the potential benefit.</p>
<p>A common arena for Postel&#8217;s law is now HTML, the markup language used to represent the content we view in web-browsers.  In this arena Postel&#8217;s law has had two consequences: one good, one bad.</p>
<p>The good: almost anyone can create a working web-page or even a web-site because modern browsers have been designed to paper around almost every common HTML mistake.  It has been pointed out that this ease of creation and &#8220;worse is better&#8221; (a deep principle due to Richard P. Gabriel; see <a target='ext' href="http://en.wikipedia.org/wiki/Worse_is_better">wikipedia: worse is better</a>) have been among the reasons that HTML out-competed and killed many other ideas.  Philip Greenspun&#8217;s famous <a target='ext' href="http://philip.greenspun.com/panda/html">story</a> of a 10-year-old building a web site to get his mother medical attention happened in the sloppy world of HTML and could not have happened in the straitjacket of RDF (Resource Description Framework: the darling of the semantic web).  I would not wish on an enemy the task of actually reading or adhering to the incredibly long and irrelevant standards from w3.org (where the ratio of value to pedantry goes to zero).  The web is only interesting due to its content, and much of its content was only possible due to the low barrier to entry.</p>
<p>The bad: to read HTML you almost have to re-create the entire history of web-browsers.  This is a history of many hostile competitors (Microsoft, Netscape, Opera, WebKit, Mozilla, Google) and billions of dollars.  Reproducing a large fraction of this history is a significant (and useless) expense.  For the most part I use a permissive parsing library like TagSoup or HTML Tidy, but even these miss some things that browsers accept and are far more complicated than the task truly justifies.</p>
<p>Even worse are the cases of XML and RDF.  These are often used for archival storage of semantic data; that is, you may need to read and understand (not just display) data in XML for a long time.  To be liberal in what you accept you again have to master a long set of useless complications (DTDs, namespaces, incredibly inept character encodings and escapes) and still get burned by improperly encoded XML (that &#8220;used to work&#8221; because the bugs in the emitted XML matched the bugs in a library that is now out of date).</p>
<p>It is clear in the case of HTML and XML that Postel&#8217;s law&#8217;s cost is too high for what it delivers.  Or at least half of the law is too expensive: no amount of being generous in what we accept makes up for the original data not having been written correctly (its producers not being &#8220;conservative in what they do&#8221; and not checking the data at the time it was created).  Part of the problem is that the producers of the data have no way of telling they are not being &#8220;conservative in what they do,&#8221; because the &#8220;generous in what they accept&#8221; libraries they use to debug don&#8217;t tell them they are emitting bad data.  And let&#8217;s be honest: most systems are not designed for correctness; they are instead debugged until they seem to work.  I would say that HTML is in fact not an example of the power of Postel&#8217;s law but of the pernicious influence of &#8220;worse is better.&#8221;  Computer science has not risen to the level of &#8220;software engineering&#8221;; we are still a horrible &#8220;fit to finish&#8221; industry.</p>
<p>Frankly, for many things we need a simpler &#8220;fail early&#8221; discipline.  Tools need to be better and standards need to be simpler, so that if you write something that is wrong it is easy to see why it is wrong and easy to fix it.  Postel&#8217;s law has helped hide the negative impacts of complicated standards; we need to push the cost of complications back on to standards committees.  The need to be &#8220;generous in what you accept&#8221; overly favors large, rich, entrenched players who have had the time and resources to incrementally invest in papering around every common mistake.</p>
<p>However, I am not sure if we can throw out half of Postel&#8217;s law or even if we want to.  When Postel&#8217;s law fails it is not clear who to be mad at.</p>
<p>Sun, to kick somebody who is already down, was famous for making elaborate frameworks that correctly and brutally implement many details of RFCs.  Sun&#8217;s Java includes huge frameworks for XML, UTF8 and email that scrupulously implement page after page of useless standard documentation but fail in the wild due to not being &#8220;generous in what they accept.&#8221;  For example, Sun&#8217;s GlassFish (which was named as one of four or five important assets during Sun&#8217;s various acquisition talks, much as the fact that a car has cup-holders somehow always gets mentioned in its spec sheet) is an &#8220;open source production-quality enterprise software application server.&#8221;  A supposedly major component of GlassFish is its email component, a huge, unwieldy framework that implements many of the email-related RFCs and protocols, including IMAP.  Unfortunately, for all its hugeness it cannot reliably read email folder names from one of the biggest IMAP servers: Google Mail.  Google Mail includes &#8220;against standard&#8221; characters in the protocol, and these crash the GlassFish software.</p>
<p>And here is where Postel&#8217;s law fails us: under Postel&#8217;s law both sides are at fault (Sun for failing to be generous in what they accepted, Google for failing to be conservative in what they did).  We can&#8217;t assign only one villain.  We have no prescription for whom to ask for a fix.  Postel&#8217;s law seems useful in that if either Google or Sun had followed it the two systems would work.  But the law doesn&#8217;t pick one side to assign blame, and so it doesn&#8217;t help us efficiently diagnose and fix the problem.  It becomes difficult to find the critical bugs when they are masked by a sea of &#8220;acceptable&#8221; bugs.  Take a contrary example: the simpler law &#8220;implement the standard or fix the standard&#8221; would clearly assign blame to GMail.</p>
<p>Similar pain is encountered in Java&#8217;s handling of character encodings like UTF8.  It is hard to move up the stack of artificial intelligence (from words, to concepts, to ideas, to reasoning, to consciousness) when you can&#8217;t even reliably transcribe characters.  When faced with bad character sequences (a common occurrence on the web) there is no practical way to get Java to &#8220;mostly parse it.&#8221;  Java library and framework authors seem to take a perverse joy in throwing a program-killing exception (it does not matter if you catch it; the library has already stopped doing what you wanted) because they are concerned that a diacritical mark was not properly encoded.  Web browsers, on the other hand, lose the mark or show some sort of damage near the mistake and blunder on.  And here is where the frustration sets in: how can you make applications that are generous in what they accept when the libraries and frameworks are overly proud and picky?  This, at first, seems like an argument for Postel&#8217;s law: if everybody else (especially the library authors) were generous in what they accepted, your life could be easy.  That is certainly one possibility, but I argue it often becomes a matter of semantics to assign blame where there is no pre-existing specification or performance agreement.  In the end you will waste more time dealing with errors that should never have made it to you than the time you save emitting the odd error of your own.</p>
<p>The unit testing people have a somewhat better idea: fail early, fail at the factory where it is cheap to fix.  Don&#8217;t litter all of your code with indecisive statements like:</p>
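<p>At the level of the standard <code>java.nio.charset</code> API, strict versus lenient decoding is a policy the caller can choose through <code>CodingErrorAction</code>. The trouble described above arises when a library or framework fixes the strict policy internally. Below is a minimal sketch, assuming Java 7 or later (for <code>StandardCharsets</code>), that decodes the same malformed UTF-8 bytes both ways. The class and method names are illustrative only, not an existing API.</p>

```java
// Minimal sketch (assumes Java 7+): the same malformed UTF-8 bytes decoded
// under a strict ("conservative") and a lenient ("liberal") policy.
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecodeSketch {

    // Strict: any malformed or unmappable input aborts decoding with an exception.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Lenient: bad input becomes the replacement character U+FFFD and decoding goes on.
    static String decodeLenient(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // decode() declares this checked exception; REPLACE should prevent it.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "caf", then a lone Latin-1 e-acute byte (0xE9, invalid here as UTF-8), then " ok".
        byte[] bytes = {0x63, 0x61, 0x66, (byte) 0xE9, 0x20, 0x6F, 0x6B};
        System.out.println("lenient: " + decodeLenient(bytes));
        try {
            System.out.println("strict: " + decodeStrict(bytes));
        } catch (CharacterCodingException e) {
            System.out.println("strict: rejected with " + e);
        }
    }
}
```

<p>The lenient decoder behaves like a browser: it marks the damage and continues. The strict decoder refuses the whole input. Neither choice helps if a framework between you and the bytes has already made the decision for you.</p>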
<pre>
  Set&lt;String&gt; matches = computeMatches();
  if( matches!=null ) {
     for(String match: matches) {
         ...
     }
  }
</pre>
<p>Instead, write a unit test to document your expectation that the empty set is expressed in a single consistent way:</p>
<pre>
   Set&lt;String&gt; matches = computeMatches();
   assertNotNull(matches);
</pre>
<p>And from then on write more confident code:</p>
<pre>
  for(String match: computeMatches()) {
         ...
  }
</pre>
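<p>Here is a minimal, self-contained sketch of the same idea, assuming Java 7 or later. The hypothetical <code>computeMatches</code> below uses prefix matching invented for illustration, and it promises never to return null. <code>Objects.requireNonNull</code> stands in for JUnit&#8217;s <code>assertNotNull</code> so the example runs without a test framework.</p>

```java
// Minimal sketch of the "fail early" discipline; computeMatches is a
// hypothetical stand-in (prefix matching) for the method in the text.
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class MatchesSketch {

    // Contract: never returns null; "no matches" is always an empty set.
    static Set<String> computeMatches(List<String> candidates, String prefix) {
        Set<String> matches = new LinkedHashSet<>();
        for (String candidate : candidates) {
            if (candidate.startsWith(prefix)) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "apricot", "banana");
        // Check the contract once, at the boundary (the role assertNotNull plays in a unit test).
        Objects.requireNonNull(computeMatches(words, "zz"), "computeMatches returned null");
        // Confident code: no null guard; an empty set just means zero iterations.
        for (String match : computeMatches(words, "ap")) {
            System.out.println(match);
        }
    }
}
```

<p>The design choice is to keep one representation of &#8220;nothing found&#8221; and check it in one place, rather than re-checking for null at every call site.</p>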
<p>This may seem overly optimistic and overly strict, but I have a point.  One of the few good principles in computer science (and perhaps one of computer science&#8217;s contributions to knowledge; computers are a huge contribution to society, but they were made by engineers) is composition.  A plan for getting from A to B followed by (or composed with) a plan for getting from B to C is a plan for getting from A to C.  More precisely, a correct plan for getting from A to B composed with a correct plan for getting from B to C is a correct plan for getting from A to C.  But if each of the plans only &#8220;is mostly right, if the piece after it is so nice as to fix up a few mistakes,&#8221; you really don&#8217;t know what you have.  You may have nothing.</p>
<p>That is my complaint: you can&#8217;t put an a priori bound on how expensive it will be to honor both sides of Postel&#8217;s law.  You would like others to paper over your mistakes, but it is becoming too expensive to paper over the mistakes of others.  In the end Postel&#8217;s law is of little help when cleaning up the inevitable mess.</p>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/01/map-reduce-a-good-idea/' rel='bookmark' title='Permanent Link: Map Reduce: A Good Idea'>Map Reduce: A Good Idea</a></li>
<li><a href='http://www.win-vector.com/blog/2010/03/r-annoyances/' rel='bookmark' title='Permanent Link: R annoyances'>R annoyances</a></li>
<li><a href='http://www.win-vector.com/blog/2008/10/something-i-dont-get-about-business-and-bailouts/' rel='bookmark' title='Permanent Link: Something I don&#8217;t get about business and bailouts'>Something I don&#8217;t get about business and bailouts</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/02/postels-law-not-sure-who-to-be-angry-with/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Winter 2010 Subscription Campaign</title>
		<link>http://www.win-vector.com/blog/2010/01/winter-2010-subscription-campaign/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=winter-2010-subscription-campaign</link>
		<comments>http://www.win-vector.com/blog/2010/01/winter-2010-subscription-campaign/#comments</comments>
		<pubDate>Mon, 18 Jan 2010 21:57:42 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Administrativia]]></category>
		<category><![CDATA[Subscription Campaign]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1356</guid>
		<description><![CDATA[We at Win-Vector LLC would like to invite our loyal readers to help with our Winter 2010 Subscription Campaign. Please encourage your erudite friends and colleagues to read and subscribe to http://www.win-vector.com/blog/. Here are some of our most popular articles broken down by area of interest: Statistics: “I don’t think that means what you think [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/12/statistics-to-english-translation-part-2a-%e2%80%99significant%e2%80%99-doesn%e2%80%99t-always-mean-%e2%80%99important%e2%80%99/' rel='bookmark' title='Permanent Link: Statistics to English Translation, Part 2a: ’Significant’ Doesn’t Always Mean ’Important’'>Statistics to English Translation, Part 2a: ’Significant’ Doesn’t Always Mean ’Important’</a></li>
<li><a href='http://www.win-vector.com/blog/2009/12/statistics-to-english-translation-part-2b-calculating-significance/' rel='bookmark' title='Permanent Link: Statistics to English Translation, Part 2b: Calculating Significance'>Statistics to English Translation, Part 2b: Calculating Significance</a></li>
<li><a href='http://www.win-vector.com/blog/2008/02/hello-world-an-instance-rhetoric-in-computer-science/' rel='bookmark' title='Permanent Link: Hello World: An Instance Of Rhetoric in Computer Science'>Hello World: An Instance Of Rhetoric in Computer Science</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>We at Win-Vector LLC would like to invite our loyal readers to help with our Winter 2010 Subscription Campaign.  Please encourage your erudite friends and colleagues to read and subscribe to <a href="http://www.win-vector.com/blog/">http://www.win-vector.com/blog/</a>.<span id="more-1356"></span><br />
Here are some of our most popular articles broken down by area of interest:</p>
<ul>
<li><strong>Statistics:</strong>
<p><a href="http://www.win-vector.com/blog/2009/11/i-dont-think-that-means-what-you-think-it-means-statistics-to-english-translation-part-1-accuracy-measures/">“I don’t think that means what you think it means;” Statistics to English Translation, Part 1: Accuracy Measures</a></p>
<p><a href="http://www.win-vector.com/blog/2009/12/statistics-to-english-translation-part-2a-’significant’-doesn’t-always-mean-’important’/">Statistics to English Translation, Part 2a: ’Significant’ Doesn’t Always Mean ’Important’</a></p>
<p><a href="http://www.win-vector.com/blog/2009/12/statistics-to-english-translation-part-2b-calculating-significance/">Statistics to English Translation, Part 2b: Calculating Significance</a></p>
<p><a href="http://www.win-vector.com/blog/2009/08/good-graphs-graphical-perception-and-data-visualization/">Good Graphs: Graphical Perception and Data Visualization</a></p>
<p><a href="http://www.win-vector.com/blog/2009/10/google-adsense-channels-ids-and-the-cramer-rao-inequality/">Google AdSense Channels IDs and the Cramer Rao Inequality</a></p>
<p><a href="http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/"> A Demonstration of Data Mining</a></p>
</li>
<li><strong>Mathematical Finance</strong>:
<p><a href="http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/">What is the gambler’s equivalent of Amdahl’s Law?</a></p>
<p><a href="http://www.win-vector.com/blog/2008/05/betting-best-of-series/">Betting Best-Of Series</a></p>
<p><a href="http://www.win-vector.com/blog/2007/10/paper-on-stock-trading/">Automatic Generation and Testing of Un-Rolls for Profitable Technical Trades</a></p>
</li>
<li><strong>The R statistical package</strong>:
<p><a href="http://www.win-vector.com/blog/2009/09/survive-r/">Survive R</a></p>
<p><a href="http://www.win-vector.com/blog/2009/11/r-examine-objects-tutorial/">R examine objects tutorial</a></p>
</li>
<li><strong>Computer Science</strong>:
<p><a href="http://www.win-vector.com/blog/2009/11/the-local-to-global-principle/">The Local to Global Principle</a></p>
<p><a href="http://www.win-vector.com/blog/2008/04/sorting-in-anger/">Sorting Used in Anger</a></p>
<p><a href="http://www.win-vector.com/blog/2009/08/on-the-hysteria-over-the-cloud/">On The Hysteria Over “The Cloud”</a></p>
</li>
<li><strong>Philosophy</strong>:
<p><a href="http://www.win-vector.com/blog/2008/08/what-is-mathematics-really/">What is Mathematics, Really?</a></p>
<p><a href="http://www.win-vector.com/blog/2008/02/hello-world-an-instance-rhetoric-in-computer-science/">Hello World: An Instance Of Rhetoric in Computer Science</a></p>
</li>
</ul>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/12/statistics-to-english-translation-part-2a-%e2%80%99significant%e2%80%99-doesn%e2%80%99t-always-mean-%e2%80%99important%e2%80%99/' rel='bookmark' title='Permanent Link: Statistics to English Translation, Part 2a: ’Significant’ Doesn’t Always Mean ’Important’'>Statistics to English Translation, Part 2a: ’Significant’ Doesn’t Always Mean ’Important’</a></li>
<li><a href='http://www.win-vector.com/blog/2009/12/statistics-to-english-translation-part-2b-calculating-significance/' rel='bookmark' title='Permanent Link: Statistics to English Translation, Part 2b: Calculating Significance'>Statistics to English Translation, Part 2b: Calculating Significance</a></li>
<li><a href='http://www.win-vector.com/blog/2008/02/hello-world-an-instance-rhetoric-in-computer-science/' rel='bookmark' title='Permanent Link: Hello World: An Instance Of Rhetoric in Computer Science'>Hello World: An Instance Of Rhetoric in Computer Science</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/01/winter-2010-subscription-campaign/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;Easy&#8221; Portfolio Allocation</title>
		<link>http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=easy-portfolio-allocation</link>
		<comments>http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/#comments</comments>
		<pubDate>Thu, 14 Jan 2010 20:09:13 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Finance]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[Lagrange Multipliers]]></category>
		<category><![CDATA[Mathematical Bedside Reading]]></category>
		<category><![CDATA[Portfolio Theory]]></category>
		<category><![CDATA[Sharpe Ratio]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1342</guid>
		<description><![CDATA[This is an elementary mathematical finance article. This means if you know some math (linear algebra, differential calculus) you can find a quick solution to a simple finance question. The topic was inspired by a recent article in The American Mathematical Monthly (Volume 117, Number 1 January 2010, pp. 3-26): &#8220;Find Good Bets in the [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/' rel='bookmark' title='Permanent Link: A Quick Appreciation of the Sharpe Ratio'>A Quick Appreciation of the Sharpe Ratio</a></li>
<li><a href='http://www.win-vector.com/blog/2009/09/a-discrete-model-gauging-market-efficiency/' rel='bookmark' title='Permanent Link: A Discrete Model Gauging Market Efficiency'>A Discrete Model Gauging Market Efficiency</a></li>
<li><a href='http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/' rel='bookmark' title='Permanent Link: What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?'>What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>This is an elementary mathematical finance article. This means that if you know some math (linear algebra, differential calculus) you can find a quick solution to a simple finance question. The topic was inspired by a recent article in The American Mathematical Monthly (Volume 117, Number 1, January 2010, pp. 3-26): &#8220;Find Good Bets in the Lottery, and Why You Shouldn&#8217;t Take Them&#8221; by Aaron Abrams and Skip Garibaldi, which says that optimal asset allocation is now an undergraduate exercise. That may well be, but there are a lot of people with very deep mathematical backgrounds who have yet to see this. We will fill in the details here. The style is terse, but the content should be about what you would expect from one day of lecture in a mathematical finance course.</p>
<p><span id="more-1342"></span></p>
<p>Portfolio allocation is not the &#8220;magic predict the future&#8221; part of finance; it is the scheme for correctly applying magic predictions of the future. The idea is that if you had a prediction of future returns of a number of assets, the naive thing to do would be to invest everything into the asset with the highest predicted return. Portfolio theory, while still taking the predictions at face value, picks an investment pattern that will (in risk-adjusted dollars) outperform the naive strategy even if the predictions are correct, and is a bit safer when the predictions are wrong.</p>
<p>Suppose you had <img width="14" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg1.png" alt="$ n$"> different assets you could invest in. For the <img width="10" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg2.png" alt="$ i$">-th asset there is an expected excess relative return of <img width="19" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg3.png" alt="$ \mu_i$"> and an estimated standard deviation (the square root of the variance) of <img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg4.png" alt="$ s_i$"> (for a definition of relative return see <a href="http://www.win-vector.com/blog/2010/01/relative-returns-a-banker-versus-trader-paradox/">Relative returns: a banker versus trader paradox</a> and for a definition of variance see <a href="http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/">A Quick Appreciation of the Sharpe Ratio</a>). Let the vector <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> be such that <img width="23" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg6.png" alt="$ X_i$"> represents the number of dollars we invest in the <img width="10" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg2.png" alt="$ i$">-th asset. If <img width="23" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg6.png" alt="$ X_i$"> is positive then our plan is &#8220;to go long,&#8221; or buy some of the <img width="10" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg2.png" alt="$ i$">-th asset. 
If <img width="23" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg6.png" alt="$ X_i$"> is negative our plan is &#8220;to short,&#8221; or sell some of the <img width="10" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg2.png" alt="$ i$">-th asset to somebody else (it is called going short because we actually sell something we do not have; this is often allowed in finance, as long as we make the same pay-outs to the buyer that the buyer would receive if we really had the item to sell).</p>
<p>When we appeal to the idea of optimizing the portfolio Sharpe Ratio (again, see <a href="http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/">A Quick Appreciation of the Sharpe Ratio</a>), we say a good portfolio is one that doesn&#8217;t just maximize expected relative return (which is <img width="39" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg11.png" alt="$ X^{\top} \mu$"> ) but maximizes the ratio of expected relative return to standard deviation:</p>
<div align="center"><img width="73" height="56" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg8.png" alt="$\displaystyle \frac{X^{\top} \mu}{\sqrt{X^{\top} C X}} $"></div>
<p>where (for now) <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> is the diagonal matrix with the variances <img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg4.png" alt="$ s_i$"> on its diagonal and zeros elsewhere, so that <img width="56" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg19.png" alt="$ X^{\top} C X$"> is the variance of the portfolio&#8217;s return. This ratio is called a &#8220;risk adjusted return&#8221; (versus the un-adjusted form <img width="39" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg11.png" alt="$ X^{\top} \mu$"> ). Also notice that the ratio is homogeneous in <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> (doubling <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> does not change the ratio, as it simultaneously doubles the numerator and the denominator), so an optimal solution <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> describes not how much to invest, but what pattern to invest in. This allows us to introduce an important practical constraint: we are only going to allow ourselves to risk a total of <img width="16" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg13.png" alt="$ T$"> dollars (counting both long and short positions). That is, we insist <img width="105" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg14.png" alt="$ \sum_{i=1}^{n} \vert X_i\vert = T$"> . We will ignore this total investment constraint until the end, when we can satisfy it by simply re-scaling a partial solution.</p>
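<p>As a small numeric illustration (a Python sketch; the three assets, their expected excess returns and their variances are made-up numbers, and the function name <code>risk_adjusted_return</code> is our own), the ratio can be computed directly, and doubling the allocation leaves it unchanged:</p>

```python
import math

def risk_adjusted_return(X, mu, C):
    """Return X^T mu / sqrt(X^T C X) for allocation X, returns mu, covariance C."""
    n = len(X)
    expected = sum(X[i] * mu[i] for i in range(n))
    variance = sum(X[i] * C[i][j] * X[j] for i in range(n) for j in range(n))
    return expected / math.sqrt(variance)

# Hypothetical inputs: expected excess returns and a diagonal C
# (variances on the diagonal, zeros elsewhere).
mu = [0.08, 0.05, -0.02]
C = [[0.04, 0.00, 0.00],
     [0.00, 0.01, 0.00],
     [0.00, 0.00, 0.09]]

X = [100.0, 50.0, -20.0]        # dollars: long, long, short
doubled = [2.0 * x for x in X]  # same pattern, twice the money

print(risk_adjusted_return(X, mu, C))        # about 0.5077
print(risk_adjusted_return(doubled, mu, C))  # same value: the ratio ignores scale
```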
<p>To solve for <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> we introduce an old friend: <a href="http://en.wikipedia.org/wiki/Lagrange_multipliers">Lagrange Multipliers</a> (or equivalently the Karush-Kuhn-Tucker conditions of optimality). Since the fraction we are trying to optimize is homogeneous in <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> , we can convert the denominator into a constraint and arbitrarily insist that <img width="99" height="38" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg15.png" alt="$ \sqrt{X^{\top} C X} = 1$"> without changing the nature of the problem. We are now trying to maximize <img width="39" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg11.png" alt="$ X^{\top} \mu$"> subject to <img width="99" height="38" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg15.png" alt="$ \sqrt{X^{\top} C X} = 1$"> . The Lagrangian conditions of optimality state that at the optimum the gradient of the objective must be proportional to the gradient of the constraint:</p>
<div align="center"><img width="225" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg16.png" alt="$\displaystyle \nabla_X X^{\top} \mu = \lambda \nabla_X ( \sqrt{X^{\top} C X} - 1 ) $"></div>
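<p>for some (to be determined) constant <img width="13" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg17.png" alt="$ \lambda$"> . Pushing the gradient operator through (and using the fact that <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> is symmetric) we get:</p>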
<div align="center"><img width="213" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg18.png" alt="$\displaystyle \mu = \lambda (1/2) ( X^{\top} C X )^{-1/2} 2 C X . $"></div>
<p>A similar equation can also be obtained from a Rayleigh quotient argument.</p>
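<p>The gradient step above relies on the identity that the gradient of the square root of X<sup>T</sup>CX is CX divided by that same square root, which holds when C is symmetric (as covariance matrices are). A quick finite-difference check, sketched here in Python with a made-up symmetric positive definite C and hypothetical helper names, can confirm it numerically:</p>

```python
import math

def portfolio_sd(X, C):
    """Standard deviation sqrt(X^T C X) of the portfolio return."""
    n = len(X)
    return math.sqrt(sum(X[i] * C[i][j] * X[j] for i in range(n) for j in range(n)))

def analytic_gradient(X, C):
    """C X / sqrt(X^T C X): the gradient of portfolio_sd when C is symmetric."""
    n = len(X)
    sd = portfolio_sd(X, C)
    return [sum(C[i][j] * X[j] for j in range(n)) / sd for i in range(n)]

def numeric_gradient(X, C, h=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grad = []
    for i in range(len(X)):
        up = list(X)
        down = list(X)
        up[i] += h
        down[i] -= h
        grad.append((portfolio_sd(up, C) - portfolio_sd(down, C)) / (2.0 * h))
    return grad

# Hypothetical symmetric, diagonally dominant (hence positive definite) C
C = [[0.040, 0.006, 0.000],
     [0.006, 0.010, 0.002],
     [0.000, 0.002, 0.090]]
X = [1.0, 2.0, -0.5]

for a, b in zip(analytic_gradient(X, C), numeric_gradient(X, C)):
    print(a, b)  # each pair should agree to many decimal places
```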
<p>We do not yet know <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> (that is what we are trying to solve for), so we do not know what <img width="56" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg19.png" alt="$ X^{\top} C X$"> is. However, this is just a scalar, and since we are only solving up to a multiple we can absorb it into a new multiple and see that it is enough to solve:</p>
<div align="center"><img width="76" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg20.png" alt="$\displaystyle \mu = \lambda' C X $"></div>
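<p>where <img width="18" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg21.png" alt="$ \lambda'$"> is a new (still unknown) scalar. Assuming <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> is invertible, this means we have:</p>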
<div align="center"><img width="121" height="35" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg22.png" alt="$\displaystyle X = (1/\lambda') C^{-1} \mu $"></div>
<p>so our desired solution is some re-scaling of <img width="43" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg23.png" alt="$ C^{-1} \mu$"> .</p>
<p>As we stated earlier we have a total investment constraint of <img width="105" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg14.png" alt="$ \sum_{i=1}^{n} \vert X_i\vert = T$"> . We can achieve this with the following adjusted solution:</p>
<div align="center"><img width="189" height="51" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg24.png" alt="$\displaystyle X = \frac{T}{\sum_{i=1}^{n} \vert(C^{-1} \mu)_i\vert} C^{-1} \mu $"></div>
<p>as our desired optimal portfolio allocation. In the end we can solve for the optimal portfolio by merely solving a linear system (we don&#8217;t need anything as expensive as a general purpose optimizer in this case).</p>
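<p>Since the whole computation is one linear solve plus a re-scaling, it fits in a few lines. The following Python sketch (hypothetical function names and made-up inputs, with a diagonal C as in the text so far) solves C x = &mu; by Gaussian elimination rather than forming C<sup>-1</sup> explicitly, then scales the result so that the absolute dollar amounts sum to T:</p>

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrix is not modified
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def optimal_allocation(mu, C, T):
    """Re-scale C^{-1} mu so that the absolute dollar amounts sum to T."""
    direction = solve_linear_system(C, mu)  # C^{-1} mu, without forming C^{-1}
    scale = T / sum(abs(d) for d in direction)
    return [scale * d for d in direction]

# Hypothetical inputs with a diagonal C (variances on the diagonal)
mu = [0.08, 0.05, -0.02]
C = [[0.04, 0.00, 0.00],
     [0.00, 0.01, 0.00],
     [0.00, 0.00, 0.09]]
print(optimal_allocation(mu, C, 1000.0))  # roughly [276.9, 692.3, -30.8]
```

<p>In practice a library routine such as NumPy&#8217;s <code>numpy.linalg.solve</code> would replace the hand-written elimination; it is spelled out here only to keep the sketch dependency-free. The same function works unchanged for any invertible C, including a full covariance matrix.</p>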
<p>These are very old results (going back as far as Sharpe Ratios and portfolio theory themselves). A good example reference is: &#8220;The Valuation of Risk Assets and the Selection of Risky Investments in Stock Portfolios and Capital Budgets,&#8221; John Lintner, The Review of Economics and Statistics (1965) vol. 47 (1) pp. 13-37. These results are the basis for advice like &#8220;diversify.&#8221; Without modeling risk you would tend to put all of your money in the asset with the highest predicted payoff. When modeling risk you tend to put some of your money in each high paying asset, and as long as they do not all fail at the same time you have some safety. Another (very different) route to diversification is the Kelly Criterion (discussed in <a href="http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/">What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?</a>).</p>
<p>A very important risk we have not yet modeled is that our assets may have a tendency to fail at the same time (meaning we may not have really diversified usefully). The possibility that assets fail together brings us to the ideas of correlation and covariance. When we took <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> to be a diagonal matrix we were implicitly assuming (or modeling), without justification, that each asset was independent of all the others (that there was no correlation between asset returns). This is, of course, not going to be anywhere near true in practice. Instead we should take <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> to be the <a href="http://en.wikipedia.org/wiki/Covariance_matrix">Covariance Matrix</a> that represents our estimate of the asset-to-asset covariances. In this case the solution methods above work exactly as before (as long as <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> is invertible). Companies such as MSCI Barra have made complete businesses out of producing and selling estimates of <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> .</p>
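<p>To see why correlation matters, here is a two-asset Python sketch with made-up numbers: both assets have variance 0.04, the expected excess returns are 0.10 and 0.05, and we compare a correlation of 0 with a correlation of 0.8 (the helper names are our own). The formula is the same re-scaled C<sup>-1</sup>&mu; as before, with the 2&times;2 inverse written out:</p>

```python
def allocation_2x2(mu, C, T):
    """T * C^{-1} mu / sum|C^{-1} mu| for two assets, via the explicit 2x2 inverse."""
    (a, b), (c, d) = C
    det = a * d - b * c
    direction = [(d * mu[0] - b * mu[1]) / det,
                 (a * mu[1] - c * mu[0]) / det]
    scale = T / (abs(direction[0]) + abs(direction[1]))
    return [scale * x for x in direction]

def covariance_2x2(variance, rho):
    """Covariance matrix for two assets with equal variance and correlation rho."""
    return [[variance, rho * variance],
            [rho * variance, variance]]

mu = [0.10, 0.05]  # hypothetical expected excess returns
for rho in (0.0, 0.8):
    print(rho, allocation_2x2(mu, covariance_2x2(0.04, rho), 1.0))
# rho = 0.0: about [0.667, 0.333]   (long both, in proportion to mu_i / s_i)
# rho = 0.8: about [0.667, -0.333]  (short the weaker asset as a hedge)
```

<p>With uncorrelated assets the recipe buys both in proportion to their return-to-variance ratios; once the assets move together, the same formula goes long the better asset and shorts the weaker one as a hedge, a pattern the diagonal model cannot produce when both expected returns are positive.</p>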
<p>Another issue arises when we do not allow ourselves to &#8220;short&#8221; (take a negative allocation of) assets. In this case we have the additional constraints <img width="48" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg26.png" alt="$ X \ge 0$"> , which complicate our solution (and, since no entry of <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> is negative, the total investment constraint just says the entries of <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> sum to <img width="16" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg13.png" alt="$ T$"> ). For the special case where asset returns are assumed to be independent (i.e. <img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> is diagonal) it is enough to solve as above and merely replace any negative allocations with zero before the final re-scaling step. When the covariances are non-trivial (<img width="17" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg9.png" alt="$ C$"> has non-zero off-diagonal entries) this shortcut may not be optimal. In this case the Karush-Kuhn-Tucker conditions are more complicated, and at the optimal solution we have the following conditions:</p>
<div align="center">
<table cellpadding="0" align="center">
<tr valign="middle">
<td nowrap align="right"><i>&mu; + &lambda; C X + &sum;<sub>i=1</sub><sup>n</sup> &tau;<sub>i</sub> E<sup>i</sup></i></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg28.png" alt="$\displaystyle =$"></td>
<td align="left" nowrap>0</td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="19" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg29.png" alt="$\displaystyle X$"></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg30.png" alt="$\displaystyle \ge$"></td>
<td align="left" nowrap>0</td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="48" height="60" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg31.png" alt="$\displaystyle \sum_{i=1}^{n} X_i$"></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg28.png" alt="$\displaystyle =$"></td>
<td align="left" nowrap><img width="16" height="29" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg32.png" alt="$\displaystyle T$"></td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="13" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg33.png" alt="$\displaystyle \tau$"></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg30.png" alt="$\displaystyle \ge$"></td>
<td align="left" nowrap>0</td>
<td width="10" align="right">&nbsp;</td>
</tr>
<tr valign="middle">
<td nowrap align="right"><img width="38" height="36" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg34.png" alt="$\displaystyle \tau^{\top} X$"></td>
<td width="10" align="center" nowrap><img width="17" height="28" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg28.png" alt="$\displaystyle =$"></td>
<td align="left" nowrap>0</td>
<td width="10" align="right">&nbsp;</td>
</tr>
</table>
</div>
<p>where <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> is the allocation vector we wish to solve for, <img width="13" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg17.png" alt="$ \lambda$"> is an unknown scalar, <img width="13" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg35.png" alt="$ \tau$"> is a new unknown vector, and <img width="22" height="16" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg36.png" alt="$ E^i$"> is the vector with <img width="69" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg37.png" alt="$ (E^i)_i = 1$"> and zeroes elsewhere. Using the Karush-Kuhn-Tucker conditions has again allowed us to almost linearize the problem, but we now have sign constraints on <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> and <img width="13" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg35.png" alt="$ \tau$"> and what is called a complementarity constraint: <img width="67" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg38.png" alt="$ \tau^{\top} X = 0$"> . This sort of problem is essentially what is called a &#8220;Linear Complementarity Problem&#8221; and is about as hard as solving a linear program (the typical solution method is a variation of the simplex method called &#8220;Lemke&#8217;s algorithm&#8221;). 
(Technically the <img width="13" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg17.png" alt="$ \lambda$"> prevents the problem from being in the standard form, but because <img width="19" height="14" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg12.png" alt="$ X$"> is only determined up to scale, and assuming the optimal portfolio has a positive expected return, we can re-scale so that <img width="13" height="15" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2010/01/EPAimg17.png" alt="$ \lambda$"> takes a convenient value such as &minus;1, drop the total investment constraint, and re-scale again at the end.) The problem can still be solved; it just takes a bit more software. If we cannot short assets (or at least simulate shorting them), we not only eliminate many possible portfolios from consideration (so we likely end up with a less profitable portfolio than we would like), we also make the mathematics and computation a bit harder.</p>
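<p>As a sketch of the no-short case (not Lemke&#8217;s algorithm, and with hypothetical helper names and made-up numbers), one can lean on the homogeneity of the risk adjusted return: assuming some asset has a positive expected return, it is enough to solve the convex quadratic program minimize (1/2) X<sup>T</sup>CX &minus; &mu;<sup>T</sup>X subject to X &ge; 0, then re-scale the answer so its entries sum to T. Its optimality conditions form the linear complementarity problem X &ge; 0, w = CX &minus; &mu; &ge; 0, X<sup>T</sup>w = 0. The Python below solves it with projected Gauss-Seidel, a simple iterative method that converges when C is symmetric positive definite, and compares the result against naively clipping the unconstrained solution:</p>

```python
import math

def no_short_allocation(mu, C, T, sweeps=10000, tol=1e-12):
    """Long-only allocation via projected Gauss-Seidel (a sketch, not Lemke's algorithm).

    Iterates toward X >= 0 with w = C X - mu >= 0 and X^T w = 0, the optimality
    conditions of: minimize (1/2) X^T C X - mu^T X subject to X >= 0.
    The result is re-scaled so sum(X) = T. Assumes C is symmetric positive definite.
    """
    n = len(mu)
    X = [0.0] * n
    for _ in range(sweeps):
        biggest_change = 0.0
        for i in range(n):
            w_i = sum(C[i][j] * X[j] for j in range(n)) - mu[i]
            # Exact minimizer along coordinate i, then projected onto X_i >= 0
            new_value = max(0.0, X[i] - w_i / C[i][i])
            biggest_change = max(biggest_change, abs(new_value - X[i]))
            X[i] = new_value
        if biggest_change <= tol:
            break
    total = sum(X)
    if total == 0.0:
        raise ValueError("no long-only portfolio has a positive expected return")
    return [T * x / total for x in X]

def risk_adjusted_return(X, mu, C):
    """X^T mu / sqrt(X^T C X)."""
    n = len(X)
    expected = sum(X[i] * mu[i] for i in range(n))
    variance = sum(X[i] * C[i][j] * X[j] for i in range(n) for j in range(n))
    return expected / math.sqrt(variance)

# Hypothetical example: assets 0 and 1 have correlation 0.8, asset 2 is
# uncorrelated with both, and every asset has variance 0.04.
mu = [0.10, 0.06, 0.05]
C = [[0.040, 0.032, 0.000],
     [0.032, 0.040, 0.000],
     [0.000, 0.000, 0.040]]

best = no_short_allocation(mu, C, 1.0)
unconstrained = [65 / 18, -25 / 18, 1.25]  # C^{-1} mu, worked out by hand
clipped = [max(0.0, x) for x in unconstrained]

print(best)                                  # about [0.667, 0.0, 0.333]
print(risk_adjusted_return(best, mu, C))     # about 0.559
print(risk_adjusted_return(clipped, mu, C))  # about 0.554, slightly worse
```

<p>In this example asset 1 has a positive expected return, but the unconstrained solution shorts it as a hedge against the highly correlated asset 0. The long-only optimum drops it and re-balances between assets 0 and 2, while clipping keeps weights that were computed assuming the short was present and ends up with a slightly lower risk adjusted return. With a diagonal C the two approaches agree, matching the special case described above.</p>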
<p>The goal of this writeup has been to show how to systematically convert investment advice like &#8220;this stock is going to really take off&#8221; into an allocation of assets (which in turn implies a pattern of trades). We take as unexamined premises where to get such advice and whether to use the Sharpe ratio or some other notion of risk and/or utility. The point is that even though it may be complicated, from this point on it is just calculation, and calculation is easy to automate.</p>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/' rel='bookmark' title='Permanent Link: A Quick Appreciation of the Sharpe Ratio'>A Quick Appreciation of the Sharpe Ratio</a></li>
<li><a href='http://www.win-vector.com/blog/2009/09/a-discrete-model-gauging-market-efficiency/' rel='bookmark' title='Permanent Link: A Discrete Model Gauging Market Efficiency'>A Discrete Model Gauging Market Efficiency</a></li>
<li><a href='http://www.win-vector.com/blog/2009/10/what-is-the-gamblers-equivalent-of-amdahls-law/' rel='bookmark' title='Permanent Link: What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?'>What is the gambler&#8217;s equivalent of Amdahl&#8217;s Law?</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
