<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Win-Vector Blog &#187; Coding</title>
	<atom:link href="http://www.win-vector.com/blog/category/coding/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.win-vector.com/blog</link>
	<description>The Applied Theorist&#039;s Point of View</description>
	<lastBuildDate>Thu, 29 Jul 2010 17:09:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Gradients via Reverse Accumulation</title>
		<link>http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=gradients-via-reverse-accumulation</link>
		<comments>http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/#comments</comments>
		<pubDate>Thu, 15 Jul 2010 00:00:04 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Coding]]></category>
		<category><![CDATA[Exciting Techniques]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[Automatic Differentiation]]></category>
		<category><![CDATA[Conjugate Gradient]]></category>
		<category><![CDATA[Gradient]]></category>
		<category><![CDATA[Mathematical Bedside Reading]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Reverse Accumulation]]></category>
		<category><![CDATA[Scala]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1493</guid>
		<description><![CDATA[We extend the ideas of from Automatic Differentiation with Scala to include the reverse accumulation. Reverse accumulation is a non-obvious improvement to automatic differentiation that can in many cases vastly speed up calculations of gradients. As the tables, diagrams and equations do not translate well into HTML, our full article is available here in PDF: [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/' rel='bookmark' title='Permanent Link: Automatic Differentiation with Scala'>Automatic Differentiation with Scala</a></li>
<li><a href='http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/' rel='bookmark' title='Permanent Link: &#8220;Easy&#8221; Portfolio Allocation'>&#8220;Easy&#8221; Portfolio Allocation</a></li>
<li><a href='http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/' rel='bookmark' title='Permanent Link: A Quick Appreciation of the Sharpe Ratio'>A Quick Appreciation of the Sharpe Ratio</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>We extend the ideas of from <a target="ext" href="http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/">Automatic Differentiation with Scala</a> to include the <em>reverse accumulation</em>.  Reverse accumulation is a non-obvious improvement to automatic differentiation that can in many cases vastly speed up calculations of gradients.<span id="more-1493"></span><br />
As the tables, diagrams and equations do not translate well into HTML, our full article is available here in PDF: <a href="http://www.win-vector.com/dfiles/ReverseAccumulation.pdf">http://www.win-vector.com/dfiles/ReverseAccumulation.pdf</a>.</p>
<p>The purpose of our article is to explain reverse accumulation automatic differentiation clearly (and to release some sample code and timing results).  A side effect of the article is to make sense of the following two diagrams:</p>
<p>If the following is picture of standard or forward differentiation:</p>
<p><img src="http://www.win-vector.com/blog/wp-content/uploads/2010/07/cutFwd.png" alt="cutFwd.png" border="0" width="408" height="677" /></p>
<p>then the following is a picture of reverse accumulation:</p>
<p><img src="http://www.win-vector.com/blog/wp-content/uploads/2010/07/cutRev.png" alt="cutRev.png" border="0" width="487" height="739" /></p>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/' rel='bookmark' title='Permanent Link: Automatic Differentiation with Scala'>Automatic Differentiation with Scala</a></li>
<li><a href='http://www.win-vector.com/blog/2010/01/easy-portfolio-allocation/' rel='bookmark' title='Permanent Link: &#8220;Easy&#8221; Portfolio Allocation'>&#8220;Easy&#8221; Portfolio Allocation</a></li>
<li><a href='http://www.win-vector.com/blog/2008/09/a-quick-appreciation-of-the-sharpe-ratio/' rel='bookmark' title='Permanent Link: A Quick Appreciation of the Sharpe Ratio'>A Quick Appreciation of the Sharpe Ratio</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Automatic Differentiation with Scala</title>
		<link>http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=automatic-differentiation-with-scala</link>
		<comments>http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/#comments</comments>
		<pubDate>Tue, 15 Jun 2010 04:19:20 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Coding]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Exciting Techniques]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[Automatic Differentiation]]></category>
		<category><![CDATA[Conjugate Gradient]]></category>
		<category><![CDATA[Dual Numbers]]></category>
		<category><![CDATA[Geometric Median]]></category>
		<category><![CDATA[Numeric Methods]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Scala]]></category>
		<category><![CDATA[Steiner Tree]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1481</guid>
		<description><![CDATA[This article is a worked-out exercise in applying the Scala type system to solve a small scale optimization problem. For this article we supply complete Scala source code (under a GPLv3 license) and some design discussion. Usually we work using a combination of databases, Java, optimization libraries and analysis suites (like R). The reason is [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/' rel='bookmark' title='Permanent Link: Gradients via Reverse Accumulation'>Gradients via Reverse Accumulation</a></li>
<li><a href='http://www.win-vector.com/blog/2009/11/r-examine-objects-tutorial/' rel='bookmark' title='Permanent Link: R examine objects tutorial'>R examine objects tutorial</a></li>
<li><a href='http://www.win-vector.com/blog/2009/09/survive-r/' rel='bookmark' title='Permanent Link: Survive R'>Survive R</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>This article is a worked-out exercise in applying the <a href="http://www.scala-lang.org/" target="ext">Scala</a> type system to solve a small scale optimization problem.    For this article we supply <a href="http://www.win-vector.com/dfiles/ScalaDiff.jar">complete Scala source code</a> (under a GPLv3 license) and some design discussion.<span id="more-1481"></span><br />
Usually we work using a combination of databases, Java, optimization libraries and analysis suites (like R).  The reason is that, for our typical problems, Java hits a sweet spot of trading off runtime performance against ease of development and maintenance.  In the tens of gigabytes range (data sets larger than the Wikipedia but smaller than the Web) Java outperforms the scripting languages (Ruby, Python &#8230;) and is much easer to develop in and document than C++.  This sweet spot is both subjective and situational- if the tasks were smaller and in a services framework Python is a better choice, if performance is paramount then C or C++ (with the STL) and Hadoop are a better choice, if pre-built statistical libraries are needed then R becomes a better choice.  For the type problem we present here Scala is a very good choice.</p>
<style type="text/css">
td.linenos { background-color: #f0f0f0; padding-right: 10px; }
span.lineno { background-color: #f0f0f0; padding: 0 5px 0 5px; }
pre { line-height: 125%; }
body .hll { background-color: #ffffcc }
body  { background: #f8f8f8; }
body .c { color: #408080; font-style: italic } /* Comment */
body .err { border: 1px solid #FF0000 } /* Error */
body .k { color: #008000; font-weight: bold } /* Keyword */
body .o { color: #666666 } /* Operator */
body .cm { color: #408080; font-style: italic } /* Comment.Multiline */
body .cp { color: #BC7A00 } /* Comment.Preproc */
body .c1 { color: #408080; font-style: italic } /* Comment.Single */
body .cs { color: #408080; font-style: italic } /* Comment.Special */
body .gd { color: #A00000 } /* Generic.Deleted */
body .ge { font-style: italic } /* Generic.Emph */
body .gr { color: #FF0000 } /* Generic.Error */
body .gh { color: #000080; font-weight: bold } /* Generic.Heading */
body .gi { color: #00A000 } /* Generic.Inserted */
body .go { color: #808080 } /* Generic.Output */
body .gp { color: #000080; font-weight: bold } /* Generic.Prompt */
body .gs { font-weight: bold } /* Generic.Strong */
body .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
body .gt { color: #0040D0 } /* Generic.Traceback */
body .kc { color: #008000; font-weight: bold } /* Keyword.Constant */
body .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */
body .kn { color: #008000; font-weight: bold } /* Keyword.Namespace */
body .kp { color: #008000 } /* Keyword.Pseudo */
body .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */
body .kt { color: #B00040 } /* Keyword.Type */
body .m { color: #666666 } /* Literal.Number */
body .s { color: #BA2121 } /* Literal.String */
body .na { color: #7D9029 } /* Name.Attribute */
body .nb { color: #008000 } /* Name.Builtin */
body .nc { color: #0000FF; font-weight: bold } /* Name.Class */
body .no { color: #880000 } /* Name.Constant */
body .nd { color: #AA22FF } /* Name.Decorator */
body .ni { color: #999999; font-weight: bold } /* Name.Entity */
body .ne { color: #D2413A; font-weight: bold } /* Name.Exception */
body .nf { color: #0000FF } /* Name.Function */
body .nl { color: #A0A000 } /* Name.Label */
body .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */
body .nt { color: #008000; font-weight: bold } /* Name.Tag */
body .nv { color: #19177C } /* Name.Variable */
body .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */
body .w { color: #bbbbbb } /* Text.Whitespace */
body .mf { color: #666666 } /* Literal.Number.Float */
body .mh { color: #666666 } /* Literal.Number.Hex */
body .mi { color: #666666 } /* Literal.Number.Integer */
body .mo { color: #666666 } /* Literal.Number.Oct */
body .sb { color: #BA2121 } /* Literal.String.Backtick */
body .sc { color: #BA2121 } /* Literal.String.Char */
body .sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */
body .s2 { color: #BA2121 } /* Literal.String.Double */
body .se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */
body .sh { color: #BA2121 } /* Literal.String.Heredoc */
body .si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */
body .sx { color: #008000 } /* Literal.String.Other */
body .sr { color: #BB6688 } /* Literal.String.Regex */
body .s1 { color: #BA2121 } /* Literal.String.Single */
body .ss { color: #19177C } /* Literal.String.Symbol */
body .bp { color: #008000 } /* Name.Builtin.Pseudo */
body .vc { color: #19177C } /* Name.Variable.Class */
body .vg { color: #19177C } /* Name.Variable.Global */
body .vi { color: #19177C } /* Name.Variable.Instance */
body .il { color: #666666 } /* Literal.Number.Integer.Long */
 </style>
<h2>Our Example Problem</h2>
<p>Our small scale problem is this:  we have a number of target points on a map and we want to pick a central point to <em>directly</em> connect to all of these points with wire.  Our goal is to minimize the total amount of wire used.  This problem is called the <a href="http://en.wikipedia.org/wiki/Geometric_median" ref="ext">&#8220;Geometric Median&#8221;</a>.  So we are trying to find a point that minimizes the sum of distances from our chosen center to every target point. If we were trying to minimize the sum of squared distances from our chosen center to every target point the answer would be obvious: the average or mean (which by Hooke&#8217;s law is also the point where a set of identical springs would relax to).  The mean is in fact a fairly good guess, but you can do better (which could important if the &#8220;wire&#8221; is expensive, such as cutting irrigation or drainage ditches).  For example given the three target points (20,0), (-1,-1) and (-1,1) the optimal point is (-0.42,0) not the mean (6,0) and the choice of optimal point represents an over 19% savings in total wiring distance (see figure).</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/points.png" alt="points.png" border="0" width="525" height="525" /><br />
</center></p>
<p>This is a substantial saving in cost.  </p>
<p>The problem changes as we consider variations.  If indirect connections (such as routing one point through another, which may or may not be possible for reasons of capacity or safety) and multiple new centers are allowed  we then have an instance of the <a href="http://en.wikipedia.org/wiki/Steiner_tree_problem" ref="ext">Steiner Tree Problem</a> which is harder  to solve (since it is known to be NP complete).  If no new centers are allowed (all routing must be between pre-existing target points) then we have a Spanning Tree Problem- which admits very quick solutions.</p>
<p>We bring up the geometric median as a mere example.  We don&#8217;t intend for our code to solve only the geometric median problem and we don&#8217;t intend to touch on the literature of specialized methods for solving the geometric median problem.  Instead we are trying to demonstrate the speed you can develop prototype solutions if you have a few good tools (like various optimizers) available in your toolkit.  Numeric optimizers may sound exotic, but they often are the kind of thing you want to experiment with and link directly into your code.</p>
<h2>Optimization as General Tool</h2>
<p>Now that we have the example problem we can describe a solution strategy.  In this case the solution uses code &#8220;we wished we had lying around&#8221; before we started on the problem.  We will pretend we have the tools we want ready to solve our problem and then we will pay our debt and build the required tools.  The issue is that there is not an obvious closed form for the solution of the geometric median problem.  So we are forced to work a bit harder.  In this case harder means we need to solve an optimization problem.  Consider the contour plot of the total wiring cost as function of where we choose to place our center.  Our optimal point (-0.42,0) had wiring cost of 22.73 and the contour plot given here shows concentric regions of solution positions with higher cost.</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/contour.png" alt="contour.png" border="0" width="525" height="525" /><br />
</center></p>
<p>In general it is unwise to throw an optimizer at an arbitrary problem and hope to find the globally best solution.  But in this case (and in many similar situations) we can prove that a simple local optimizer will in fact find the unique best solution.  This is a property of the problem not of the optimizer.  The concentric regions shown in the contour plot have a very nice shape: they are <a href="http://en.wikipedia.org/wiki/Convex_set" ref="ext">convex</a>.   That is: they have no intrusions- for any two points drawn from one of these shapes the straight line segment between these points stays inside the given shape.  We don&#8217;t have to depend on observation- we can actually prove this is always the case for this problem.  The wiring cost from a proposed center to any single target point is a <a href="http://en.wikipedia.org/wiki/Convex_function" ref="ext">convex function</a> of where we choose to place our center (a convex function is a function whose graph never reaches above the secant line drawn between any two points on its graph).  The total wiring cost is just the sum of the wiring costs to each target point.  And to finish: the sum of a collection of convex functions is itself a convex function.  Since the contour plot of a convex function has only convex shapes and we have proven the statement.</p>
<p>But how does this help us?  There is a standard technique to find &#8220;local minima&#8221; of a function by inspecting a function for places where the gradient is zero (points where there is no obvious down hill direction on the contour plot).  This technique usually can only be guaranteed to find local minima (places where no small change improves your situation).  But there is no guarantee that the local minimum you find is in fact the global minimum (the best possible solution).  Except when you are dealing with a convex function.  When a function is convex then all of the local minima are always grouped together into a single convex connected shape (if not a line drawn between two remote minima would violate the convexity definition).  And if the function is never flat then this set is a single unique point: the unique best solution.  Our inspection technique will be a gradient driven optimizer- that is an optimizer that when the gradient is non-zero improves its objective by running down hill and halts when the gradient is zero.</p>
<p>The stated function to minimize is to sum the distance from our proposed center to each target point.  We can write this as the sum of the distances:</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/dist1.png" alt="dist1.png" border="0" width="309" height="81" /><br />
</center></p>
<p>( <img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/euclid1.png" alt="euclid1.png" border="0" width="119" height="37" /> which is the traditional Euclidean or L2 distance).  This function actually has one one subtle flaw that we will deal with in the appendix (see: Fixing Smoothness).</p>
<h2>Using Scala to Apply the Optimization Solution</h2>
<p>To find our optimal center placement using Scala we first write our cost or objective as a Scala function:</p>
<div class="highlight">
<pre>    <span class="k">val</span> <span class="n">dat</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">]]</span> <span class="o">=</span> <span class="nc">Array</span><span class="o">(</span>
      <span class="nc">Array</span><span class="o">(</span> <span class="mi">20</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
      <span class="nc">Array</span><span class="o">(</span> <span class="o">-</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
      <span class="nc">Array</span><span class="o">(</span> <span class="o">-</span><span class="mf">1.0</span><span class="o">,</span> <span class="o">-</span><span class="mf">1.0</span><span class="o">)</span>
    <span class="o">)</span>

    <span class="k">def</span> <span class="n">fx</span><span class="o">(</span><span class="n">p</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">])</span><span class="k">:</span><span class="kt">Double</span> <span class="o">=</span> <span class="o">{</span>
      <span class="k">val</span> <span class="n">dim</span> <span class="k">=</span> <span class="n">p</span><span class="o">.</span><span class="n">length</span>
      <span class="k">val</span> <span class="n">npoint</span> <span class="k">=</span> <span class="n">dat</span><span class="o">.</span><span class="n">length</span>
      <span class="k">var</span> <span class="n">total</span> <span class="k">=</span> <span class="mf">0.0</span>
      <span class="k">for</span><span class="o">(</span><span class="n">k</span> <span class="k">&lt;-</span> <span class="mi">0</span> <span class="n">to</span> <span class="o">(</span><span class="n">npoint</span><span class="o">-</span><span class="mi">1</span><span class="o">))</span> <span class="o">{</span>
        <span class="k">var</span> <span class="n">term</span> <span class="k">=</span> <span class="mf">0.0</span>
        <span class="k">for</span><span class="o">(</span><span class="n">i</span> <span class="k">&lt;-</span> <span class="mi">0</span> <span class="n">to</span> <span class="o">(</span><span class="n">dim</span><span class="o">-</span><span class="mi">1</span><span class="o">))</span> <span class="o">{</span>
          <span class="k">val</span> <span class="n">diff</span> <span class="k">=</span> <span class="n">p</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="o">-</span> <span class="n">dat</span><span class="o">(</span><span class="n">k</span><span class="o">)(</span><span class="n">i</span><span class="o">)</span>
          <span class="n">term</span> <span class="k">=</span> <span class="n">term</span> <span class="o">+</span> <span class="n">diff</span><span class="o">*</span><span class="n">diff</span>
        <span class="o">}</span>
        <span class="n">total</span> <span class="k">=</span> <span class="n">total</span> <span class="o">+</span> <span class="n">scala</span><span class="o">.</span><span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="o">(</span><span class="n">term</span><span class="o">)</span>
      <span class="o">}</span>
      <span class="n">total</span>
    <span class="o">}</span>
</pre>
</div>
<p>Scala is succinct and it is a great connivence to have a function definition capture data from its environment.   What we would like to do is generate an initial guess as the solution (we use the mean as our initial guess) and then call an optimizer (in this case a conjugate gradient optimizer) to do all the work:</p>
<div class="highlight">
<pre> <span class="k">val</span> <span class="n">p0</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">]</span> <span class="o">=</span> <span class="n">mean</span><span class="o">(</span><span class="n">dat</span><span class="o">)</span>
 <span class="k">val</span> <span class="o">(</span><span class="n">pF</span><span class="o">,</span><span class="n">fpF</span><span class="o">)</span> <span class="k">=</span> <span class="nc">CG</span><span class="o">.</span><span class="n">minimize</span><span class="o">(</span><span class="n">fx</span><span class="o">,</span><span class="n">p0</span><span class="o">)</span>
</pre>
</div>
<p>At this point we would be done, except the conjugate gradient method (which is superior to gradient descent and many the non-gradient methods) requires a gradient.<br />
We could provide a numeric estimate of the gradient by the following divided difference method:</p>
<div class="highlight">
<pre>  <span class="k">def</span> <span class="n">gradientD</span><span class="o">(</span><span class="n">f</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">]</span><span class="k">=&gt;</span><span class="kt">Double</span><span class="o">,</span><span class="n">p</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">])</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">]</span> <span class="o">=</span> <span class="o">{</span>
    <span class="k">val</span> <span class="n">xdim</span> <span class="k">=</span> <span class="n">p</span><span class="o">.</span><span class="n">length</span>
    <span class="k">val</span> <span class="n">p2</span> <span class="k">=</span> <span class="n">copy</span><span class="o">(</span><span class="n">p</span><span class="o">)</span>
    <span class="k">val</span> <span class="n">base</span> <span class="k">=</span> <span class="n">f</span><span class="o">(</span><span class="n">p2</span><span class="o">)</span>
    <span class="k">val</span> <span class="n">ret</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">](</span><span class="n">xdim</span><span class="o">)</span>
    <span class="k">val</span> <span class="n">delta</span> <span class="k">=</span> <span class="mf">1.0e-6</span>
    <span class="k">for</span><span class="o">(</span><span class="n">i</span> <span class="k">&lt;-</span> <span class="mi">0</span> <span class="n">to</span> <span class="o">(</span><span class="n">xdim</span><span class="o">-</span><span class="mi">1</span><span class="o">))</span> <span class="o">{</span>
      <span class="n">p2</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="k">=</span> <span class="n">p</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="o">+</span> <span class="n">delta</span>
      <span class="k">val</span> <span class="n">fplus</span> <span class="k">=</span> <span class="n">f</span><span class="o">(</span><span class="n">p2</span><span class="o">)</span>
      <span class="n">p2</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="k">=</span> <span class="n">p</span><span class="o">(</span><span class="n">i</span><span class="o">)</span>
      <span class="k">val</span> <span class="n">diff</span> <span class="k">=</span> <span class="o">(</span><span class="n">fplus</span><span class="o">-</span><span class="n">base</span><span class="o">)/</span><span class="n">delta</span>
      <span class="n">ret</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="k">=</span> <span class="n">diff</span>
    <span class="o">}</span>
    <span class="n">ret</span>
  <span class="o">}</span>
</pre>
</div>
<p>This numeric divided difference method often outperforms non-derivative optimization methods (like Powell&#8217;s Method and the Nelder-Mead Amoeba method).  But the technique can run into numeric difficulties.   We can remedy this if we are willing to write our function in a slightly more general way.   If we re-encode our function in a generic manner we can use <a href="http://en.wikipedia.org/wiki/Automatic_differentiation" target="ext">automatic differentiation</a>  (not to be confused with numeric differentiation or with symbolic differentiation) to produce a reliable gradient for optimization.  What we need to do is re-write our function to work over an abstract field of numbers instead of only the machine supplied doubles.  In fact what we need to do is specify a generic function that will work over any field, with the field to be determined later.  The code to do this in Scala is very similar to the non-generic code:</p>
<div class="highlight">
<pre>   <span class="k">val</span> <span class="n">genericFx</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">VectorFN</span> <span class="o">{</span>
      <span class="k">def</span> <span class="n">apply</span><span class="o">[</span><span class="kt">Y</span> <span class="k">&lt;:</span> <span class="kt">NumberBase</span><span class="o">[</span><span class="kt">Y</span><span class="o">]](</span><span class="n">p</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Y</span><span class="o">])</span><span class="k">:</span><span class="kt">Y</span> <span class="o">=</span> <span class="o">{</span>
        <span class="k">val</span> <span class="n">field</span> <span class="k">=</span> <span class="n">p</span><span class="o">(</span><span class="mi">0</span><span class="o">).</span><span class="n">field</span>
        <span class="k">val</span> <span class="n">dim</span> <span class="k">=</span> <span class="n">p</span><span class="o">.</span><span class="n">length</span>
        <span class="k">val</span> <span class="n">npoint</span> <span class="k">=</span> <span class="n">dat</span><span class="o">.</span><span class="n">length</span>
        <span class="k">var</span> <span class="n">total</span> <span class="k">=</span> <span class="n">field</span><span class="o">.</span><span class="n">zero</span>
        <span class="k">for</span><span class="o">(</span><span class="n">k</span> <span class="k">&lt;-</span> <span class="mi">0</span> <span class="n">to</span> <span class="o">(</span><span class="n">npoint</span><span class="o">-</span><span class="mi">1</span><span class="o">))</span> <span class="o">{</span>
          <span class="k">var</span> <span class="n">term</span> <span class="k">=</span> <span class="n">field</span><span class="o">.</span><span class="n">zero</span>
          <span class="k">for</span><span class="o">(</span><span class="n">i</span> <span class="k">&lt;-</span> <span class="mi">0</span> <span class="n">to</span> <span class="o">(</span><span class="n">dim</span><span class="o">-</span><span class="mi">1</span><span class="o">))</span> <span class="o">{</span>
            <span class="k">val</span> <span class="n">diff</span> <span class="k">=</span> <span class="n">p</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="o">-</span> <span class="n">field</span><span class="o">.</span><span class="n">inject</span><span class="o">(</span><span class="n">dat</span><span class="o">(</span><span class="n">k</span><span class="o">)(</span><span class="n">i</span><span class="o">))</span>
            <span class="n">term</span> <span class="k">=</span> <span class="n">term</span> <span class="o">+</span> <span class="n">diff</span><span class="o">*</span><span class="n">diff</span>
          <span class="o">}</span>
          <span class="n">total</span> <span class="k">=</span> <span class="n">total</span> <span class="o">+</span> <span class="n">smoothSQRT</span><span class="o">(</span><span class="n">term</span><span class="o">)</span>
        <span class="o">}</span>
        <span class="n">total</span>
      <span class="o">}</span>
    <span class="o">}</span>
</pre>
</div>
<p>Notice that code is very similar to the &#8220;def fx()&#8221; code.  The key differences are that we had to define genericFx as extending a trait (a type of Scala interface) called VectorFN and inside this trait extension we defined a parameterized function name apply().  apply() is a generic function that is willing to work over any type Y where Y is at least of type NumberBase[Y] (we will get more into what that means in a moment).  The difference in notation is that while the Scala function <em>syntax</em> can not specify a generic function with free type parameters (the incompletely specified Y) the Scala <em>semantics</em> are strong enough to implement this.  In fact standard function definitions (such as &#8220;def fx()&#8221;) are just syntactic sugar for extending the Scala built-in <a href="http://www.scala-lang.org/docu/files/api/scala/Function1.html" target="ext">Function1 trait</a>.  With a generic objective function in hand all we need is conjugate gradient code that is expecting a VectorFN (and willing to call apply() instead of just using naked function parenthesis) and some type NumberBase[Y] that can compute gradients for us.  The Scala compiler can specialize our genericFx() into one version for quick calculation and another for gradients.  How this is done is what we will discuss next.  From our point of view our problem is solved with the following one line of code:</p>
<div class="highlight">
<pre><span class="k">val</span> <span class="o">(</span><span class="n">pF</span><span class="o">,</span><span class="n">fpF</span><span class="o">)</span> <span class="k">=</span> <span class="nc">CG</span><span class="o">.</span><span class="n">minimize</span><span class="o">(</span><span class="n">genericFx</span><span class="o">,</span><span class="n">p0</span><span class="o">)</span>
</pre>
</div>
<p>This should always be your goal- build sufficient preparation so your last step is a &#8220;obvious one liner.&#8221;</p>
<h2>What Tools we Wish we Had Lying Around</h2>
<p>We supply in our example some workable conjugate gradient code, but that is standard so we will not discuss it.  What is of interest (and facilitated by Scala&#8217;s parametrized type system) is the implementation of <a href="http://en.wikipedia.org/wiki/Dual_number" target="ext">dual numbers</a> as a framework to supply automatic differentiation.  An implementation of dual numbers as a NumerBase[DualNumber] type is the core of our demonstration.</p>
<p>Dual numbers are an algebraic structure written as pairs of real numbers &#8220;(a,b)&#8221;.  The arithmetic table for dual numbers is given below:</p>
<table>
<tr>
<td>(a,b) + (c,d)</td>
<td>=</td>
<td>((a+c) , (b+d))</td>
</tr>
<tr>
<td>(a,b) &#8211; (c,d)</td>
<td>=</td>
<td>((a-c) , (b-d))</td>
</tr>
<tr>
<td>(a,b) * (c,d)</td>
<td>=</td>
<td>((a*c) , (a*d+b*c))</td>
</tr>
<tr>
<td>(a,b) / (c,d)</td>
<td>=</td>
<td>((a/c) , ((b*c-a*d)/(a*a)))</td>
</tr>
</table>
<p>In a dual number (a,b) &#8220;a&#8221; is the &#8220;large&#8221; or &#8220;standard&#8221; part of the number.  You can check from the arithmetic table that the pair of dual numbers (a,0) and (c,0) behave just as we would expect the real numbers a and c to behave.  In the dual number (a,b) &#8220;b&#8221; is the &#8220;small&#8221; or &#8220;ideal&#8221; portion of the number.  From the multiplication rule above  we can observe two rules: (0,b) * (c,0) = (0,b*c) (something small times anything else is small) and (0,b)*(0,d) = (0,0) (two small things become zero when multiplied).  Essentially the dual numbers are carrying around the first two terms of a Taylor series: we get as a result both the function value and the function derivative.  For a function f() over the real numbers we extend f() to work over the dual number by defining: f((a,b)) = (f(a),b f&#8217;(a)) (which is consistent with the previously defined arithmetic). We can check that the dual numbers numbers obey the usual laws of arithmetic (associative, commutative, distributive, identities and inverses).  The punchline is that over the dual numbers the divided difference estimate of f&#8217;(x) (the derivative of f() evaluated at x)  is in fact exact in the sense that f((x,1)) = (f(x),f&#8217;(x)) (or f((x,0)+(0,1)) &#8211; f((x,0)) = (0, f&#8217;(x))).  Implementing the DualNumber class is little more than transcribing the above arithmetic table into Scala.</p>
<p>We have already seen how to write code that uses NumberBase[Y] types (genericFx() itself is an example).  A more complicated example is the CG.minimize() code which not only accepts a generic function (in the form of VectorFN) but then specializes it to NumberBase[DualNumber] to compute gradients and also specializes to NumberBase[MDouble] for quick calculation during line searches (MDouble is just an adapter for machine Doubles, used for speed).  The ability to re-specialize a function is one of the advantages of a parameterized type system.  The DualNumbers are an example of forward automatic differentiation.  We could also use the same object framework to capture a representation of the computation path and apply more sophisticated methods such as reverse automatic differentiation. </p>
<p>We give a link to a jar containing <a href="http://www.win-vector.com/dfiles/ScalaDiff.jar">complete Scala source code</a> including this example, the DualNumber implementation, a conjugate gradient implementation and some JUnit tests (all under a GPLv3 license) and will go on to describe some of the design decisions.  The code is the bulky part of this work, so we will move on to discuss something more compact: types.</p>
<h2>Types</h2>
<p>If code is ever beautiful it is only when it is succinct.  Among the most succinct forms of code are individual type signatures and interfaces (though the indiscriminate repetition of type signatures is rightly considered ugly bloat, which Scala works to avoid).   Since we are distributing complete source we will describe only types and method signatures.  The entry points to the code are the JUnit tests (organized in the ScalaDiff/test source directory and depending on JUnit which was not included) and the demo program in ScalaDiff/src/demo/Demo.scala).</p>
<p>To be a usable arithmetic type (like DualNumber or MDouble) you must extend the following parameterized abstract class:</p>
<div class="highlight">
<pre><span class="k">abstract</span> <span class="k">class</span> <span class="nc">NumberBase</span><span class="o">[</span><span class="kt">NUMBERTYPE</span> <span class="k">&lt;:</span> <span class="kt">NumberBase</span><span class="o">[</span><span class="kt">NUMBERTYPE</span><span class="o">]]</span> <span class="o">{</span>
  <span class="c">// basic arithmetic</span>
  <span class="k">def</span> <span class="o">+</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>
  <span class="k">def</span> <span class="o">-</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>
  <span class="k">def</span> <span class="n">unary_-</span><span class="o">()</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>
  <span class="k">def</span> <span class="o">*</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>
  <span class="k">def</span> <span class="o">/</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>  <span class="kt">//</span> <span class="kt">that</span> <span class="kt">not</span> <span class="kt">equal</span> <span class="kt">to</span> <span class="kt">zero</span>
  <span class="c">// more complicated</span>
  <span class="k">def</span> <span class="n">pow</span><span class="o">(</span><span class="n">that</span><span class="k">:</span><span class="kt">Double</span><span class="o">)</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>
  <span class="k">def</span> <span class="n">exp</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>
  <span class="k">def</span> <span class="n">log</span><span class="k">:</span><span class="kt">NUMBERTYPE</span> <span class="kt">//</span> <span class="kt">this</span> <span class="kt">is</span> <span class="kt">positive</span>
  <span class="c">// comparison functions</span>
  <span class="k">def</span> <span class="o">&gt;</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Boolean</span>
  <span class="k">def</span> <span class="o">&gt;=</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Boolean</span>
  <span class="k">def</span> <span class="o">==</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Boolean</span>
  <span class="k">def</span> <span class="o">!=</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Boolean</span>
  <span class="k">def</span> <span class="o">&lt;</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Boolean</span>
  <span class="k">def</span> <span class="o">&lt;=</span> <span class="o">(</span><span class="n">that</span><span class="k">:</span> <span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Boolean</span>
  <span class="c">// utility</span>
  <span class="k">def</span> <span class="n">field</span><span class="k">:</span><span class="kt">Field</span><span class="o">[</span><span class="kt">NUMBERTYPE</span><span class="o">]</span>
<span class="o">}</span>
</pre>
</div>
<p>In particular DualNumber extends NumberBase[DualNumber].  This deliberate circular reference has a big purpose: it allows publicly visible contravariant return types (returning nearly the exact type we really are instead of a base type).  This allows us to have strict type arguments so that trying to add a MDouble to DualNumber is a type error (even though they both extend the same base class).  The automatic differentiation technique encapsulated in the DualNumber class only works if all of the calculation is in the DualNumber types and this strict type enforcement allows the compiler to help prevent results sneaking in and out through other types.  All of the methods on NumberBase are obviously related to arithmetic except the field() method.  This method gives us access to a Field object which is responsible for carrying around the runtime type information (this is a common problem in Java and Scala, that some type information known at compile type such choice of template types is not easily accessed at runtime).  The Field class is as follows:</p>
<div class="highlight">
<pre><span class="k">abstract</span> <span class="k">class</span> <span class="nc">Field</span> <span class="o">[</span><span class="kt">NUMBERTYPE</span> <span class="k">&lt;:</span> <span class="kt">NumberBase</span><span class="o">[</span><span class="kt">NUMBERTYPE</span><span class="o">]]</span> <span class="o">{</span>
  <span class="k">def</span> <span class="n">zero</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>            <span class="kt">//</span> <span class="kt">return</span> <span class="kt">canonical</span> <span class="kt">zero</span> <span class="kt">in</span> <span class="kt">field</span>
  <span class="k">def</span> <span class="n">one</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>             <span class="kt">//</span> <span class="kt">return</span> <span class="kt">canonical</span> <span class="kt">one</span> <span class="kt">in</span> <span class="kt">field</span>
  <span class="k">def</span> <span class="n">inject</span><span class="o">(</span><span class="n">v</span><span class="k">:</span><span class="kt">Double</span><span class="o">)</span><span class="k">:</span><span class="kt">NUMBERTYPE</span>  <span class="kt">//</span> <span class="kt">return</span> <span class="kt">canonical</span> <span class="kt">representation</span> <span class="kt">of</span> <span class="kt">number</span> <span class="kt">in</span> <span class="kt">field</span>
  <span class="k">def</span> <span class="n">project</span><span class="o">(</span><span class="n">v</span><span class="k">:</span><span class="kt">NUMBERTYPE</span><span class="o">)</span><span class="k">:</span><span class="kt">Double</span> <span class="kt">//</span> <span class="kt">return</span> <span class="kt">standard-number</span> <span class="kt">represented</span> <span class="kt">in</span> <span class="kt">field</span>
  <span class="k">def</span> <span class="n">array</span><span class="o">(</span><span class="n">n</span><span class="k">:</span><span class="kt">Int</span><span class="o">)</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">NUMBERTYPE</span><span class="o">]</span> <span class="kt">//</span> <span class="kt">return</span> <span class="kt">an</span> <span class="kt">array</span> <span class="kt">of</span> <span class="kt">this</span> <span class="k">type</span>
</pre>
</div>
<p>The Field class is where we have factories for numbers (zero, one, arrays, injection from standard Doubles), casting (projection back to standard Doubles).</p>
<p>With these types defined we can actually read intent off some of the method signatures.  </p>
<p>For example our conjugate gradient optimizer is accessed through the following method signature:</p>
<div class="highlight">
<pre> <span class="k">def</span> <span class="n">minimize</span><span class="o">(</span><span class="n">fn</span><span class="k">:</span><span class="kt">VectorFN</span><span class="o">,</span><span class="n">x0</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">])</span><span class="k">:</span><span class="o">(</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">],</span><span class="kt">Double</span><span class="o">)</span> <span class="c">// return x,f(x)</span>
</pre>
</div>
<p>The above can be read as: CG.minimize() requires a VectorFN (our trait representing single argument functions with a free type parameter) and an initial point (in standard Doubles).  The code will the return a pair of the optimum point and the function evaluated at the optimum point.  From the type signature we can see that CG.minimize() expects to re-specialize the function &#8220;fn&#8221; to types of its own choosing (else it could have accepted a parameterized argument instead of our custom trait) and will handle all up-conversion and down-conversion between machine Doubles and NumberBase[Y]&#8216;s itself.  This sort of type information is hard to express (let alone enforce) in a dynamically typed language.</p>
<p>A slightly more complicated example is the lineMinD() method:</p>
<div class="highlight">
<pre><span class="k">def</span> <span class="n">lineMinD</span><span class="o">[</span><span class="kt">Y&lt;:NumberBase</span><span class="o">[</span><span class="kt">Y</span><span class="o">]](</span><span class="n">field</span><span class="k">:</span><span class="kt">Field</span><span class="o">[</span><span class="kt">Y</span><span class="o">],
 </span><span class="n">f</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Y</span><span class="o">]</span><span class="k">=&gt;</span><span class="kt">Y</span><span class="o">,
 </span><span class="n">xm</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">],
 </span><span class="n">di</span><span class="k">:</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">])</span><span class="k">:</span><span class="o">(</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">],</span><span class="kt">Double</span><span class="o">)</span>
</pre>
</div>
<p>Notice it is willing to work with any type parameterized function (which means it is willing to let the caller pick the actual type of NumberBase[Y] and work with that).  Most callers will call with Y=MDouble (the wrapper for machine Doubles) and lineMin() will then work with that (without ever really knowing the actual underlying type).</p>
<p>A lot of fans of dynamic languages consider type systems to be mere hairshirt penance.   But that is not so.  Broken type systems (like Java&#8217;s collections before  erasure parameters were introduced in Java 1.5) are indeed more trouble than they are worth.  Working type systems (like C++ Templates/STL, Java 1.5+ and Scala) allow you to solve problems (and enforce decisions) during the design phase (which is much much cheaper than during the deployment phase).  You can&#8217;t set your types in stone (you are likely going to have them subtly wrong for the first few iteration).  You must be willing to think like a &#8220;language lawyer&#8221; to find out what parts of your work can be specified and enforced in the language type system.  To use an analogy: static types are your blueprint or your underpainting.</p>
<h2>Tests</h2>
<p>One argument against static types is that you can get much of their benefit from unit tests.  My opinion is you never have enough unit tests, so putting more pressure on your test suite is not wise.   Static types plus tests are strictly more powerful than static types alone or tests alone. </p>
<p>Even for this example toy-scale project we have include a JUnit test set to pursue a number of goals:</p>
<ul>
<li>Confirm our number implementations (DualNumber and MDouble) correctly model machine Doubles (perform parallel calculations and compare).</li>
<li>Confirm DualNumber obeys expected laws of algebra composition and cancellation <em>including the portions that can not be modeled in machine Doubles</em>.</li>
<li>Confirm DualNumbers compute gradients.</li>
<li>Confirm operations of optimizers and optimizer components.</li>
</ul>
<p>Many of these tests are related, but they don&#8217;t all imply each other and give different perspective on the errors they catch.  For example no amount of parallel computation between DualNumbers and machine Doubles is going to confirm the infinitesimal portion of the DualNumber is propagating correctly (since this is not a property of machine Doubles).  So we add extra tests that expect DualNumber to obey algebraic relations like: a*(b+c) = a*b + a*c hold.  It is then another step to confirm that whatever the DualNumbers calculate is not only self-consistent, but also models a truncated Taylor Series or differentiation.</p>
<h2>Conclusion</h2>
<p>We hope we have demonstrated how the complexity of a mathematical programming problem can be managed by breaking the problem into an objective function that is separate from the optimizer (allowing the optimizer to be both good and hidden) and a static type system (such as Scala) to help enforce required properties of a calculation (such as all numbers being routed though a required representation).  With these sort of tools available many formerly hard problems (that are often, unfortunately solved by over-specifying direct inefficient iterative improvement techniques) become &#8220;if I can write a reasonable objective function this may already by solved by an optimizer in my library.&#8221;  The more of these tools you have (either in your code or in your reference library) the more of these problems become easy (this is the topic of my earlier paper: <a href="http://www.win-vector.com/blog/2009/11/the-local-to-global-principle/">The Local to Global Principle</a>).</p>
<h2>Appendix: Fixing Smoothness</h2>
<p>Our chosen example objective function is very nice (i.e. convex) but it has a small (but correctable) problem.   The derivative or gradient or gradient has some jump discontinuities that could cause an optimizer to exit prematurely (not at the global optimum).  Consider the simple form of this for wiring a center to a single point at the origin (even in 1 dimension).  The wiring cost function is sqrt(x*x) has a cost graph as shown here.</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/abs.png" alt="abs.png" border="0" width="525" height="525" /><br />
</center></p>
<p>This is convex- but derivative is not smooth as we see in the included graph of the derivative of sqrt(x*x).</p>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/dabs.png" alt="dabs.png" border="0" width="525" height="525" /><br />
</center></p>
<p>So: in this case if the optimizer stops at one of the target points we can&#8217;t be sure that it stopped at the global optimum (it may have stopped due to the discontinuity in the gradient).  For some simple problems the optimum is necessarily at a target point.  For example on the number line take the target points 0,1 and x.  As long as x&ge;0 and x&le;1 the optimum placement will be x itself.</p>
<p>One way to defend against this is to use some sort of smoothed version of sqrt() that essentially decreases a little faster near the origin.  Our cost function becomes:<br />
<center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2010/06/cost2.png" alt="cost2.png" border="0" width="237" height="55" /><br />
</center><br />
where s() is our suitable approximation of the sqrt() function.  Two candidates are s(x) = (x+tau)^(1/2) and s(x) = x^(1/2 + tau); where tau is a small constant.  As long as tau is greater than zero we have no derivative discontinuity in s(x^2) and convexity is preserved (even made a bit stricter).  Other ways to deal with this include adding additional coordinates to the problem and small perturbations on these coordinates.  Finally, a point found by optimizing with respect to s(x) can be &#8220;polished&#8221; by re-starting the optimization at the first found solution and using sqrt(x) as the new objective (if the original point is not near any of the target points).</p>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/' rel='bookmark' title='Permanent Link: Gradients via Reverse Accumulation'>Gradients via Reverse Accumulation</a></li>
<li><a href='http://www.win-vector.com/blog/2009/11/r-examine-objects-tutorial/' rel='bookmark' title='Permanent Link: R examine objects tutorial'>R examine objects tutorial</a></li>
<li><a href='http://www.win-vector.com/blog/2009/09/survive-r/' rel='bookmark' title='Permanent Link: Survive R'>Survive R</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>R annoyances</title>
		<link>http://www.win-vector.com/blog/2010/03/r-annoyances/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=r-annoyances</link>
		<comments>http://www.win-vector.com/blog/2010/03/r-annoyances/#comments</comments>
		<pubDate>Sat, 20 Mar 2010 18:49:42 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Rants]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Principle of Least Astonishment]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[R is not your friend]]></category>
		<category><![CDATA[R programming annoyances]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1407</guid>
		<description><![CDATA[Readers returning to our blog will know that Win-Vector LLC is fairly &#8220;pro-R.&#8221; You can take that to mean &#8220;in favor or R&#8221; or &#8220;professionally using R&#8221; (both statements are true). Some days we really don&#8217;t feel that way. Consider the following snippet of R code where we create a list with a single element [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/11/r-examine-objects-tutorial/' rel='bookmark' title='Permanent Link: R examine objects tutorial'>R examine objects tutorial</a></li>
<li><a href='http://www.win-vector.com/blog/2010/01/relative-returns-a-banker-versus-trader-paradox/' rel='bookmark' title='Permanent Link: Relative returns: a banker versus trader paradox'>Relative returns: a banker versus trader paradox</a></li>
<li><a href='http://www.win-vector.com/blog/2008/04/sorting-in-anger/' rel='bookmark' title='Permanent Link: Sorting Used in Anger'>Sorting Used in Anger</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Readers returning to our blog will know that Win-Vector LLC is fairly &#8220;pro-<a href="http://www.win-vector.com/blog/tag/r/" target="x">R</a>.&#8221;  You can take that to mean &#8220;in favor or R&#8221; or &#8220;professionally using R&#8221; (both statements are true).  Some days we really don&#8217;t feel that way.  <span id="more-1407"></span><br />
Consider the following snippet of R code where we create a list with a single element named &#8220;x&#8221; that refers to a numeric vector.  We start with a demonstration of the hard-coded method of pulling the x-value back out using the &#8220;$&#8221; operator.</p>
<pre>
&gt; l &lt;- list(x=c(1,2,3))
&gt; l$x
[1] 1 2 3
</pre>
<p>But suppose we wanted to automate this; that is pass in the name of the value we want in a variable.  We are after all using a computer, so automating a step seems like a reasonable desire.  R supplies a notation for this using the &#8220;[]&#8221; operator.  But something slightly different comes out under the &#8220;[]&#8221; operator than under the &#8220;$&#8221; operator:</p>
<pre>
&gt; varName <- 'x'
&gt; l[varName]
$x
[1] 1 2 3
</pre>
<p>Notice that the printed outputs are slightly different (one echoes "$x" and one does not).  Let's use the "class()" method to see what is actually being returned in each case.</p>
<pre>
&gt; class(l$x)
[1] "numeric"
&gt; class(l['x'])
[1] "list"
</pre>
<p>Completely different return types are returned (in one case a numeric vector in the other a general list, not interchangeable types). </p>
<p>At this point you may think it is time to turn in our "pro" label and call ourselves "newb" (Internet slang for "newbie" or "idiot").  But let's slow down for a bit.   When two views of the same situation disagree (such as the difference in opinion between the authors of R and myself whether the "[]" and "$" operators should return the same type) you at most know that at least one of those views is wrong.  You don't really know if one view is right or even if one view is right which one it is.  I can, however, bring in some additional argument to try and show the design of R is in fact wrong.  The additional argument is <a href="http://en.wikipedia.org/wiki/Principle_of_least_astonishment" target="o">"The Principle of Least Astonishment."</a>  This principle roughly says that it is a mistake to introduce unnecessary differences in outcomes (which to the unprepared user are unpleasant surprises).  There may be some deep (yet obscure) reasons the two operators prefer to return different results.  But the fact you would have to find a way to document and explain these differences really should make one think that this situation is really a mis-design and the "explanation" is really an attempt at a work around.  Or to put it more rudely: there may be an explanation, but there is no excuse.</p>
<p>For another example consider creating a 3 by 3 matrix:</p>
<pre>
&gt; m &lt;- matrix(c(1,2,3,1,1,1,0,0,1),nrow=3,ncol=3)
&gt; m
     [,1] [,2] [,3]
[1,]    1    1    0
[2,]    2    1    0
[3,]    3    1    1
</pre>
<p>Now select the last two rows of the matrix.</p>
<pre>
&gt; m[c(FALSE,TRUE,TRUE),]
     [,1] [,2] [,3]
[1,]    2    1    0
[2,]    3    1    1
&gt;
</pre>
<p>Now (for the punchline) try to select just the middle row of the matrix.<br />
 </p>
<pre>
&gt; m[c(FALSE,TRUE,FALSE),]
[1] 2 1 0
</pre>
<p>Notice that once again (and without warning) the result is subtly different.  I admit that it seems paranoid to worry about such small differences- but when you are debugging a system that should work these are exactly the killing mistakes you are looking for.  In this case the problem is pretty bad.  See what happens if you tried to ask for the dimension of each of these differing returns:</p>
<pre>
&gt; dim(m[c(FALSE,TRUE,TRUE),])
[1] 2 3
&gt; dim(m[c(FALSE,TRUE,FALSE),])
NULL
</pre>
<p>The first case works fine (reports 2 rows and 3 columns).  The second case returns "NULL" (instead of 1 row and 3 columns).   In R NULL is sometimes used as an error-value (instead of throwing an exception) and this value will poison any further conditions or calculations it is involved in.  The main way to deal with the arbitrary introduction of such NULLs is the incredibly tedious uncertain defensive coding practices that we argue against in <a href="http://www.win-vector.com/blog/2010/02/postels-law-not-sure-who-to-be-angry-with/">Postel’s Law: Not Sure Who To Be Angry With</a>.  Such code weakens both programs and programmers.</p>
<p>But what is going on in this example?  Once again we use the "class()" method to inspect the subtly different results.</p>
<pre>
&gt; class(m[c(FALSE,TRUE,TRUE),])
[1] "matrix"
&gt; class(m[c(FALSE,TRUE,FALSE),])
[1] "numeric"
</pre>
<p>The result is disappointing.  For a two-row select R returns a matrix (what we would expect).  For a single-row select R does us the "favor" of converting the result into a vector.  This is a disaster.  A single row matrix is similar to a vector, but even R itself does not support the same set of operations and outcomes on vectors as it does on matrices (for example the failure of the "dim()" method).  It is not safe to further calculate with these results (without by-hand converting the result back to a single row matrix which R can in fact represent).  In my case this created crashing bugs deep in a long running analysis (and was hard to diagnose as the bug was in an "innocent operation" not in a "risky calculation").</p>
<p>All of this has to violate John Chambers' "Prime Directive" for data: "an obligation on all creators of software to program in such a way that the computations can be understood and trusted."  Chambers' opinion being relevant as he is the author of the S language (of which R is an open source re-implementation).  We continue to recommend R, but we also recommend being exceptionally careful when using it (which unfortunately adds time to projects).</p>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/11/r-examine-objects-tutorial/' rel='bookmark' title='Permanent Link: R examine objects tutorial'>R examine objects tutorial</a></li>
<li><a href='http://www.win-vector.com/blog/2010/01/relative-returns-a-banker-versus-trader-paradox/' rel='bookmark' title='Permanent Link: Relative returns: a banker versus trader paradox'>Relative returns: a banker versus trader paradox</a></li>
<li><a href='http://www.win-vector.com/blog/2008/04/sorting-in-anger/' rel='bookmark' title='Permanent Link: Sorting Used in Anger'>Sorting Used in Anger</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/03/r-annoyances/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Postel&#8217;s Law: Not Sure Who To Be Angry With</title>
		<link>http://www.win-vector.com/blog/2010/02/postels-law-not-sure-who-to-be-angry-with/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=postels-law-not-sure-who-to-be-angry-with</link>
		<comments>http://www.win-vector.com/blog/2010/02/postels-law-not-sure-who-to-be-angry-with/#comments</comments>
		<pubDate>Fri, 26 Feb 2010 00:38:55 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Rants]]></category>
		<category><![CDATA[Postel's Law]]></category>
		<category><![CDATA[Unit Testing]]></category>
		<category><![CDATA[Worse is Better]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1394</guid>
		<description><![CDATA[One of my research interests is finding the principles that underly the management of information, complexity and uncertainty. When something as simple as a web-form is called &#8220;technology&#8221; it is time to step back and examine your principles. One principle I am not sure about Postel&#8217;s law. It doesn&#8217;t hold often enough to be relied [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/01/map-reduce-a-good-idea/' rel='bookmark' title='Permanent Link: Map Reduce: A Good Idea'>Map Reduce: A Good Idea</a></li>
<li><a href='http://www.win-vector.com/blog/2010/03/r-annoyances/' rel='bookmark' title='Permanent Link: R annoyances'>R annoyances</a></li>
<li><a href='http://www.win-vector.com/blog/2008/10/something-i-dont-get-about-business-and-bailouts/' rel='bookmark' title='Permanent Link: Something I don&#8217;t get about business and bailouts'>Something I don&#8217;t get about business and bailouts</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>One of my research interests is finding the principles that underly the management of information, complexity and uncertainty.  When something as simple as a web-form is called &#8220;technology&#8221; it is time to step back and examine your principles.  One principle I am not sure about Postel&#8217;s law.  It doesn&#8217;t hold often enough to be relied on and when it fails I am not sure who to be angry with.<span id="more-1394"></span></p>
<p>Postel&#8217;s Law (also called The Robustness Principle) comes from RFC 761 &#8220;Transmission Control Protocol&#8221; in 1980 and is: &#8220;Be conservative in what you do; be liberal in what you accept from others.&#8221; (Side note: RFC is the now ironic acronym used to describe Internet standards- the letters stood for &#8220;request for comments&#8221;).</p>
<p>This idea probably worked best where it started- in the TCP/IP world which is where a lot of hairy details of how computers network are handled.  When your goal is the basic establishment of transient communications- success is measured by getting information through and not unnecessarily triggering a failure.  It may be okay to tolerate mistakes here- because they don&#8217;t live long anyway.</p>
<p>Unfortunately, the law works less well other places where it is applied.  The law is a downright hazard in dealing with archiving meaningful data (instead of managing transient signaling protocols). Sometimes the cost of obeying the law far outweighs the potential benefit.</p>
<p>A common arena for Postel&#8217;s law is now HTML, the markup language used to represent the content we view in web-browsers.  In this arena Postel&#8217;s law has had two consequences: one good, one bad.</p>
<p>The good: almost anyone can create a working web-page or even a web-site because modern browsers have been designed to paper around almost every common HTML mistake.  It has been pointed out that this ease of creation and &#8220;worse is better&#8221; (a deep principal due to Richard P. Gabriel see <a target='ext' href="http://en.wikipedia.org/wiki/Worse_is_better">wikipedia: worse is better</a>) has been one of the reasons that HTML out-competed and killed many other ideas.  Philip Greenspun&#8217;s famous <a target='ext' href="http://philip.greenspun.com/panda/html">story</a> of a 10-year-old building web site to get his mother medical attention happened in the sloppy world of HTML and could not have happened in the straight jacket of RDF (Resource Description Framework: the darling of the semantic web).  I would not wish having to actually read or adhere to the incredibly long and irrelevant standards from w3.org (where the ratio of value to pedantry goes to zero) on an enemy.  The web is only interesting due to its content and much of its content was only possible due to low barrier of entry.</p>
<p>The bad: to read HTML you almost have to re-create the entire history of web-browsers.  This is a history of many hostile competitors (Microsoft, Netscape, Opera, WebKit, Mozzila, Google) and billions of dollars.  Reproducing a significant fraction of this history is a significant (and useless) expense.  For the most part I use a permissive parsing library like TagSoup or HTMLTidy but even these miss some things that browsers accept and are far more complicated than the task truly justifies.</p>
<p>Even worse is the cases of XML and RDF.  These are often used for archival storage of semantic data.  That is you may need to read and understand (not just display) data in XML for a long time.  To be liberal in what you accept you have to again master a long set of useless complications (DTDs, namespaces incredibly inept character encoding and escapes) and still get burned by improperly encoded XML (that &#8220;used to work&#8221; because the bugs in the emitted XML matched the bugs in a library that is now out of date).</p>
<p>It is clear in the case of HTML and XML that Postel&#8217;s law&#8217;s cost is too high for what it delivers.  Or at least half of the law is too expensive: no amount of being generous in what we accept makes up for the original data not have been impounded correctly (not being &#8220;conservative in what they do&#8221; and not having checked that at the time it was created).  Some of this is that the producers of the data have no way of telling they are not being &#8220;conservative in what they do&#8221; because the &#8220;generous in what they accept&#8221; libraries they use to debug don&#8217;t tell them they are emitting bad data.  And lets be honest- most systems are not designed for correctness, they are instead debugged until they seem to work.  I would say that in fact HTML is not an example of the power of Postel&#8217;s law but of the pernicious influence of &#8220;worse is better.&#8221; Computer science has not risen to the level of &#8220;software engineering&#8221; we still are a horrible &#8220;fit to finish&#8221; industry.</p>
<p>Frankly for many things we need a simpler &#8220;fail early&#8221; discipline.  Tools need to be better and standards need to be simpler so that if you write something that is wrong it is easy to see why it is wrong and easy to fix it.  Postel&#8217;s law has helped hide the negative impacts of complicated standards, we need to push the cost of complications back on to standards committees.  The need to be &#8220;generous in what you accept&#8221; overly favors large, rich entrenched players who have had the time and resources in incrementally invest in papering around every common mistake.</p>
<p>However, I am not sure if we can throw out half of Postel&#8217;s law or even if we want to.  When Postel&#8217;s law fails it is not clear who to be mad at.</p>
<p>Sun, to kick somebody who is already down, was famous for making elaborate frameworks that correctly and brutally implement many details of RFCs.  Sun&#8217;s Java includes huge frameworks for XML, UTF8 and email that scrupulously implement page after page of useless standard documentation but fail in the wild due to not being &#8220;generous in what they accept.&#8221;  For example Sun&#8217;s GlassFish (which got listed named as one of four or five important assets during Sun&#8217;s various acquisition talks much like the fact the car has cup-holder somehow always gets mentioned in spec sheets) is an &#8220;open source production-quality enterprise software application server.&#8221;  A supposedly major component of the GlassFish is its email component which is a huge unwieldy framework that implements many of the email related RFCs and protocols including IMAP.  Unfortunately for all its hugeness it can not reliably read email folder names from one of the biggest IMAP servers: Google Mail.  Google Mail includes &#8220;against standard&#8221; characters in the protocol and crashes the GlassFish software.</p>
<p>And here is where Postel&#8217;s law fails us: under Postel&#8217;s law both sides are at fault (Sun for failing to be generous in what they accepted, Google for failing to be conservative in what they did).  We can&#8217;t assign only one villain.  We have no proscription of who to ask for a fix.   Postel&#8217;s law seems useful in that if either Google or Sun had followed it the two systems would work.  But the law doesn&#8217;t pick one side to assign blame and help us to efficiently diagnose and fix the problem.  It becomes difficult to find the critical bugs when they are masked by a see of &#8220;acceptable&#8221; bugs.  Take a contrary example: the simpler law &#8220;implement the standard or fix the standard&#8221; would clearly assign blame to GMail.</p>
<p>Similar pain is encountered in Java&#8217;s handling of character encodings like UTF8.  It is hard to move up the stack of artificial intelligence (from words, to concepts, to ideas, to reasoning to consciousness) when you can&#8217;t even reliably transcribe characters.  When faced with bad character sequences (a common occurrence on the web) there is no practical way to get Java &#8220;mostly parse it,&#8221; Java libraries and frameworks authors seem to extract a perverse joy in throwing a program-killing exception (it does not matter if you catch it the library has already stopped doing what you wanted) because they are concerned that a diacritical mark was not properly encoded (web browsers, on the other hand, lose the mark or show some sort of damage near the mistake and blunder on).  And here is were the frustration sets in, how can you make applications that are generous in what they accept when the libraries and frameworks are overly proud and picky?  This, at first, seems like an argument for Postel&#8217;s law- if everybody else (especially the library authors) were generous in what they accepted your life could be easy.  That is certainly one possibility- but I argue it often becomes a matter of semantics to assign blame where there is no pre-existing specification or performance agreement.  In the end you will waste more time dealing with errors that should never have made it to you than the time you save emitting the odd error of your own.</p>
<p>The unit testing people have a somewhat better idea: fail early, fail at the factory where it is cheap to fix.  Don&#8217;t   litter all of your code with indecisive statements like:</p>
<pre>
  Set<String> matches = computeMatches();
  if( matches!=null ) {
     for(String match: matches) {
         ...
     }
  }
</pre>
<p>Instead: write a unit test to document you expectation that the empty set is expressed in single consistent way:</p>
<pre>
   Set<String> matches = computeMatches();
   assertNotNull(matches);
</pre>
<p>And from then on write more confident code:</p>
<pre>
  for(String match: computeMatches()) {
         ...
  }
</pre>
<p>This may seem overly optimistic and overly strict- but I have a point.  One of the few good principles in computer science (and perhaps one of computer science&#8217;s contributions to knowledge, computers are a huge contribution to society- but they were made by engineers) is composition.  A plan for getting from A to B followed by (or composed with) a plan for getting from B to C is a plan for getting from A to C.  Well a correct plan for getting from A to B when composed with a correct plan for getting from B to C, if each of the plans &#8220;is mostly right if the piece after is so nice to fix up a few mistakes&#8221; you really don&#8217;t know what you have.  You may have nothing.</p>
<p>That is my complaint- you can&#8217;t put an a priori bound on how expensive attempting to allow both sides of Postel&#8217;s law will be.  You would like others to paper over your mistakes, but it is becoming too expensive to paper over the mistakes of others.  In the end Postel&#8217;s law is of little help when cleaning up the inevitable mess.</p>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/01/map-reduce-a-good-idea/' rel='bookmark' title='Permanent Link: Map Reduce: A Good Idea'>Map Reduce: A Good Idea</a></li>
<li><a href='http://www.win-vector.com/blog/2010/03/r-annoyances/' rel='bookmark' title='Permanent Link: R annoyances'>R annoyances</a></li>
<li><a href='http://www.win-vector.com/blog/2008/10/something-i-dont-get-about-business-and-bailouts/' rel='bookmark' title='Permanent Link: Something I don&#8217;t get about business and bailouts'>Something I don&#8217;t get about business and bailouts</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2010/02/postels-law-not-sure-who-to-be-angry-with/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R examine objects tutorial</title>
		<link>http://www.win-vector.com/blog/2009/11/r-examine-objects-tutorial/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=r-examine-objects-tutorial</link>
		<comments>http://www.win-vector.com/blog/2009/11/r-examine-objects-tutorial/#comments</comments>
		<pubDate>Sat, 21 Nov 2009 15:39:21 +0000</pubDate>
		<dc:creator>John Mount</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Tutorial]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1134</guid>
		<description><![CDATA[This article is quick concrete example of how to use the techniques from Survive R to lower the steepness of The R Project for Statistical Computing&#8216;s learning curve (so an apology to all readers who are not interested in R). What follows is for people who already use R and want to achieve more control [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/09/survive-r/' rel='bookmark' title='Permanent Link: Survive R'>Survive R</a></li>
<li><a href='http://www.win-vector.com/blog/2009/12/cru-graph-yet-again-with-r/' rel='bookmark' title='Permanent Link: CRU graph yet again (with R)'>CRU graph yet again (with R)</a></li>
<li><a href='http://www.win-vector.com/blog/2010/03/r-annoyances/' rel='bookmark' title='Permanent Link: R annoyances'>R annoyances</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>This article is quick concrete example of how to use the techniques from  <a href="http://www.win-vector.com/blog/2009/09/survive-r/">Survive R</a> to lower the steepness of <a href="http://www.r-project.org/">The R Project for Statistical Computing</a>&#8216;s learning curve (so an apology to all readers who are not interested in R).  What follows is for people who already use R and want to achieve more control of the software.<span id="more-1134"></span><br />
I am a fan of the <a href="http://www.r-project.org/">R</a>.  The R software does a number of incredible things and is the result of a number of good design choices.  However, you can&#8217;t fully benefit from R if you are not already familiar the internal workings of R.  You can quickly become familiar with the internal workings of R if you learn how to inspect the objects of R (as an addition to using the built in help system).  Here I give a concrete example of how to use the R system itself to find answers, with or without the help system.  R documentation has the difficult dual responsibility of attempting to explain both how to use the R software and explain the nature of the underlying statistics; so the documentation is not always the quickest thing to browse.</p>
<p>First let&#8217;s give R the commands to build a fake data set that has a variable y that turns out to be 3 times x (another variable) plus some noise:</p>
<pre>
&gt; n &lt;- 100
&gt; x &lt;- rnorm(n)
&gt; y &lt;- 3*x + 0.2*rnorm(n)
&gt; d &lt;- data.frame(x,y)
</pre>
<p>This data set (by design) has a nearly a linear relation between x and y.  We can plot<br />
the data as follows:</p>
<pre>
&gt; library(ggplot2)
&gt; ggplot(data=d) + geom_point(aes(x=x,y=y))
</pre>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/dat1.png" alt="dat1.png" border="0" width="500" height="500" /><br />
</center></p>
<p>With data like this the most obvious statistical analysis is a <a href="http://en.wikipedia.org/wiki/Linear_regression">linear regression</a>.  R can very quickly perform the linear regression and report the results.</p>
<pre>
&gt; model &lt;- lm(y~x,data=d)
&gt; summary(model)

Call:
lm(formula = y ~ x, data = d)

Residuals:
     Min       1Q   Median       3Q      Max
-0.41071 -0.12762 -0.00651  0.10240  0.62772 

Coefficients:
            Estimate Std. Error t value Pr(&gt;|t|)
(Intercept) -0.02609    0.02102  -1.241    0.217
x            2.99150    0.02202 135.858   &lt;2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.2102 on 98 degrees of freedom
Multiple R-squared: 0.9947,	Adjusted R-squared: 0.9947
F-statistic: 1.846e+04 on 1 and 98 DF,  p-value: &lt; 2.2e-16
</pre>
<p>We can read the report and see that the estimated fit formula is: y =  2.99150*x &#8211; 0.02609 (which is very close to the true formula y = 3*x) .  At this point the analysis is done (if the goal of the analysis is to just print the results).  However, if we want to use the results in a calculation we need to get at the numbers shown in above printout.  This printout contains a lot of information (such as the estimate fit coefficients, the standard errors, the t-values and the significances) that a statistician would want to see and want to use in further calculations.  But it is unclear how to get at these numbers.  For example: how do you get the &#8220;standard errors&#8221; (the numbers in the &#8220;Std. Error&#8221; column) from the returned model?  Are we forced to cut and paste them from the printed report?   What can you do?</p>
<p>The documentation nearly tells us what we need to know.  <tt>help(lm)</tt> yields:</p>
<blockquote><p><tt><br />
The functions summary and anova are used to obtain and print a summary and analysis of variance table of the results. The generic accessor functions coefficients, effects, fitted.values and residuals extract various useful features of the value returned by lm.<br />
</tt></p></blockquote>
<p>To a newer R user this may not be clear (as there are technical issues from both R and statistics quickly being run through).   However, the experienced R user would immediately recognize from this help that what is returned form <tt>summary(model)</tt> is an object (not just a blob of text) and that looking at the class of the returned object (which turns out to be summary.lm) might tell them what they would need to know.</p>
<p>Typing:</p>
<pre>
&gt;class(summary(model))
[1] "summary.lm"
&gt; help(summary.lm)
</pre>
<p>Yields:</p>
<blockquote><p><tt><br />
coefficients: a p x 4 matrix with columns for the estimated coefficient, its standard error, t-statistic and corresponding (two-sided) p-value. Aliased coefficients are omitted.<br />
</tt></p></blockquote>
<p>But if you are not very familiar with R you might miss that the summary function returns a useful object (instead of blob of text).  Also you might only know to look at <tt>help(summary)</tt> which  does not describe the location of the desired standard errors (but does have a reference to summary.lm, so if you are patient you might find it).  We describe how to find the information you need by using R&#8217;s object inspection facilities.  This is a &#8220;doing it the hard way&#8221; technique for when you do not understand the help system or you are using a package with less complete help documentation.</p>
<p>First  (using the techniques described in the slides:  <a href="http://www.win-vector.com/blog/2009/09/survive-r/">Survive R</a>) examine the model to see if the standard errors are there:</p>
<pre>
&gt; names(model)
 [1] "coefficients"  "residuals"     "effects"       "rank"          "fitted.values" "assign"
 [7] "qr"            "df.residual"   "xlevels"       "call"          "terms"         "model"    

&gt; model$coefficients
(Intercept)           x
-0.02609243  2.99150259
</pre>
<p>We found the coefficients, but did not find the standard errors.  Now we know the standard errors are reported by <tt>summary(model)</tt>, so they must be somewhere.  Instead of performing a wild goose chase to find the standard errors let&#8217;s instead trace how the summary method works to find where it gets them.  If we type print(summary) we don&#8217;t get any really useful information.  This is because summary is a generic method and we need to know what type-qualified name the summary of a linear model is called.</p>
<pre>
&gt; class(model)
[1] "lm"
</pre>
<p>So we see our model is of type lm so the <tt>summary(model)</tt> call would use a summary method called summary.lm (which as we saw is also the returned class of the <tt>summary(model)</tt> object).  As we mentioned the solution is in <tt>help(summary.lm)</tt>, but if the solution had not been there we could still make progress:  we could dump the source of the summary.lm method:</p>
<pre>
&gt; print(summary.lm)
function (object, correlation = FALSE, symbolic.cor = FALSE,
    ...)
{
    ....
    class(ans) &lt;- "summary.lm"
    ans
}
</pre>
<p>We actually deleted the bulk of the print(summary.lm) result because the important thing to notice is that the method is huge and that it returns an object instead of a blob of text.  The fact that the method summary.lm was huge means that it is likely calculating the things it reports (confirming that the standard errors are not part of the model object).  The fact that an object is returned means that what we are looking for may sitting somewhere in the summary waiting for us.  To find what we are looking for we convert the summary into a list (using the unclass() method) and look for something with the name or value we are looking for:</p>
<pre>
&gt; unclass(summary(model))
$call
lm(formula = y ~ x, data = d)
...
$coefficients
               Estimate Std. Error    t value      Pr(&gt;|t|)
(Intercept) -0.02609243 0.02102062  -1.241278  2.174662e-01
x            2.99150259 0.02201930 135.858209 2.095643e-113
...
</pre>
<p>And we have found it.  The named slot <tt>summary(model)</tt>$coefficients is in fact a table that has what we are looking for in the second column.  We can create a new list that will let us look up the standard errors by name (for the variable x and for the intercept):</p>
<pre>
&gt; stdErrors &lt;- as.list(summary(model)$coefficients[,2])
</pre>
<p>Now that we have the stdErrors in list form we can look up the numbers we wanted by name.</p>
<pre>
&gt; stdErrors['x']
$x
[1] 0.0220193

&gt; stdErrors['(Intercept)']
$`(Intercept)`
[1] 0.02102062
</pre>
<p>And we finally have the standard errors.  But why did we want the standard errors?  In this case I wanted the standard errors so I could plot the fit model and show the uncertainty of the model.  As, is often the case, R already has a function that does all of this.  Also (as is often the case) the R function that does this asks the right statistical question (instead of the obvious question) and can draw error bars that display the uncertainty of future predictions.  The uncertainty in future prediction is in fact different than the uncertainty of the estimate (what was most obvious to calculate from the standard errors) and (after some reflection) is what I really wanted.  Having these sort of distinctions already thought out is why we are using a statistics package like R instead of just coding everything up.  These calculations are all trivial to implement- but remembering to perform the calculations that answer the right statistical questions can be difficult.  The built in R solution of plotting the the fit model (black line) and the region of expected prediction uncertainty (blue lines) is as follows:</p>
<pre>
&gt; pred &lt;- predict.lm(model,interval='prediction')
&gt; dfit &lt;- data.frame(x,y,fit=pred[,1],lwr=pred[,2],upr=pred[,3])
&gt; ggplot(data=dfit) + geom_point(aes(x=x,y=y)) +
    geom_line(aes(x,fit)) +
    geom_line(aes(x=x,y=lwr),color='blue') + geom_line(aes(x=x,y=upr),color='blue')
</pre>
<p><center><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/11/fit1.png" alt="fit1.png" border="0" width="500" height="500" /><br />
</center></p>
<p>And we are done.</p>


<p>Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/09/survive-r/' rel='bookmark' title='Permanent Link: Survive R'>Survive R</a></li>
<li><a href='http://www.win-vector.com/blog/2009/12/cru-graph-yet-again-with-r/' rel='bookmark' title='Permanent Link: CRU graph yet again (with R)'>CRU graph yet again (with R)</a></li>
<li><a href='http://www.win-vector.com/blog/2010/03/r-annoyances/' rel='bookmark' title='Permanent Link: R annoyances'>R annoyances</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2009/11/r-examine-objects-tutorial/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
