<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Win-Vector Blog &#187; ANOVA</title>
	<atom:link href="http://www.win-vector.com/blog/tag/anova/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.win-vector.com/blog</link>
	<description>The Applied Theorist&#039;s Point of View</description>
	<lastBuildDate>Thu, 29 Jul 2010 17:09:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Statistics to English Translation, Part 2b: Calculating Significance</title>
		<link>http://www.win-vector.com/blog/2009/12/statistics-to-english-translation-part-2b-calculating-significance/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=statistics-to-english-translation-part-2b-calculating-significance</link>
		<comments>http://www.win-vector.com/blog/2009/12/statistics-to-english-translation-part-2b-calculating-significance/#comments</comments>
		<pubDate>Mon, 14 Dec 2009 07:02:40 +0000</pubDate>
		<dc:creator>Nina Zumel</dc:creator>
				<category><![CDATA[Applications]]></category>
		<category><![CDATA[Expository Writing]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Statistics To English Translation]]></category>
		<category><![CDATA[ANOVA]]></category>
		<category><![CDATA[F-test]]></category>
		<category><![CDATA[significance]]></category>
		<category><![CDATA[t-test]]></category>

		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=1281</guid>
		<description><![CDATA[In the previous installment of the Statistics to English Translation, we discussed the technical meaning of the term &#8220;significant&#8221;. In this installment, we look at how significance is calculated. This article will be a little more technically detailed than the last one, but our primary goal is still to help you decipher statements about significance [...]


Related posts:<ol><li><a href='http://www.win-vector.com/blog/2009/12/statistics-to-english-translation-part-2a-%e2%80%99significant%e2%80%99-doesn%e2%80%99t-always-mean-%e2%80%99important%e2%80%99/' rel='bookmark' title='Permanent Link: Statistics to English Translation, Part 2a: ’Significant’ Doesn’t Always Mean ’Important’'>Statistics to English Translation, Part 2a: ’Significant’ Doesn’t Always Mean ’Important’</a></li>
<li><a href='http://www.win-vector.com/blog/2009/11/i-dont-think-that-means-what-you-think-it-means-statistics-to-english-translation-part-1-accuracy-measures/' rel='bookmark' title='Permanent Link: &#8220;I don&#8217;t think that means what you think it means;&#8221; Statistics to English Translation, Part 1: Accuracy Measures'>&#8220;I don&#8217;t think that means what you think it means;&#8221; Statistics to English Translation, Part 1: Accuracy Measures</a></li>
<li><a href='http://www.win-vector.com/blog/2010/02/living-in-a-lognormal-world/' rel='bookmark' title='Permanent Link: Living in A Lognormal World'>Living in A Lognormal World</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://www.win-vector.com/blog/2009/12/statistics-to-english-translation-part-2a-’significant’-doesn’t-always-mean-’important’/">previous installment</a> of the <a href="http://www.win-vector.com/blog/category/statistics-to-english-translation/">Statistics to English Translation</a>, we discussed the technical meaning of the term &#8220;significant&#8221;. In this installment, we look at how significance is calculated. This article will be a little more technically detailed than the last one, but our primary goal is still to help you decipher statements about significance in research papers: statements like &#8220;<!-- MATH  $(F(2, 864) = 6.6, p = 0.0014)$  --><br />
<img src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg4.png" border="0" alt="$ (F(2, 864) = 6.6, p = 0.0014)$" width="238" height="37" align="middle" /> &#8221;.</p>
<p>As in the <a href="http://www.win-vector.com/blog/2009/12/statistics-to-english-translation-part-2a-’significant’-doesn’t-always-mean-’important’/">last article</a>, we will concentrate on situations where we want to test the difference of means. You should read that previous article first, so you are familiar with the terminology that we use in this one.</p>
<p>A pdf version of this current article can be found <a href="http://win-vector.com/dfiles/ste2b_calculatesig.pdf">here</a>.<br />
<span id="more-1281"></span></p>
<h1><a name="SECTION00010000000000000000" id="SECTION00010000000000000000">How is Significance Determined?</a></h1>
<p>Generally speaking, we calculate significance by computing a <em>test statistic</em> from the data. If we assume a specific null hypothesis, then we know that this test statistic will be distributed in a certain way. We can then compute how likely it would be, if the null hypothesis were true, to observe a value of the test statistic at least as extreme as the one we actually observed.</p>
<p>We&#8217;ll explain the use of a test statistic with our Sneetch example from the last installment.</p>
<h1><a name="SECTION00020000000000000000" id="SECTION00020000000000000000">The t-test for Difference of Means</a></h1>
<p>Suppose that the test scores for both Star-Bellies and Plain-Bellies are normally distributed, with the means and standard deviations as given in the table below.</p>
<div align="center">
<table cellpadding="3" border="1">
<tr>
<td align="center">&nbsp;</td>
<td align="center"><img width="16" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg5.png" alt="$ n$"> (number of subjects)</td>
<td align="center"><img width="21" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg6.png" alt="$ m$"> (mean score)</td>
<td align="center"><img width="14" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg7.png" alt="$ s$"> (standard deviation)</td>
</tr>
<tr>
<td align="center">Star-Bellies</td>
<td align="center">50</td>
<td align="center">78</td>
<td align="center">7</td>
</tr>
<tr>
<td align="center">Plain-Bellies</td>
<td align="center">40</td>
<td align="center">74</td>
<td align="center">8</td>
</tr>
</table>
</div>
<p>Remember from the previous installment that we can estimate the true population means <img width="24" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg8.png" alt="$ \mu_1$"> and <img width="24" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg9.png" alt="$ \mu_2$"> as normally distributed around the empirical population means <img width="29" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg10.png" alt="$ m_1$"> and <img width="29" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg11.png" alt="$ m_2$"> respectively, with variances<br />
<img width="52" height="38" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg12.png" alt="$ \sigma^2/{n_1}$"> and<br />
<img width="52" height="38" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg13.png" alt="$ \sigma^2/{n_2}$"> . This is shown in Figure <a href="#fig:twomeans">1</a>. Informally speaking, there is no significant difference in the two populations if the shaded overlap area in Figure <a href="#fig:twomeans">1</a> is large.</p>
<div align="center"><a name="fig:twomeans" id="fig:twomeans"></a><a name="36"></a>
<table>
<caption align="bottom"><strong>Figure 1:</strong> The estimates of the means for two populations</caption>
<tr>
<td>
<div align="center"><img width="282" height="204" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/./overlap.png" alt="Image overlap"></div>
</td>
</tr>
</table>
</div>
<p>Calculating this area is somewhat involved. Instead, we calculate the <em>t-statistic</em>:</p>
<div align="center">
<table cellpadding="0" width="100%" align="center">
<tr valign="middle">
<td nowrap align="center"><img width="126" height="62" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg14.png" alt="$\displaystyle t = \frac{(m_2 - m_1)}{s_D}$"></td>
<td nowrap width="10" align="right">(1)</td>
</tr>
</table>
</div>
<p><br clear="all"><br />
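where <img width="26" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg15.png" alt="$ s_D$"> is the <em>standard error of the difference</em> between the two means. It is computed from the <em>pooled variance</em> of the two populations, which is the first factor on the right-hand side of Equation 2.</p>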
<div align="center">
<table cellpadding="0" width="100%" align="center">
<tr valign="middle">
<td nowrap align="center"><img width="325" height="64" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg16.png" alt="$\displaystyle {s_D}^2 = \frac{n_1\cdot {s_1}^2 + n_2\cdot {s_2}^2}{n_1 + n_2 - 2} \cdot (1/n_1 + 1/n_2)$"></td>
<td nowrap width="10" align="right">(2)</td>
</tr>
</table>
</div>
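<p><br clear="all"></p>
<p>Equation 2 takes <em>s</em><sub>1</sub> and <em>s</em><sub>2</sub> to be standard deviations computed with <em>n</em><sub>1</sub> and <em>n</em><sub>2</sub> in the denominator. If your standard deviations were computed with the more common <em>n</em><sub><em>i</em></sub>&nbsp;&#8722;&nbsp;1 denominator, replace each <em>n</em><sub><em>i</em></sub>&#183;<em>s</em><sub><em>i</em></sub><sup>2</sup> in the numerator with (<em>n</em><sub><em>i</em></sub>&nbsp;&#8722;&nbsp;1)&#183;<em>s</em><sub><em>i</em></sub><sup>2</sup>.</p>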
<p>For our Sneetch example, <img width="75" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg17.png" alt="$ s_D = 1.6$"> , and <img width="79" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg18.png" alt="$ t=2.499$"> , or the negative of that, depending on which group is Group 1. There are<br />
<img width="142" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg19.png" alt="$ 50 + 40 - 2 = 88$"> degrees of freedom.</p>
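<p>The arithmetic in Equations 1 and 2 is easy to check by machine. The following is a minimal Python sketch (the function name <code>pooled_t</code> is ours, not from any library); like Equation 2, it assumes the tabulated standard deviations were computed with <em>n</em><sub><em>i</em></sub> in the denominator.</p>

```python
from math import sqrt


def pooled_t(n1, m1, s1, n2, m2, s2):
    """Return (s_D, t, degrees of freedom) following Equations 1 and 2.

    Assumes s1 and s2 are standard deviations computed with n_i (not n_i - 1)
    in the denominator, which is the form Equation 2 expects.
    """
    pooled_var = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)
    s_d = sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of m2 - m1
    t = (m2 - m1) / s_d
    return s_d, t, n1 + n2 - 2


# Sneetch example: Group 1 = Plain-Bellies, Group 2 = Star-Bellies.
s_d, t, df = pooled_t(n1=40, m1=74, s1=8, n2=50, m2=78, s2=7)
print(f"s_D = {s_d:.4f}, t = {t:.3f}, df = {df}")  # s_D = 1.6006, t = 2.499, df = 88
```

<p>Swapping which group is called Group 1 only flips the sign of <em>t</em>.</p>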
<p>If the null hypothesis is true, and the two populations are identical, then <img width="12" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg1.png" alt="$ t$"> is distributed according to <em>Student&#8217;s distribution with<br />
<img width="105" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg20.png" alt="$ N_1 + N_2 - 2$"> degrees of freedom</em>. Student&#8217;s distribution is sort of a &#8220;stretched out&#8221; bell curve; as the degrees of freedom increase (<br />
<img width="122" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg21.png" alt="$ N_1 + N_2 \rightarrow \infty$"> ), Student&#8217;s distribution approaches the standard normal distribution, <img width="63" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg22.png" alt="$ N(0, 1)$"> <a name="tex2html2" href="#foot209" id="tex2html2"><sup>1</sup></a>.</p>
<p>In other words, if the null hypothesis is true, <img width="12" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg1.png" alt="$ t$"> should be near zero. The probability of seeing a <img width="12" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg1.png" alt="$ t$"> of a certain magnitude or greater under the null hypothesis is given by the area under the tails of Student&#8217;s distribution:</p>
<div align="center"><a name="57"></a>
<table>
<caption align="bottom"><strong>Figure 2:</strong> The area under the tails for a given <img width="12" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg1.png" alt="$ t$"></caption>
<tr>
<td>
<div align="center"><img width="514" height="365" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/./twotailedtest.jpg" alt="Image twotailedtest"></div>
</td>
</tr>
</table>
</div>
<p>This area is <img width="14" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg27.png" alt="$ p$"> . For the Sneetch example, <img width="82" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg28.png" alt="$ p = 0.014$"> .</p>
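<p>One way to make the tail-area idea concrete is to simulate it: generate many pairs of samples for which the null hypothesis really is true, compute <em>t</em> for each pair, and count how often |<em>t</em>| is at least as large as the observed 2.499. The sketch below does this in plain Python. The common population mean and standard deviation (75 and 7.5) and the random seed are arbitrary choices of ours, since under the null hypothesis the distribution of <em>t</em> does not depend on them. This is a Monte Carlo approximation for illustration, not how p-values are normally computed.</p>

```python
import random
from math import sqrt


def pooled_t_from_samples(x1, x2):
    """Pooled two-sample t-statistic (Equations 1 and 2) computed from raw samples."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    # n_i * s_i^2 (with an n_i denominator) is just the sum of squared deviations.
    ss1 = sum((x - m1) ** 2 for x in x1)
    ss2 = sum((x - m2) ** 2 for x in x2)
    s_d = sqrt((ss1 + ss2) / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
    return (m2 - m1) / s_d


def simulated_p_value(t_observed, n1, n2, trials=10_000, seed=1):
    """Monte Carlo estimate of the two-tailed p-value under the null hypothesis."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        # Null hypothesis: both groups come from the same normal population.
        x1 = [rng.gauss(75, 7.5) for _ in range(n1)]
        x2 = [rng.gauss(75, 7.5) for _ in range(n2)]
        if abs(pooled_t_from_samples(x1, x2)) >= abs(t_observed):
            extreme += 1
    return extreme / trials


p_hat = simulated_p_value(2.499, n1=40, n2=50)
print(f"simulated two-tailed p is about {p_hat:.3f}")
```

<p>With 10,000 trials the estimate usually lands within a few thousandths of 0.014; more trials tighten it.</p>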
<p>The further out on the tails <img width="12" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg1.png" alt="$ t$"> is, the stronger the evidence that you should reject the null hypothesis. If you know for some reason that the mean of one population will be greater than or equal to the other, then you can use the <em>one-tailed test</em>:</p>
<div align="center"><a name="64"></a>
<table>
<caption align="bottom"><strong>Figure 3:</strong> The one-tailed test for a given <img width="12" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg1.png" alt="$ t$"></caption>
<tr>
<td>
<div align="center"><img width="514" height="365" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/./onetailedtest.jpg" alt="Image onetailedtest"></div>
</td>
</tr>
</table>
</div>
<p>This test halves the p-value compared to the two-tailed test, so a given <img width="12" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg1.png" alt="$ t$"> value appears twice as significant. When in doubt about which to use, choose the two-tailed test; it is more conservative against false positives<a name="tex2html5" href="#foot210" id="tex2html5"><sup>2</sup></a>.</p>
<p>In discussions of t-tests, you will often see statements of the form:</p>
<blockquote><p>The t-test rejects the hypothesis that two means are equal if</p></blockquote>
<div align="center">
<table cellpadding="0" width="100%" align="center">
<tr valign="middle">
<td nowrap align="center"><img width="88" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg31.png" alt="$\displaystyle \vert t\vert &gt; t_{\alpha/2, \nu}$"></td>
<td nowrap width="10" align="right">&nbsp;&nbsp;&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"></p>
<blockquote><p>for a two-tailed test, or</p></blockquote>
<div align="center">
<table cellpadding="0" width="100%" align="center">
<tr valign="middle">
<td nowrap align="center"><img width="64" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg32.png" alt="$\displaystyle t &gt; t_{\alpha, \nu}$"></td>
<td nowrap width="10" align="right">&nbsp;&nbsp;&nbsp;</td>
</tr>
</table>
</div>
<p><br clear="all"></p>
<blockquote><p>for a (right-sided) one-tailed test.</p></blockquote>
<p>The quantities on the right-hand side of the two inequalities above are called the <em>critical values</em> for a given significance level <img width="17" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg33.png" alt="$ \alpha$"> (usually,<br />
<img width="75" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg34.png" alt="$ \alpha = 0.05$">) and <img width="15" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg35.png" alt="$ \nu$"> degrees of freedom. The critical value <em>t</em><sub>&#945;,&#957;</sub> is the value for which the area of the right-hand tail is equal to <img width="17" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg33.png" alt="$ \alpha$">; likewise, the right-hand tail beyond <em>t</em><sub>&#945;/2,&#957;</sub> has area <img width="17" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg33.png" alt="$ \alpha$">/2.</p>
<div align="center"><a name="211"></a>
<table>
<caption align="bottom"><strong>Figure 4:</strong> Critical value for a one-tailed test. Reject the null hypothesis if<br />
<img width="66" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg2.png" alt="$ t &gt; t_{crit}$"></caption>
<tr>
<td>
<div align="center"><img width="385" height="252" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/./onetailedcritval.png" alt="Image onetailedcritval"></div>
</td>
</tr>
</table>
</div>
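<p>For a two-tailed test, the significance level <img width="17" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg33.png" alt="$ \alpha$"> is split evenly between the two tails, so the critical value cuts off an area of <img width="17" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg33.png" alt="$ \alpha$">/2 in each tail.</p>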
<div align="center"><a name="212"></a>
<table>
<caption align="bottom"><strong>Figure 5:</strong> Critical value for a two-tailed test. Reject the null hypothesis if<br />
<img width="77" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg3.png" alt="$ \vert t\vert &gt; t_{crit}$"></caption>
<tr>
<td>
<div align="center"><img width="384" height="248" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/./twotailedcritval.png" alt="Image twotailedcritval"></div>
</td>
</tr>
</table>
</div>
<p>This convention dates back to the time when computational resources were scarce, and researchers had to use pre-computed tables of critical values, rather than calculating <img width="14" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg27.png" alt="$ p$"> directly. Today, general statistical packages such as R or Matlab can compute the CDFs of any number of standard distributions; once you can compute the CDF, directly computing <img width="14" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg27.png" alt="$ p$"> (the area under the tails) is straightforward. Despite this, many tutorials on the t-test (and on the F-test, and other significance tests) still adhere to the convention of comparing test statistics to critical values. In my opinion, this needlessly ritualizes the whole process, and makes it seem more complicated and mysterious than it actually is.</p>
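<p>To illustrate how direct the computation is, here is a dependency-free Python sketch. It evaluates the two-tailed p-value through the regularized incomplete beta function, using the identity p = I<sub>&#957;/(&#957;+t<sup>2</sup>)</sub>(&#957;/2, 1/2), with a standard continued-fraction evaluation of that function. In practice you would call a library routine instead, such as R&#8217;s <code>pt</code>; the helper names here are our own. The critical value is found by bisection only to show that comparing against a critical value and comparing p against &#945; reach the same decision.</p>

```python
from math import exp, lgamma, log, log1p


def _beta_cf(a, b, x, max_iter=1000, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300

    def guard(v):
        return v if abs(v) > tiny else tiny

    c = 1.0
    d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Apply the even-indexed and then the odd-indexed coefficient.
        for coef in (
            m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
            -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2)),
        ):
            d = 1.0 / guard(1.0 + coef * d)
            c = guard(1.0 + coef / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:  # the last factor is ~1: converged
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    # Otherwise use I_x(a, b) = 1 - I_{1-x}(b, a), where the fraction converges faster.
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_tailed_p(t, df):
    """Area under both tails of Student's distribution beyond |t|."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def t_critical(alpha, df, two_tailed=True):
    """Critical value whose tail area is alpha (split across both tails if two_tailed)."""
    target = alpha if two_tailed else 2.0 * alpha
    lo, hi = 0.0, 1000.0
    for _ in range(200):  # bisection: the two-tailed p-value falls as t grows
        mid = (lo + hi) / 2.0
        if t_two_tailed_p(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


t, df, alpha = 2.499, 88, 0.05
p_two = t_two_tailed_p(t, df)
p_one = p_two / 2.0  # one-tailed, when t lies in the hypothesized direction
crit = t_critical(alpha, df)
print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
print(f"|t| > t_crit ({crit:.3f}): {abs(t) > crit};  p < alpha: {p_two < alpha}")
```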
<p>David Freedman was very much against the continued practice of using critical values, rather than reporting the actual p-value. The last chapter of Freedman, Pisani and Purves [<a href="#Freedman07">FPP07</a>] is worth reading for its discussion of this, and other potential pitfalls of significance tests.</p>
<p>Some standard packages for evaluating t-tests, F-tests, or the ANOVA also present analysis results in terms of critical values, though most of them also print the actual p-value, along with the value of the test statistic and the degrees of freedom. Most researchers rightfully report the test statistics along with the actual significance levels: &#8220;we conclude that there is a significant difference in mathematical performance (t(88) = 2.499, p = 0.014)&#8230;&#8221; Here, <img width="45" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg36.png" alt="$ t(88)$"> indicates a t-statistic evaluated against Student&#8217;s distribution with 88 degrees of freedom, 2.499 is the value of the t-statistic, and <img width="14" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg27.png" alt="$ p$"> is of course the p-value.</p>
<p>Similar comments apply to the F-test, discussed in more detail below.</p>
<h2><a name="SECTION00021000000000000000" id="SECTION00021000000000000000">Assumptions</a></h2>
<p>Strictly speaking, the t-test is only valid for normally distributed data where both populations have equal variance. However, the test is fairly robust to non-normal data [<a href="#Box53">Box53</a>]. You can check whether the sample variances are &#8220;equal enough&#8221; (that is, whether they could plausibly both have been sampled from populations with the same variance) by using the <em>F-test</em>. If the two population variances are equal, the F-statistic</p>
<div align="center"><img width="102" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg37.png" alt="$\displaystyle F = {s_1}^2/{s_2}^2 $"></div>
<p>is distributed according to the <em>F distribution with<br />
<img width="131" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg38.png" alt="$ (n_1 - 1,n_2 - 1)$"> degrees of freedom</em>.</p>
<div align="center"><a name="104"></a>
<table>
<caption align="bottom"><strong>Figure 6:</strong> The F distribution</caption>
<tr>
<td>
<div align="center"><img width="514" height="365" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/./Ftest.jpg" alt="Image Ftest"></div>
</td>
</tr>
</table>
</div>
<p>In practice, the larger variance is usually put in the numerator, so <img width="54" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg39.png" alt="$ F &gt; 1$">. The test should still be two-tailed, so you should double the area under the right-hand tail<a name="tex2html9" href="#foot107" id="tex2html9"><sup>3</sup></a>. In this situation, you want to check whether the data are consistent with the null hypothesis (that<br />
<img width="54" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg44.png" alt="$ F \approx 1$">), that is, whether you fail to reject it at a given significance level. If so, then you can go ahead and apply the t-test.</p>
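<p>A minimal sketch of this variance check on the Sneetch numbers follows. It evaluates the upper tail of the F distribution through a small, dependency-free regularized incomplete beta function (a stand-in for a library routine such as R&#8217;s <code>pf</code>), puts the larger variance in the numerator, and doubles the right-hand tail for a two-tailed test. The function names are ours, and feeding the tabulated standard deviations straight into the test is our simplifying assumption.</p>

```python
from math import exp, lgamma, log, log1p


def _beta_cf(a, b, x, max_iter=1000, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300

    def guard(v):
        return v if abs(v) > tiny else tiny

    c = 1.0
    d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for coef in (
            m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
            -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2)),
        ):
            d = 1.0 / guard(1.0 + coef * d)
            c = guard(1.0 + coef / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:  # converged
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f, d1, d2):
    """P(F > f) for the F distribution with (d1, d2) degrees of freedom."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def variance_ratio_test(s_a, n_a, s_b, n_b):
    """Two-tailed F-test for equal variances, larger sample variance on top."""
    if s_a >= s_b:
        f, dfs = s_a ** 2 / s_b ** 2, (n_a - 1, n_b - 1)
    else:
        f, dfs = s_b ** 2 / s_a ** 2, (n_b - 1, n_a - 1)
    p = min(1.0, 2.0 * f_upper_tail(f, *dfs))  # double the right-hand tail
    return f, dfs, p


# Sneetch example: Star-Bellies (s = 7, n = 50) vs. Plain-Bellies (s = 8, n = 40).
f, dfs, p = variance_ratio_test(s_a=7, n_a=50, s_b=8, n_b=40)
print(f"F = {f:.3f} with {dfs} degrees of freedom, two-tailed p = {p:.2f}")
```

<p>A large p-value here means the data give no reason to doubt equal variances, so the pooled t-test above is reasonable.</p>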
<p>There is a variant of the t-test for populations with unequal variances, called Welch&#8217;s t-test [<a href="#WikiWelch">Wikc</a>]. In this case, you are only checking whether the means are equal, not whether the distributions are the same.</p>
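<p>For comparison, here is a sketch of Welch&#8217;s statistic on the same Sneetch numbers, using the Welch&#8211;Satterthwaite approximation for the degrees of freedom. We assume here that the tabulated values are ordinary sample standard deviations (computed with <em>n</em><sub><em>i</em></sub>&nbsp;&#8722;&nbsp;1 in the denominator); the function name is ours.</p>

```python
from math import sqrt


def welch_t(n1, m1, s1, n2, m2, s2):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom.

    Assumes s1 and s2 are the usual sample standard deviations (n_i - 1 denominator).
    """
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors of each mean
    t = (m2 - m1) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Sneetch numbers: Group 1 = Plain-Bellies, Group 2 = Star-Bellies.
t, df = welch_t(n1=40, m1=74, s1=8, n2=50, m2=78, s2=7)
print(f"Welch t = {t:.3f}, df = {df:.1f}")  # Welch t = 2.490, df = 78.1
```

<p>Note that the degrees of freedom are no longer an integer, and are smaller than the 88 of the pooled test.</p>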
<h1><a name="SECTION00030000000000000000" id="SECTION00030000000000000000">The F-test for Analysis of Variance (ANOVA)</a></h1>
<p>ANOVA is an extension of the difference of means test above to the case of more than two populations. The null hypothesis in this case is that all the population means are equal &#8211; or more strictly, that all the treatment groups are drawn from the same population.</p>
<p>The simplest version of the ANOVA is the <em>one-way ANOVA</em>, where there are <img width="15" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg45.png" alt="$ k$"> <em>treatment groups</em> (populations) with <img width="21" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg46.png" alt="$ n_i$"> subjects (or repetitions, or replications) each, for a total of <img width="22" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg47.png" alt="$ N$"> subjects. Each population corresponds to a different level of a single factor (a treatment or a condition: for example, a type of medicine, or a Star-Bellied Sneetch vs. a Plain-Bellied Sneetch vs. a Grinch). Two- or three-way ANOVAs correspond to varying two or three different factors combinatorially. For example, we could do a two-way ANOVA of Sneetch math performance by considering both the belly type and the gender of the Sneetches.</p>
<div align="center"><a name="115"></a>
<table>
<caption align="bottom"><strong>Figure 7:</strong> Table for a Two-way ANOVA of Sneetch math performance</caption>
<tr>
<td>
<div align="center"><img width="203" height="243" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/./twowayANOVA.png" alt="Image twowayANOVA"></div>
</td>
</tr>
</table>
</div>
<p>We will only discuss one-way ANOVA in this article, since that covers all the relevant ideas about calculating significance.</p>
<p>For a one-way ANOVA, we have the population means <img width="27" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg48.png" alt="$ m_i$"> and variances <img width="27" height="38" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg49.png" alt="$ {s_i}^2$"> . We can also calculate the overall mean <img width="29" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg50.png" alt="$ m_0$"> , over the entire aggregate population.</p>
<p>The <em>between-groups mean sum of squares</em>, which is an estimate of the <em>between-groups variance</em>, is given by</p>
<div align="center">
<table cellpadding="0" width="100%" align="center">
<tr valign="middle">
<td nowrap align="center"><img width="260" height="58" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg51.png" alt="$\displaystyle {s_B}^2 = \frac{1}{k-1} \sum_i {n_i \cdot (m_i - m_0)^2}$"></td>
<td nowrap width="10" align="right">(3)</td>
</tr>
</table>
</div>
<p><br clear="all"></p>
<p><img width="33" height="38" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg52.png" alt="$ {s_B}^2$"> is sometimes designated <img width="48" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg53.png" alt="$ MS_B$">. It is a measure of how the population means vary with respect to the grand mean <img width="29" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg50.png" alt="$ m_0$">.</p>
<p>The <em>within-group mean sum of squares</em> is an estimate of the <em>within-group variance</em>:</p>
<div align="center"><a name="eqn:varw" id="eqn:varw"></a>
<table cellpadding="0" width="100%" align="center">
<tr valign="middle">
<td nowrap align="center"><img width="256" height="77" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg54.png" alt="$\displaystyle {s_W}^2 = \frac{1}{N-k} \sum_i^k \sum_j^{n_i} (x_{ij} - m_i)^2$"></td>
<td nowrap width="10" align="right">(4)</td>
</tr>
</table>
</div>
<p><br clear="all"></p>
<p><img width="37" height="38" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg55.png" alt="$ {s_W}^2$"> is sometimes designated <img width="52" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg56.png" alt="$ MS_W$"> . It is a measure of the &#8220;average population variance&#8221;.</p>
<div align="center"><a name="142"></a>
<table>
<caption align="bottom"><strong>Figure 8:</strong> Within-group and between-group variance</caption>
<tr>
<td>
<div align="center"><img width="322" height="214" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/./sigmas.png" alt="Image sigmas"></div>
</td>
</tr>
</table>
</div>
<p>If the null hypothesis is true, then</p>
<div align="center"><img width="114" height="40" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg57.png" alt="$\displaystyle F = {s_B}^2/{s_W}^2 $"></div>
<p>is distributed according to the F distribution with<br />
<img width="116" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg58.png" alt="$ (k-1, n-k)$"> degrees of freedom.</p>
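<p>Equations 3 and 4 and the ratio above translate directly into code. The sketch below computes both mean squares, the F-statistic, and its degrees of freedom for a small, made-up set of three treatment groups; the data and the function name are illustrative only.</p>

```python
def one_way_anova(groups):
    """Return (MS_B, MS_W, F, degrees of freedom) per Equations 3 and 4."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total  # m_0, over all subjects
    ms_b = sum(len(g) * (m - grand_mean) ** 2
               for g, m in zip(groups, means)) / (k - 1)
    ms_w = sum((x - m) ** 2
               for g, m in zip(groups, means) for x in g) / (n_total - k)
    return ms_b, ms_w, ms_b / ms_w, (k - 1, n_total - k)


# Hypothetical scores for three small treatment groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ms_b, ms_w, f, dfs = one_way_anova(groups)
print(f"MS_B = {ms_b:.2f}, MS_W = {ms_w:.2f}, F = {f:.2f}, df = {dfs}")
# MS_B = 13.00, MS_W = 1.00, F = 13.00, df = (2, 6)
```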
<div align="center"><a name="150"></a>
<table>
<caption align="bottom"><strong>Figure 9:</strong> p-value for the one-tailed F-test</caption>
<tr>
<td>
<div align="center"><img width="514" height="365" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/./Ftest.jpg" alt="Image Ftest"></div>
</td>
</tr>
</table>
</div>
<p>That is, under the null hypothesis, the within-group and between-group variances should be about equal:<br />
<img width="54" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg44.png" alt="$ F \approx 1$">. If <img width="54" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg59.png" alt="$ F &lt; 1$">, then the group means vary no more than you would expect from the variation within the groups, so practically speaking, there is no evidence against the null hypothesis. Hence, only large values of F count as evidence against the null hypothesis, and a one-sided F-test is good enough. As with the t-test, research papers usually give the value of the F statistic, the degrees of freedom, and the p-value: &#8220;<br />
<img width="238" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg4.png" alt="$ (F(2, 864) = 6.6, p = 0.0014)$"> &#8221;. In this example, the test statistic value is 6.6, and it was evaluated against the F distribution with (2, 864) degrees of freedom, which means that <em>k</em>&nbsp;=&nbsp;3 and <em>n</em>&nbsp;=&nbsp;864&nbsp;+&nbsp;3&nbsp;=&nbsp;867. The p-value is 0.0014.</p>
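<p>As a check on the quoted result, the sketch below recovers <em>k</em> and <em>n</em> from the degrees of freedom and computes the one-sided p-value for <em>F</em>&nbsp;=&nbsp;6.6 with (2,&nbsp;864) degrees of freedom. It evaluates the F distribution&#8217;s upper tail through a small, dependency-free regularized incomplete beta function, a stand-in for a library routine such as R&#8217;s <code>pf</code>; the helper names are ours.</p>

```python
from math import exp, lgamma, log, log1p


def _beta_cf(a, b, x, max_iter=1000, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300

    def guard(v):
        return v if abs(v) > tiny else tiny

    c = 1.0
    d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for coef in (
            m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
            -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2)),
        ):
            d = 1.0 / guard(1.0 + coef * d)
            c = guard(1.0 + coef / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:  # converged
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f, d1, d2):
    """P(F > f) for the F distribution with (d1, d2) degrees of freedom."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


d1, d2, f = 2, 864, 6.6
k = d1 + 1   # d1 = k - 1
n = d2 + k   # d2 = n - k
p = f_upper_tail(f, d1, d2)  # one-sided: only large F counts against the null
print(f"k = {k}, n = {n}, p = {p:.4f}")  # k = 3, n = 867, p = 0.0014
```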
<h2><a name="SECTION00031000000000000000" id="SECTION00031000000000000000">Assumptions</a></h2>
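<p>Like the t-test, ANOVA assumes that the data are normally distributed with equal variances. According to Box [<a href="#Box53">Box53</a>], ANOVA is fairly robust to unequal variances when the population sizes are about the same, but you might want to check anyway. If all the populations are the same size (all the <img width="21" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg46.png" alt="$ n_i$"> are the same), the easiest way to check for equality of variances is to compute the ratio of the largest to the smallest sample variance,<br />
<img width="140" height="38" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg61.png" alt="$ F = {s_{max}}^2/{s_{min}}^2$">, with <img width="49" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg62.png" alt="$ n-1$"> degrees of freedom per group [<a href="#Sachs84">Sac84</a>]. This is known as Hartley&#8217;s test; because the statistic picks out the largest and smallest of <img width="15" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg45.png" alt="$ k$"> variances, its critical values depend on <img width="15" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg45.png" alt="$ k$"> as well as on <img width="49" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg62.png" alt="$ n-1$">, and come from dedicated tables rather than from the ordinary F distribution. In other cases, you can use Bartlett&#8217;s Test [<a href="#WikiBartlett">Wika</a>] or Levene&#8217;s Test [<a href="#WikiLevene">Wikb</a>]. Bartlett&#8217;s test uses a test statistic that is approximately distributed as the <img width="24" height="38" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg63.png" alt="$ \chi^2$"> distribution, and Levene&#8217;s test uses one that is approximately distributed as the F distribution. Levene&#8217;s test does not assume normally distributed data.</p>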
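<p>Both kinds of check are easy to compute. The sketch below, on made-up data, computes the largest-to-smallest variance ratio (with <em>n</em><sub><em>i</em></sub>&nbsp;&#8722;&nbsp;1 denominators) and Levene&#8217;s statistic, which is simply the one-way ANOVA F-statistic computed on the absolute deviations |<em>x</em><sub><em>ij</em></sub>&nbsp;&#8722;&nbsp;<em>m</em><sub><em>i</em></sub>| from each group mean. It stops at the statistics themselves and leaves the comparison against reference distributions aside. A commonly used variant, the Brown&#8211;Forsythe test, uses deviations from group medians instead; this sketch uses means. The function names are ours.</p>

```python
def group_means(groups):
    return [sum(g) / len(g) for g in groups]


def fmax_ratio(groups):
    """Largest sample variance divided by the smallest (n_i - 1 denominators)."""
    variances = [
        sum((x - m) ** 2 for x in g) / (len(g) - 1)
        for g, m in zip(groups, group_means(groups))
    ]
    return max(variances) / min(variances)


def levene_statistic(groups):
    """Levene's W: the one-way ANOVA F computed on |x_ij - m_i|."""
    z = [[abs(x - m) for x in g] for g, m in zip(groups, group_means(groups))]
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    z_means = group_means(z)
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2
                  for zi, zm in zip(z, z_means)) / (k - 1)
    within = sum((v - zm) ** 2
                 for zi, zm in zip(z, z_means) for v in zi) / (n_total - k)
    return between / within, (k - 1, n_total - k)


# Hypothetical groups with visibly different spreads.
groups = [[1, 2, 3], [2, 4, 6], [0, 5, 10]]
w, dfs = levene_statistic(groups)
print(f"F_max = {fmax_ratio(groups):.1f}, Levene W = {w:.3f} with {dfs} df")
# F_max = 25.0, Levene W = 1.733 with (2, 6) df
```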
<p>If the data are not normally distributed, or have unequal variance, often they can be transformed to a form that is closer to obeying the assumptions of ANOVA. The following table of transformations is based on [<a href="#Sachs84">Sac84</a>, p. 517], and other sources [<a href="#ndsu">Hor</a>].</p>
<div align="center"><a name="177"></a>
<table>
<caption align="bottom"><strong>Figure 10:</strong> Table of Transformations</caption>
<tr>
<td><img width="500" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg64.png" alt="Table of transformations"></td>
</tr>
</table>
</div>
<p>Jim Deacon from the University of Edinburgh lists some suggestions as well [<a href="#deacon07">Dea</a>]. He also reminds us that running ANOVA on the transformed data will identify significant differences in the <em>transformed</em> data. This is <em>not</em> the same as saying there are significant differences in the original data!</p>
<h1><a name="SECTION00040000000000000000" id="SECTION00040000000000000000">Once the Null Hypothesis is Rejected</a></h1>
<p>If you are able to reject the ANOVA null hypothesis, you will usually want to know which population means are significantly different from the rest. Often, in fact, you are primarily interested in which population had the highest mean. For example, if you are comparing the efficacy of a new medicine A against existing medicines B and C, you are probably not too concerned about whether B and C perform significantly differently from each other, only about whether A is significantly better than both.</p>
<p>If all you care about is whether the highest mean is significantly higher than the others, you can simply test where the statistic</p>
<div align="center">$\displaystyle \frac{m_1 - m_2}{\sqrt{{s_W}^2 \cdot \frac{n_1 + n_2}{n_1\cdot n_2}}}$</div>
<p>falls on the Student-t distribution with <img width="50" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg66.png" alt="$ n-k$"> degrees of freedom. Here, <img width="37" height="38" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg55.png" alt="$ {s_W}^2$"> is the within-group variance, as calculated in Equation <a href="#eqn:varw">4</a>; <img width="29" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg10.png" alt="$ m_1$"> and <img width="29" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg11.png" alt="$ m_2$"> are the highest and second-highest population means; <i>n</i><sub>1</sub> and <i>n</i><sub>2</sub> are the numbers of samples in those two groups; <img width="16" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg5.png" alt="$ n$"> is the total number of samples (<img width="81" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg67.png" alt="$ n = \sum{n_i}$">); and <img width="15" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg45.png" alt="$ k$"> is the number of treatment groups.</p>
<p>This test is usually written</p>
<div align="center"><img width="409" height="67" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg68.png" alt="$\displaystyle m_1 - m_2 &gt; t_{(n-k, \alpha/2)} \cdot \sqrt{{s_W}^2 \cdot \frac{n_1 + n_2}{n_1\cdot n_2}} = LSD_{(1,2)} $"></div>
<p>where <img width="75" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg69.png" alt="$ t_{(n-k, \alpha/2)}$"> is the (two-sided) critical value for significance level <img width="17" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg33.png" alt="$ \alpha$"> and <img width="50" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg66.png" alt="$ n-k$"> is the number of degrees of freedom to use. This quantity is called the <em>least significant difference (LSD)</em> between the highest and second-highest means, and the test is usually called the <em>LSD test</em>.</p>
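<p>The LSD formula above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation. The data are made up and the function names are hypothetical. <i>s</i><sub><i>W</i></sub><sup>2</sup> is computed as the usual pooled within-group variance (the sum of squared deviations from each group&#8217;s mean, divided by <i>n</i> &#8722; <i>k</i>), which we take to match Equation 4. The critical value 2.179, for &alpha; = 0.05 with 12 degrees of freedom, is read from a standard t table.</p>

```python
import math
from statistics import mean


def pooled_within_variance(groups):
    """s_W^2: squared deviations from each group's own mean, summed, over n - k."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    ss_within = 0.0
    for g in groups:
        m = mean(g)
        ss_within += sum((x - m) ** 2 for x in g)
    return ss_within / (n - k)


def least_significant_difference(s_w2, n1, n2, t_crit):
    """LSD_(1,2) = t_crit * sqrt(s_W^2 * (n1 + n2) / (n1 * n2))."""
    return t_crit * math.sqrt(s_w2 * (n1 + n2) / (n1 * n2))


# Hypothetical data: three treatment groups of five observations each.
groups = [
    [12.0, 14.0, 13.0, 15.0, 16.0],  # mean 14
    [10.0, 11.0, 12.0, 10.0, 12.0],  # mean 11
    [9.0, 10.0, 8.0, 11.0, 7.0],     # mean 9
]

s_w2 = pooled_within_variance(groups)
# Order the groups by mean, highest first, and compare the top two.
top, second = sorted(groups, key=mean, reverse=True)[:2]
m1, m2 = mean(top), mean(second)

# t_(n-k, alpha/2) for alpha = 0.05 and n - k = 15 - 3 = 12, read from a t table.
t_crit = 2.179
lsd = least_significant_difference(s_w2, len(top), len(second), t_crit)
print(f"s_W^2 = {s_w2:.3f}, m1 - m2 = {m1 - m2:.3f}, LSD = {lsd:.3f}")
print("significant" if m1 - m2 > lsd else "not significant")
```

<p>With these numbers the top two means differ by 3.0 while the LSD is about 1.95, so the difference would be declared significant. If SciPy is available, <tt>scipy.stats.t.ppf(1 - alpha/2, n - k)</tt> returns the critical value directly instead of requiring a table lookup.</p>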
<p>If you want to test all the population differences <img width="73" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg70.png" alt="$ m_i - m_j$"> for significance (or test the highest value against each of the others explicitly), then you need to take some care with the LSD test. Remember that a significance level of <img width="17" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg33.png" alt="$ \alpha$"> means that, when the null hypothesis is true, you will make a false positive error with probability <img width="17" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg33.png" alt="$ \alpha$">. Testing all possible population differences takes <img width="22" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg71.png" alt="$ K$"> = (<img width="15" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg45.png" alt="$ k$"> choose <img width="14" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg72.png" alt="$ 2$">) comparisons, or <img width="90" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg73.png" alt="$ K = k-1$"> comparisons if you sort the means in descending order and compare only adjacent ones. Testing the highest mean against each of the lower ones is also <img width="90" height="34" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg73.png" alt="$ K = k-1$"> comparisons. This means your probability of making at least one false positive error can be as high as <img width="48" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg74.png" alt="$ K \cdot \alpha$">. So if you want the overall significance level to be <img width="17" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg33.png" alt="$ \alpha$">, each individual comparison should use the stricter significance threshold <img width="78" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg75.png" alt="$ p \leq \alpha/K$"> (the Bonferroni correction).</p>
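<p>The comparison counts and the corrected threshold take only a few lines to compute. In this hypothetical Python sketch with <i>k</i> = 4 groups and &alpha; = 0.05, <i>K</i> &middot; &alpha; is the worst-case (union) bound on the chance of at least one false positive. For comparison, the sketch also prints 1 &#8722; (1 &#8722; &alpha;)<sup><i>K</i></sup>, which would be the exact chance if the <i>K</i> tests were independent (pairwise comparisons that share groups are not). The function name is ours, not a library&#8217;s.</p>

```python
import math


def bonferroni_threshold(alpha, num_comparisons):
    """Per-comparison threshold alpha / K that keeps the family-wise error rate at most alpha."""
    return alpha / num_comparisons


alpha = 0.05
k = 4  # number of treatment groups (hypothetical)

comparison_counts = {
    "all pairs": math.comb(k, 2),  # every difference m_i - m_j
    "adjacent pairs": k - 1,       # sorted means, neighbors only (or highest vs. each other)
}

for label, K in comparison_counts.items():
    worst_case = K * alpha                 # union bound on P(at least one false positive)
    if_independent = 1 - (1 - alpha) ** K  # exact only if the K tests were independent
    print(f"{label}: K={K}, uncorrected error <= {worst_case:.3f} "
          f"(independent tests: {if_independent:.3f}), "
          f"per-test threshold p <= {bonferroni_threshold(alpha, K):.4f}")
```

<p>With <i>k</i> = 4, testing all pairs means <i>K</i> = 6, so each comparison would need <i>p</i> &le; 0.05/6 &asymp; 0.0083 to keep the overall level at 0.05.</p>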
<p>A preferred way to compare multiple means for significance (once the ANOVA null hypothesis has been rejected) is to use a <em>multiple range test</em> [<a href="#deacon07">Dea</a>] or <em>Tukey&#8217;s method</em> [<a href="#nistTukey">oST06</a>], rather than the LSD test. Tukey&#8217;s method tests all pairwise comparisons simultaneously, and the multiple range test starts with the broadest range (the highest and the lowest means) and works its way inward until significance is lost.</p>
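<p>For a sense of how Tukey&#8217;s method differs from repeated LSD tests, here is a Python sketch of the Tukey-Kramer form of the procedure, which also handles unequal group sizes. Each pair (<i>i</i>, <i>j</i>) is declared different when |<i>m</i><sub><i>i</i></sub> &#8722; <i>m</i><sub><i>j</i></sub>| exceeds <i>q</i> &middot; &radic;(<i>s</i><sub><i>W</i></sub><sup>2</sup>/2 &middot; (1/<i>n</i><sub><i>i</i></sub> + 1/<i>n</i><sub><i>j</i></sub>)), where <i>q</i> is the critical value of the studentized range distribution for <i>k</i> groups and <i>n</i> &#8722; <i>k</i> degrees of freedom. The data are made up, the function name is hypothetical, and <i>q</i> = 3.77 (&alpha; = 0.05, <i>k</i> = 3, 12 degrees of freedom) is taken from a standard studentized range table rather than computed.</p>

```python
import math
from itertools import combinations
from statistics import mean


def tukey_kramer(groups, q_crit):
    """Compare every pair of group means against the Tukey-Kramer threshold.

    Returns a list of (i, j, |m_i - m_j|, threshold, significant) tuples.
    """
    n, k = sum(len(g) for g in groups), len(groups)
    means = [mean(g) for g in groups]
    # Pooled within-group variance s_W^2 with n - k degrees of freedom
    s_w2 = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n - k)
    results = []
    for i, j in combinations(range(k), 2):
        diff = abs(means[i] - means[j])
        threshold = q_crit * math.sqrt(s_w2 / 2 * (1 / len(groups[i]) + 1 / len(groups[j])))
        results.append((i, j, diff, threshold, diff > threshold))
    return results


# Hypothetical data: three treatment groups of five observations each.
groups = [
    [12.0, 14.0, 13.0, 15.0, 16.0],
    [10.0, 11.0, 12.0, 10.0, 12.0],
    [9.0, 10.0, 8.0, 11.0, 7.0],
]
# q for alpha = 0.05, k = 3 groups, n - k = 12 df, from a studentized range table.
results = tukey_kramer(groups, q_crit=3.77)
for i, j, diff, threshold, significant in results:
    print(f"group {i} vs {j}: diff={diff:.2f}, threshold={threshold:.2f}, "
          f"{'significant' if significant else 'not significant'}")
```

<p>With these made-up numbers, the smallest gap (2.0, between the second and third groups) falls below the Tukey threshold of about 2.38. An unadjusted LSD test on the same pair (<i>t</i> = 2.179) would give a threshold of about 1.95 and call it significant, which shows how the simultaneous test is more conservative. Libraries such as statsmodels (<tt>pairwise_tukeyhsd</tt>) provide a complete implementation, including the studentized range quantiles.</p>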
<h1><a name="SECTION00050000000000000000" id="SECTION00050000000000000000">Conclusion</a></h1>
<p>We&#8217;ve skimmed over many complications in this discussion. Hopefully, though, what we have gone over is enough to demystify much of the statistical discussion in research papers. Perhaps it will also demystify the output of standard ANOVA and t-test packages for you.</p>
<p>Chong-ho Yu&#8217;s site [<a href="#yu09">hY</a>] gives a brief discussion of some of the issues that I&#8217;ve skimmed over. It also lists a few common non-parametric tests. These are tests that make few assumptions about how the data are distributed, so they may be more appropriate for data that are very non-normal, or for discrete data. When the parametric assumptions do hold, non-parametric tests tend to have less power than parametric tests (that is, a lower true positive rate); so if the data are at all normal-like, parametric tests are preferred.</p>
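<p>To make the idea of a rank-based test concrete, here is a Python sketch of one widely used non-parametric counterpart to one-way ANOVA, the Kruskal-Wallis test. This is our choice of example and not necessarily one of the tests on Yu&#8217;s list. It replaces each observation by its rank in the pooled sample, so only the ordering of the data matters, and then compares the groups&#8217; rank sums <i>R</i><sub><i>i</i></sub> through <i>H</i> = 12/(<i>n</i>(<i>n</i>+1)) &middot; &Sigma; <i>R</i><sub><i>i</i></sub><sup>2</sup>/<i>n</i><sub><i>i</i></sub> &#8722; 3(<i>n</i>+1), which is approximately &chi;<sup>2</sup> with <i>k</i> &#8722; 1 degrees of freedom. The sketch omits the usual correction for ties, and the data and function names are hypothetical.</p>

```python
import math


def average_ranks(values):
    """Ranks 1..n of the values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H without the tie correction; approximately chi-square, k - 1 df."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical data: three groups whose values do not overlap at all.
groups = [[1.2, 3.4, 2.2], [4.1, 5.0, 6.3], [7.7, 8.1, 9.9]]
h = kruskal_wallis_h(groups)
p = math.exp(-h / 2)  # chi-square upper tail for exactly 2 df (k = 3 groups)
print(f"H = {h:.2f}, p = {p:.4f}")
```

<p>For <i>k</i> = 3 groups the &chi;<sup>2</sup> distribution has 2 degrees of freedom, and its upper-tail probability is exactly <i>e</i><sup>&#8722;<i>H</i>/2</sup>, which is what the last lines use; for other values of <i>k</i> you would consult a &chi;<sup>2</sup> table.</p>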
<p>Significance tests are used in other applications beyond testing the difference in means or variances. They are used for testing whether events follow an expected distribution, for testing if there is a correlation between two variables, and for evaluating the coefficients of a regression analysis. We hope to cover some of these applications in future installments of this series.</p>
<h2><a name="SECTION00060000000000000000" id="SECTION00060000000000000000">Bibliography</a></h2>
<dl compact>
<dt><a name="Box53" id="Box53">Box53</a></dt>
<dd>G.E.P. Box, <i>Non-normality and tests on variances</i>, Biometrika <b>40</b> (1953), no.&nbsp;3/4, 318-335.</dd>
<dt><a name="deacon07" id="deacon07">Dea</a></dt>
<dd>Jim Deacon, <i>A multiple range test for comparing means in an analysis of variance</i>, <a href="http://www.biology.ed.ac.uk/research/groups/jdeacon/statistics/tress7.html">http://www.biology.ed.ac.uk/research/groups/jdeacon/statistics/tress7.html</a>.</dd>
<dt><a name="Freedman07" id="Freedman07">FPP07</a></dt>
<dd>David Freedman, Robert Pisani, and Roger Purves, <i>Statistics</i>, 4th ed., W. W. Norton &amp; Company, New York, 2007.</dd>
<dt><a name="ndsu" id="ndsu">Hor</a></dt>
<dd>Rich Horsley, <i>Transformations</i>, <tt><a name="tex2html14" href="http://www.ndsu.nodak.edu/ndsu/horsley/Transfrm.pdf" id="tex2html14">http://www.ndsu.nodak.edu/ndsu/horsley/Transfrm.pdf</a></tt>, Class notes, Plant Sciences 724, North Dakota State University.</dd>
<dt><a name="yu09" id="yu09">hY</a></dt>
<dd>Chong-ho Yu, <i>Parametric tests</i>, <a href="http://www.creative-wisdom.com/teaching/WBI/parametric_test.shtml">http://www.creative-wisdom.com/teaching/WBI/parametric_test.shtml</a>.</dd>
<dt><a name="nistTukey" id="nistTukey">oST06</a></dt>
<dd>National Institute of Standards and Technology, <i>Tukey&#8217;s method</i>, NIST/SEMATECH e-Handbook of Statistical Methods, 2006, <a href="http://itl.nist.gov/div898/handbook/prc/section4/prc471.htm">http://itl.nist.gov/div898/handbook/prc/section4/prc471.htm</a>.</dd>
<dt><a name="Sachs84" id="Sachs84">Sac84</a></dt>
<dd>Lothar Sachs, <i>Applied statistics: A handbook of techniques</i>, 2nd ed., Springer-Verlag, New York, 1984.</dd>
<dt><a name="WikiBartlett" id="WikiBartlett">Wika</a></dt>
<dd>Wikipedia, <i>Bartlett&#8217;s test</i>, <tt><a name="tex2html15" href="http://en.wikipedia.org/wiki/Bartlett's_test" id="tex2html15">http://en.wikipedia.org/wiki/Bartlett's_test</a></tt>.</dd>
<dt><a name="WikiLevene" id="WikiLevene">Wikb</a></dt>
<dd>Wikipedia, <i>Levene&#8217;s test</i>, <tt><a name="tex2html16" href="http://en.wikipedia.org/wiki/Levene's_test" id="tex2html16">http://en.wikipedia.org/wiki/Levene's_test</a></tt>.</dd>
<dt><a name="WikiWelch" id="WikiWelch">Wikc</a></dt>
<dd>Wikipedia, <i>Welch&#8217;s t test</i>, <tt><a name="tex2html17" href="http://en.wikipedia.org/wiki/Welch's_t_test" id="tex2html17">http://en.wikipedia.org/wiki/Welch's_t_test</a></tt>.</dd>
</dl>
<hr />
<h4>Footnotes</h4>
<dl>
<dt><a name="foot209" id="foot209">&#8230;</a><a href="#tex2html2"><sup>1</sup></a></dt>
<dd>Remember from the last installment that when you are estimating the mean of a distribution with unknown mean <img width="16" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg23.png" alt="$ \mu$"> and variance <img width="24" height="19" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg24.png" alt="$ \sigma^2$">, the 95% confidence interval around your estimate is approximately <img width="115" height="39" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg25.png" alt="$ m \pm 2\cdot \sigma/\sqrt{n}$">. Intuitively speaking, Student&#8217;s distribution is what you get if you calculate confidence intervals using the estimated standard deviation <img width="14" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg7.png" alt="$ s$"> instead of the true but unknown standard deviation <img width="16" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg26.png" alt="$ \sigma$">. The distribution is stretched out compared to the normal distribution to reflect this increased uncertainty.</dd>
<dt><a name="foot210" id="foot210">&#8230; positives</a><a href="#tex2html5"><sup>2</sup></a></dt>
<dd>In his textbook <em>Statistics</em>, Freedman tells an anecdote about a study that was published in the <em>Journal of the AMA</em>, claiming to demonstrate that cholesterol causes heart attacks. The treatment group that took a cholesterol-reducing drug had &#8220;significantly fewer&#8221; heart attacks than the control group (<img width="82" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg29.png" alt="$ p \approx 0.035$">). A closer reading revealed that the researchers used a one-tailed test, which is equivalent to <em>assuming</em> that the treatment group was going to have fewer heart attacks. What if the drug had <em>increased</em> the risk of heart attack? The proper two-tailed significance of their results would have been <img width="73" height="33" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg30.png" alt="$ p \approx 0.07$">, which is higher than <em>JAMA</em>&#8217;s strict significance threshold of 0.05. [<a href="#Freedman07">FPP07</a>, p. 550]</dd>
<dt><a name="foot107" id="foot107">&#8230; tail</a><a href="#tex2html9"><sup>3</sup></a></dt>
<dd>The area to the right of <img width="19" height="17" align="bottom" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg40.png" alt="$ F$"> with <img width="45" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg41.png" alt="$ (a,b)$"> degrees of freedom is equal to the area to the left of <img width="38" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg42.png" alt="$ 1/F$"> , with <img width="45" height="37" align="middle" border="0" src="http://www.win-vector.com/blog/wp-content/uploads/2009/12/ste2bimg43.png" alt="$ (b,a)$"> degrees of freedom.</dd>
</dl>
<hr />


]]></content:encoded>
			<wfw:commentRss>http://www.win-vector.com/blog/2009/12/statistics-to-english-translation-part-2b-calculating-significance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
