<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: A Demonstration of Data Mining</title>
	<atom:link href="http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=a-demonstration-of-data-mining</link>
	<description>The Applied Theorist&#039;s Point of View</description>
	<lastBuildDate>Thu, 15 Jul 2010 00:13:50 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: Mei Marker</title>
		<link>http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/comment-page-1/#comment-786</link>
		<dc:creator>Mei Marker</dc:creator>
		<pubDate>Fri, 18 Sep 2009 23:56:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=252#comment-786</guid>
		<description>Hi John,

Nice article and educational for me!

My 2 cents: I think the full Bayes model and kernelized regression results actually make more sense than the &quot;monotone assumption&quot;. The reason is that when the match factor is high, a high discount factor may not be good: when people find exactly what they are looking for and you sell it too cheaply, it may actually turn them away. So labeling the upper right corner red could actually be the right thing to do. On top of that, I also favor the underlying assumptions/theories behind the full Bayes and kernel methods.

Regards,
Mei</description>
		<content:encoded><![CDATA[<p>Hi John,</p>
<p>Nice article and educational for me!</p>
<p>My 2 cents: I think the full Bayes model and kernelized regression results actually make more sense than the &#8220;monotone assumption&#8221;. The reason is that when the match factor is high, a high discount factor may not be good: when people find exactly what they are looking for and you sell it too cheaply, it may actually turn them away. So labeling the upper right corner red could actually be the right thing to do. On top of that, I also favor the underlying assumptions/theories behind the full Bayes and kernel methods.</p>
<p>Regards,<br />
Mei</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jmount</title>
		<link>http://www.win-vector.com/blog/2009/08/a-demonstration-of-data-mining/comment-page-1/#comment-500</link>
		<dc:creator>jmount</dc:creator>
		<pubDate>Mon, 24 Aug 2009 00:12:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.win-vector.com/blog/?p=252#comment-500</guid>
		<description>One more graph.  


Here we again show the linear discriminant line (solid) we talked about throughout the paper and a new line (dashed) that represents a &quot;best probability&quot; fit.

&lt;img src=&quot;http://www.win-vector.com/blog/wp-content/uploads/2009/08/massSoln1.png&quot; alt=&quot;massSoln.png&quot; border=&quot;0&quot; width=&quot;302&quot; height=&quot;302&quot; /&gt;

The new dashed separating line is a cut that maximizes the probability of classifying an example point correctly (with points drawn with odds proportional to those described by our fit distributions). This alternate fit is obtained by switching to a &quot;mass based metric&quot; (or an L0 metric) and is similar in spirit to the ideas in Quantile Regression and Support Vector Machines (which use an L1 metric instead of an L2 metric or the more common plausibility measures).


You can see how the new dashed fit line is more compatible with the shape of the lower red region (the negative examples) than the original solid fit line. If you have a good idea of what you want, you can actually choose a fit line that respects one shape more than the other (depending on what is important in your application).


There will probably be a Win-Vector article at some point on the methods for finding this fit.</description>
		<content:encoded><![CDATA[<p>One more graph.  </p>
<p>Here we again show the linear discriminant line (solid) we talked about throughout the paper and a new line (dashed) that represents a &#8220;best probability&#8221; fit.</p>
<p><img src="http://www.win-vector.com/blog/wp-content/uploads/2009/08/massSoln1.png" alt="massSoln.png" border="0" width="302" height="302" /></p>
<p>The new dashed separating line is a cut that maximizes the probability of classifying an example point correctly (with points drawn with odds proportional to those described by our fit distributions). This alternate fit is obtained by switching to a &#8220;mass based metric&#8221; (or an L0 metric) and is similar in spirit to the ideas in Quantile Regression and Support Vector Machines (which use an L1 metric instead of an L2 metric or the more common plausibility measures).</p>
<p>You can see how the new dashed fit line is more compatible with the shape of the lower red region (the negative examples) than the original solid fit line. If you have a good idea of what you want, you can actually choose a fit line that respects one shape more than the other (depending on what is important in your application).</p>
<p>There will probably be a Win-Vector article at some point on the methods for finding this fit.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
