<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for Metamarkets</title>
	<atom:link href="http://metamarkets.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://metamarkets.com</link>
	<description>Fast Insight for Big Data</description>
	<lastBuildDate>Fri, 27 Jan 2012 22:00:16 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
	<item>
		<title>Comment on Munging, Modeling and Visualizing Data with R by Ashwin Jayaprakash</title>
		<link>http://metamarkets.com/2012/munging-and-visualizing-data-with-r/#comment-1039</link>
		<dc:creator>Ashwin Jayaprakash</dc:creator>
		<pubDate>Fri, 27 Jan 2012 22:00:16 +0000</pubDate>
		<guid isPermaLink="false">http://metamarkets.com/?p=778#comment-1039</guid>
		<description>Thanks for sharing. This is a good summary, much better than what&#039;s here - http://www.r-bloggers.com/programmers-should-know-r/</description>
		<content:encoded><![CDATA[<p>Thanks for sharing. This is a good summary, much better than what's here - <a href="http://www.r-bloggers.com/programmers-should-know-r/" rel="nofollow">http://www.r-bloggers.com/programmers-should-know-r/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Munging, Modeling and Visualizing Data with R by Cotiso</title>
		<link>http://metamarkets.com/2012/munging-and-visualizing-data-with-r/#comment-1026</link>
		<dc:creator>Cotiso</dc:creator>
		<pubDate>Fri, 27 Jan 2012 17:34:15 +0000</pubDate>
		<guid isPermaLink="false">http://metamarkets.com/?p=778#comment-1026</guid>
		<description>Great stuff!
Tx,
Cotiso</description>
		<content:encoded><![CDATA[<p>Great stuff!<br />
Tx,<br />
Cotiso</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Druid, Part Deux: Three Principles for Fast, Distributed OLAP by Nicolas</title>
		<link>http://metamarkets.com/2011/druid-part-deux-three-principles-for-fast-distributed-olap/#comment-626</link>
		<dc:creator>Nicolas</dc:creator>
		<pubDate>Fri, 20 Jan 2012 19:49:51 +0000</pubDate>
		<guid isPermaLink="false">http://metamarketsgroup.com/blog/?p=382#comment-626</guid>
		<description>The pieces that you have put together in Druid look truly fantastic. I&#039;m looking forward to the open-sourcing of the code base, and perhaps to contributing to it!</description>
		<content:encoded><![CDATA[<p>The pieces that you have put together in Druid look truly fantastic. I'm looking forward to the open-sourcing of the code base, and perhaps to contributing to it!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Beyond Hadoop:  Fast Queries from Big Data by Don MacLennan</title>
		<link>http://metamarkets.com/2011/hadoops-secret-shortcoming-speed-and-how-to-fix-it/#comment-80</link>
		<dc:creator>Don MacLennan</dc:creator>
		<pubDate>Fri, 16 Dec 2011 17:20:36 +0000</pubDate>
		<guid isPermaLink="false">http://metamarketsgroup.com/blog/?p=584#comment-80</guid>
		<description>Mike, nice post. Similar to what we&#039;re experiencing in the early days of our Big Data journey. I wrote a post on the topic you hinted at in your last paragraph. I&#039;m interested in your take: http://donmaclennan.com/2011/12/04/the-measurement-wars/</description>
		<content:encoded><![CDATA[<p>Mike, nice post. Similar to what we're experiencing in the early days of our Big Data journey. I wrote a post on the topic you hinted at in your last paragraph. I'm interested in your take: <a href="http://donmaclennan.com/2011/12/04/the-measurement-wars/" rel="nofollow">http://donmaclennan.com/2011/12/04/the-measurement-wars/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Beyond Hadoop:  Fast Queries from Big Data by Pascal Bensoussan</title>
		<link>http://metamarkets.com/2011/hadoops-secret-shortcoming-speed-and-how-to-fix-it/#comment-79</link>
		<dc:creator>Pascal Bensoussan</dc:creator>
		<pubDate>Mon, 07 Nov 2011 18:34:31 +0000</pubDate>
		<guid isPermaLink="false">http://metamarketsgroup.com/blog/?p=584#comment-79</guid>
		<description>You&#039;re not alone! Many thanks Mike for mentioning Aggregate Knowledge&#039;s blog post on &quot;Building a Big Analytics Infrastructure&quot;.</description>
		<content:encoded><![CDATA[<p>You're not alone! Many thanks Mike for mentioning Aggregate Knowledge's blog post on "Building a Big Analytics Infrastructure".</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Hacking Hacker News Headlines by JR</title>
		<link>http://metamarkets.com/2011/hacking-hacker-news-headlines/#comment-74</link>
		<dc:creator>JR</dc:creator>
		<pubDate>Fri, 30 Sep 2011 09:36:53 +0000</pubDate>
		<guid isPermaLink="false">http://metamarketsgroup.com/blog/?p=256#comment-74</guid>
		<description>Actually, the bootstrap has a rather long history of being used to estimate coefficient standard errors. There is a recent Casella paper:

http://ba.stat.cmu.edu/journal/2010/vol05/issue02/casella.pdf

but the technique dates back at least to Tibshirani&#039;s original Lasso paper. That said, for the Lasso at least, if the true beta is zero then you can show that bootstrap estimates are not consistent (somewhat intuitive, though unfortunate).

The bigger problem for us is that the bias introduced by regularization makes the CIs pretty meaningless anyway, since you can&#039;t measure the contribution of the bias to the stderr. There is a good Stack Exchange thread on this:

http://stats.stackexchange.com/questions/2121/how-can-i-estimate-coefficient-standard-errors-when-using-ridge-regression

So, as with any stats-meets-social-science result, take with a grain of salt.

--j</description>
		<content:encoded><![CDATA[<p>Actually, the bootstrap has a rather long history of being used to estimate coefficient standard errors. There is a recent Casella paper:</p>
<p><a href="http://ba.stat.cmu.edu/journal/2010/vol05/issue02/casella.pdf" rel="nofollow">http://ba.stat.cmu.edu/journal/2010/vol05/issue02/casella.pdf</a></p>
<p>but the technique dates back at least to Tibshirani's original Lasso paper. That said, for the Lasso at least, if the true beta is zero then you can show that bootstrap estimates are not consistent (somewhat intuitive, though unfortunate).</p>
<p>The bigger problem for us is that the bias introduced by regularization makes the CIs pretty meaningless anyway, since you can't measure the contribution of the bias to the stderr. There is a good Stack Exchange thread on this:</p>
<p><a href="http://stats.stackexchange.com/questions/2121/how-can-i-estimate-coefficient-standard-errors-when-using-ridge-regression" rel="nofollow">http://stats.stackexchange.com/questions/2121/how-can-i-estimate-coefficient-standard-errors-when-using-ridge-regression</a></p>
<p>So, as with any stats-meets-social-science result, take with a grain of salt.</p>
<p>--j</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Rise of Interactive Data Visualization by Ramesh</title>
		<link>http://metamarkets.com/2011/the-rise-of-dynamic-data-visualization/#comment-78</link>
		<dc:creator>Ramesh</dc:creator>
		<pubDate>Tue, 20 Sep 2011 04:48:39 +0000</pubDate>
		<guid isPermaLink="false">http://metamarketsgroup.com/blog/?p=406#comment-78</guid>
		<description>An eye opener. I look at the visual medium and wonder what we are still doing in the world of grids and frames. Being mostly a data guy, I tried learning d3 to do something about it, but it seems like MetaMarkets as a company has more potential for doing the impossible. Keep going, guys!</description>
		<content:encoded><![CDATA[<p>An eye opener. I look at the visual medium and wonder what we are still doing in the world of grids and frames. Being mostly a data guy, I tried learning d3 to do something about it, but it seems like MetaMarkets as a company has more potential for doing the impossible. Keep going, guys!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Introducing Druid: Real-Time Analytics at a Billion Rows Per Second by Building a Big Analytics Infrastructure &#171; AK Tech Blog</title>
		<link>http://metamarkets.com/2011/druid-part-i-real-time-analytics-at-a-billion-rows-per-second/#comment-66</link>
		<dc:creator>Building a Big Analytics Infrastructure &#171; AK Tech Blog</dc:creator>
		<pubDate>Thu, 08 Sep 2011 23:37:29 +0000</pubDate>
		<guid isPermaLink="false">http://metamarketsgroup.com/blog/?p=189#comment-66</guid>
		<description>[...] problems. Some notable exceptions are Facebook, Twitter&#8217;s Rainbird and MetaMarket&#8217;s Druid. In this post we provide an overview of how we built Aggregate Knowledge&#8217;s &#8220;big [...]</description>
		<content:encoded><![CDATA[<p>[...] problems. Some notable exceptions are Facebook, Twitter&#8217;s Rainbird and MetaMarket&#8217;s Druid. In this post we provide an overview of how we built Aggregate Knowledge&#8217;s &#8220;big [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Hacking Hacker News Headlines by anil patwardhan</title>
		<link>http://metamarkets.com/2011/hacking-hacker-news-headlines/#comment-73</link>
		<dc:creator>anil patwardhan</dc:creator>
		<pubDate>Thu, 08 Sep 2011 06:40:23 +0000</pubDate>
		<guid isPermaLink="false">http://metamarketsgroup.com/blog/?p=256#comment-73</guid>
		<description>20 bootstrap replicates to estimate the significance of covariates? That is the wrong use of the bootstrap, which is meant to give you a measure of confidence (intervals) for a single point estimate. Use a likelihood ratio test with significance adjusted after permutation-based resampling.

In any case, 20 reps is way too few.

Also, you can summarize your model more efficiently with AUC, rather than separating out sens/spec. Since you are not working with disease (e.g. cancer), I assume you have no tradeoff rationale for the importance of sensitivity vs. specificity.

I assume you are using Friedman&#039;s or [Tutz and Binder&#039;s] boosting algorithm. I would be interested in how it compared to the Lasso path solution; both would give you strong regularization? Strong regularization doesn&#039;t necessarily mean you&#039;re getting rid of spurious features, just that you are emphasizing parsimony.

impressive stuff on the data capture end...way out of my league...</description>
		<content:encoded><![CDATA[<p>20 bootstrap replicates to estimate the significance of covariates? That is the wrong use of the bootstrap, which is meant to give you a measure of confidence (intervals) for a single point estimate. Use a likelihood ratio test with significance adjusted after permutation-based resampling.</p>
<p>In any case, 20 reps is way too few.</p>
<p>Also, you can summarize your model more efficiently with AUC, rather than separating out sens/spec. Since you are not working with disease (e.g. cancer), I assume you have no tradeoff rationale for the importance of sensitivity vs. specificity.</p>
<p>I assume you are using Friedman's or [Tutz and Binder's] boosting algorithm. I would be interested in how it compared to the Lasso path solution; both would give you strong regularization? Strong regularization doesn't necessarily mean you're getting rid of spurious features, just that you are emphasizing parsimony.</p>
<p>impressive stuff on the data capture end...way out of my league...</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Hacking Hacker News Headlines by hyderali</title>
		<link>http://metamarkets.com/2011/hacking-hacker-news-headlines/#comment-72</link>
		<dc:creator>hyderali</dc:creator>
		<pubDate>Sat, 03 Sep 2011 10:01:53 +0000</pubDate>
		<guid isPermaLink="false">http://metamarketsgroup.com/blog/?p=256#comment-72</guid>
		<description>What is an n-gram? Care to explain in simple terms?</description>
		<content:encoded><![CDATA[<p>What is an n-gram? Care to explain in simple terms?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

