One weekend a few months ago Vad [1] and I were hanging around the new Metamarkets office reading Hacker News. We noticed something strange: two different headlines, both linking to identical content, resulted in dramatically different popularity ranks. Do headlines matter so much? What drives observed popularity?
We started to investigate.

(Figure: Rolling 10 days of article ranks.)
The right way to answer this question was pretty obvious: crunch the data. We started scraping HN titles along with article ranks and fed the resulting data into our online feature learning stack.
Below is the distilled summary of the result, our “Top Ten Hacker News Headline Hacks,” including feature weight, standard error and p-value vs. zero. Positive weight means the feature is predictive of high article rank.
Hack #1: Maximize Controversy
1.4 ± 0.5 [p<1e-5] | essential
1.3 ± 0.5 [p<1e-5] could
1.2 ± 0.4 [p<1e-5] problem
1.3 ± 0.8 [p<1e-5] survived the
1.0 ± 0.5 [p<1e-5] controversy
0.9 ± 0.3 [p<1e-5] impossible
Hack #2: Question Authority
0.7 ± 0.2 [p<1e-5] why ____ future
0.4 ± 1.0 [p=0.2] the ____ behind
0.2 ± 0.3 [p=0.04] why don’t
0.1 ± 0.3 [p=0.06] | lessons
Hack #3: Avoid False Promises
-1.5 ± 0.8 [p<1e-5] tricks
-0.7 ± 0.5 [p<1e-5] the world |
-0.7 ± 0.2 [p<1e-5] the greatest
-0.6 ± 0.3 [p<1e-5] awesome
-0.6 ± 0.7 [p=0.003] anatomy of a
-0.5 ± 0.3 [p<1e-5] guide to
Hack #4: Short is Sweet
-0.3 ± 0.04 [p<1e-5] {# WORDS}
Hack #5: Execution not Ideas
2.6 ± 2.1 [p<1e-5] showing
1.5 ± 0.7 [p<1e-5] | building
0.6 ± 0.3 [p<1e-5] makes
0.5 ± 0.4 [p<1e-5] starting a company
0.3 ± 0.3 [p<1e-3] join a startup
-1.1 ± 0.3 [p<1e-5] ideas
-1.1 ± 0.3 [p<1e-5] idea?
Hack #6: Everybody Loves a Winner
1.7 ± 0.7 [p<1e-5] | ____ acquires
0.5 ± 0.3 [p<1e-5] hire
0.4 ± 0.7 [p=0.02] worth
Hack #7: Everybody Loves Data
1.9 ± 1.8 [p<1e-4] data |
0.6 ± 0.8 [p=0.004] data -
0.5 ± 0.1 [p<1e-5] visualize data in
-1.3 ± 0.7 [p<1e-5] algorithm
Hack #8: Nobody Cares About You
-0.2 ± 0.3 [p=0.008] my startup
-0.9 ± 0.2 [p<1e-5] silicon valley
Hack #9: Some Topics are Just Miserable
-0.4 ± 0.3 [p<1e-5] angry birds
-0.2 ± 0.1 [p<1e-5] harry potter
-0.5 ± 0.4 [p<1e-4] taxes
-1.5 ± 1.0 [p<1e-5] downtime
Hack #10: Social is For Losers
-0.6 ± 0.9 [p=0.007] social
-0.5 ± 0.4 [p<1e-4] gamification
-0.3 ± 0.6 [p=0.04] twitter |
-2.4 ± 1.5 [p<1e-5] airbnb
Standard disclaimer: the above coefficients are provided for entertainment purposes only. Feature interactions in text are a bitch. Correlation does not imply causation. Past performance does not guarantee future success.
How We Did It
We extracted n-gram features (e.g., “Harry Potter”, “Google”, “Silicon Valley”) and skip features (e.g., “a ____ for”, “| ____ acquires”) for each title, including start- and end-of-sentence markers and optionally punctuation. For learning we used boosted stochastic gradient descent with logistic loss [2], predicting whether or not the article made it to the top 20 during its observed lifetime. Strong regularization was used to eliminate spurious features, and twenty bootstrap replicates were used to measure the significance of coefficients and the classification accuracy.
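To make the feature description concrete, here is a minimal sketch in Python. It is not our production stack: the names extract_features and train_sgd are hypothetical, the "|" boundary marker and "____" gap token are assumed from the notation in the tables above, and the learner is plain SGD on logistic loss with a simple L2 penalty rather than the boosted variant described in this section.

```python
import math
import random
from typing import Dict, List, Set, Tuple

BOUNDARY = "|"   # assumed start/end-of-title marker, as in "| essential"
GAP = "____"     # assumed wildcard token, as in "a ____ for"


def extract_features(title: str, max_n: int = 3) -> Set[str]:
    """Return the n-gram and one-word-skip features for a title."""
    tokens = [BOUNDARY] + title.lower().split() + [BOUNDARY]
    feats: Set[str] = set()
    # Contiguous n-grams of length 1..max_n, boundary markers included.
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    # Skip features: a trigram with its middle token blanked out.
    for i in range(len(tokens) - 2):
        feats.add(f"{tokens[i]} {GAP} {tokens[i + 2]}")
    feats.discard(BOUNDARY)  # a bare marker carries no information
    return feats


def train_sgd(examples: List[Tuple[Set[str], int]], epochs: int = 20,
              lr: float = 0.1, l2: float = 0.01, seed: int = 0) -> Dict[str, float]:
    """Plain SGD on logistic loss over binary features (label 1 = made the top 20)."""
    rng = random.Random(seed)
    weights: Dict[str, float] = {}
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, label in data:
            score = sum(weights.get(f, 0.0) for f in feats)
            score = max(-30.0, min(30.0, score))  # keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-score))
            grad = p - label  # derivative of logistic loss w.r.t. the score
            # L2 shrinkage is applied only to features present in this example.
            for f in feats:
                w = weights.get(f, 0.0)
                weights[f] = w - lr * (grad + l2 * w)
    return weights


# Toy, made-up titles purely for illustration.
toy = [("Google acquires startup", 1), ("Ten tricks for startups", 0),
       ("Apple acquires company", 1), ("Awesome tricks guide", 0)]
model = train_sgd([(extract_features(t), y) for t, y in toy])
```

Repeating the training step on resampled copies of the data (sampling titles with replacement) and looking at the spread of each weight is one way to get bootstrap-style error bars like the ± values above.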
For this untuned, first-pass model, we achieved 64% classification accuracy on a held-out set covering the past two months. Positive predictive value was 25.7%, negative predictive value was 73.1%, sensitivity was 18.2% and specificity was 80.9% [3]. Despite this weak predictive power, we found some interesting correlations, more of which we’ll release as the model improves.
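For readers unfamiliar with these diagnostics, all five come from a single confusion matrix. The sketch below spells out the definitions; the function name diagnostics is hypothetical and the counts in the usage line are made up for illustration, not the counts behind the figures quoted here.

```python
from typing import Dict


def diagnostics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification diagnostics from confusion-matrix counts.

    "Positive" here would mean the article reached the top 20.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of all predictions that were right
        "ppv": tp / (tp + fp),           # of predicted hits, how many were hits
        "npv": tn / (tn + fn),           # of predicted misses, how many were misses
        "sensitivity": tp / (tp + fn),   # of actual hits, how many were caught
        "specificity": tn / (tn + fp),   # of actual misses, how many were caught
    }


# Made-up counts, for illustration only.
example = diagnostics(tp=2, fp=1, tn=6, fn=1)
```

Low sensitivity alongside high specificity, as reported here, is what you would expect from a model that rarely predicts an article will reach the top 20.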
[1] Of Koalas to the Max fame.
[2] Think: wabbit style.
[3] Predictive diagnostics.
8 Comments on “Hacking Hacker News Headlines”
Frank | androidnews said on May 5, 2011:
Can you do that with reddit, too?
ash said on May 5, 2011:
My next blog entry will be "ESSENTIAL LESSONS SHOWING DATA WORTH IN THE FUTURE"
Josh Powell said on May 5, 2011:
So, the best possible title is?
Google acquires essential data showing problem, could survive.
MyGradThesis said on May 16, 2011:
Maybe it's the late hour, but it looks like your n-gram analysis didn't run into much of a weighting problem at all (judging from the limited grams). I am very, very jealous. Could it be that the sentences you were working with did not have much variety (i.e. Google was always a proper noun, not a verb, or the punch line to a joke)? I'm trying to baseline Twitter data with an n-gram approach (to show how great my DAN2 approach is) but I find the grams are not predictive until they get to 2 or 3 words in a row. What is your special sauce in gram selection (or is the limited sentence variety the key)? Thanks, love your site!
hyderali said on September 3, 2011:
What is this n-gram? Care to explain in simple meaning.
anil patwardhan said on September 8, 2011:
20 bootstrap replicates to estimate significance of covariates? Wrong use of the bootstrap, which is meant to give you a measure of confidence (intervals) of a single point estimate. Use a likelihood ratio test with significance adjusted after permutation-based resampling.
In any case, 20 reps is way too few...
...also you can summarize your model more efficiently with AUC, rather than separating out sens/spec... since you are not working with disease (e.g. cancer) I assume you have no tradeoff rationale for the importance of sensitivity vs. specificity...
I assume you are using Friedman's or [Tutz and Binder's] boosting algorithm... I would be interested in how it compared to the Lasso path solution... both would give you strong regularization? Strong regularization doesn't necessarily mean you're getting rid of spurious features... just that you are emphasizing parsimony...
impressive stuff on the data capture end...way out of my league...
JR said on September 30, 2011:
Actually bootstrap has a rather long history of being used to estimate coefficient standard error. There is a recent Casella paper:
http://ba.stat.cmu.edu/journal/2010/vol05/issue02/casella.pdf
but the technique dates back at least to Tibshirani's original Lasso paper. That said, for Lasso at least, if the true beta is zero then you can show bootstrap estimates are not consistent (somewhat intuitively, though unfortunate).
The bigger problem for us is that the bias introduced by regularization actually makes the CIs pretty meaningless anyway, since you can't measure the contribution of the bias to the stderr. There is a good stackexchange on this:
http://stats.stackexchange.com/questions/2121/how-can-i-estimate-coefficient-standard-errors-when-using-ridge-regression
So, as with any stats-meets-social-science result, take with a grain of salt.
--j