What Makes Content Go Viral? Scraping Deadspin To Find Out
To drive clicks, A Random Walker clearly needs more exciting content than just scintillating R output and a third-person bio section.
For clues on how best to attain said glorious traffic (along with an excuse to write a web scraper), I turned to sports blog Deadspin. Founded by Will Leitch in 2005 and driven by the tagline "Sports News without Access, Favor, or Discretion," Deadspin has thrived in a competitive (read: virtually impossible) media environment, leaving behind initial critical skepticism to become a publisher of highly regarded longform posts.
To scrape Deadspin, I used RSelenium, an R package that gave slightly more flexibility than options like rvest. For each Deadspin link, the scraper returned the article's headline, writer, publication date, page views, and tags. After running the scraper for a few hours (days) and doing some data cleanup, I ended up with metadata for 52,090 articles. Total page views summed to slightly under 2.7 billion; this presumably generates significant ad revenue even in a post-newspaper world. While the dataset is by no means the entire Deadspin corpus, it ought to be enough to get some "actionable insights." Finally, if you're interested in replicating, critiquing, or giving feedback, all my code (and data!) can be found here.
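For readers who want a feel for the extraction step, here is a minimal sketch of pulling the five metadata fields out of one page. It is not the RSelenium code used for this post: it is written in Python with only the standard library, and the HTML snippet, class names, and the `ArticleMeta` and `parse_article` names are all hypothetical, invented purely for illustration. Deadspin's real markup will differ.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical markup: these class names and attributes are invented for
# illustration and do not reflect Deadspin's actual page structure.
SAMPLE_HTML = """
<article>
  <h1 class="headline">Example Headline</h1>
  <span class="author">Example Writer</span>
  <time datetime="2016-03-01">March 1, 2016</time>
  <span class="views">12,345</span>
  <a class="tag">NFL</a>
  <a class="tag">Politics</a>
</article>
"""


@dataclass
class ArticleMeta:
    headline: str = ""
    writer: str = ""
    published: str = ""
    page_views: int = 0
    tags: list = field(default_factory=list)


class ArticleParser(HTMLParser):
    """Collects metadata from the hypothetical markup above."""

    def __init__(self):
        super().__init__()
        self.meta = ArticleMeta()
        self._field = None  # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "time":
            # The machine-readable date lives in an attribute, not the text.
            self.meta.published = attrs.get("datetime", "")
        elif attrs.get("class") in {"headline", "author", "views", "tag"}:
            self._field = attrs["class"]

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "headline":
            self.meta.headline = text
        elif self._field == "author":
            self.meta.writer = text
        elif self._field == "views":
            # Strip thousands separators before converting to an integer.
            self.meta.page_views = int(text.replace(",", ""))
        elif self._field == "tag":
            self.meta.tags.append(text)
        self._field = None


def parse_article(html):
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.meta


meta = parse_article(SAMPLE_HTML)
print(meta)
```

A browser-driving tool like RSelenium earns its keep when fields such as page views are filled in by JavaScript after the page loads; a static parser like this one only sees what is in the raw HTML.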
As an initial look at the data, the chart below shows total articles scraped by publication month. The increasing trend is probably a mix of both Deadspin's growth and the scraper's propensity to find more recent links.
Enough data discovery. Time to learn how to generate buzz on the Internet!
Lesson 1: Viral Content Is Random
Well, it was important not to get one's hopes up. Presenting Deadspin's most viewed post of all time, with a healthy six-million-plus lead over second place:
It's a video of people failing to pour ice water over their heads.
Lesson 2: Viral Content Does Not Showcase The Best In Us
The effect of hashtagable #viral #content can be seen when looking at Deadspin's average page views per post by month:
It's clear that one or two viral articles can significantly boost the traffic of even a site Deadspin's size. Digging into the relevant dates:
October 2010: A month that lives in infamy. A.J. Daulerio posts details of Brett Favre's correspondence with Jenn Sterger. The world finds out more than it ever wanted to know about the gunslinger's extramarital affairs. This scoop felt exceptionally sleazy at the time (mainly because Sterger did not want it to run, leading Daulerio to source the info from a third party) and doesn't look much better now. Daulerio, of course, ultimately became part of an even bigger story.
August 2014: The aforementioned ice bucket anthology is posted, as is video of Tony Stewart hitting and killing Kevin Ward Jr. in a sprint car race. The legal ramifications from the crash are still ongoing. I think the ethics of posting a video of someone getting killed are pretty questionable, so as with Favre this does not stand out as a high point in Deadspin's history.
In general, the ice bucket challenge video differs from Deadspin's most viewed articles in that it is not very shocking. While some lighthearted articles achieve huge numbers, the majority are quite dark, or at least stunning in some sense.
Lesson 3: Hire Drew Magary
Lesson 4: Write About Politics
One of my reasons for scraping Deadspin was that the site does a nice job tagging every article, meaning that topics can easily be tracked over time.
To measure the popularity of tags, I compared the page views of each article featuring a given tag to the median page views of all articles posted in the same month. This is a quick and dirty way to adjust for the upward trend in overall Deadspin traffic. Using median monthly traffic (plotted below) as a benchmark gives slightly more stability than using the average traffic numbers shown above. As a side note, I think there is evidence that Deadspin traffic has declined slightly since January 2016, but it's hard to quantify the decline from this dataset without spending more time figuring out how quickly articles amass most of their total page views (i.e., does it take a week, a month, or a year?).
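The monthly-median adjustment can be sketched in a few lines. This is a Python illustration rather than the R code behind the post, and it assumes "compared" means a simple difference (views minus the month's median); a ratio would be an equally plausible reading. The rows and the function names `monthly_medians` and `views_above_median` are made up for the example.

```python
from collections import defaultdict
from statistics import median

# Illustrative rows only, not real Deadspin figures: (month, page views).
articles = [
    ("2014-08", 1000),
    ("2014-08", 3000),
    ("2014-08", 20000),
    ("2014-09", 2000),
    ("2014-09", 4000),
]


def monthly_medians(rows):
    """Median page views of all articles published in each month."""
    by_month = defaultdict(list)
    for month, views in rows:
        by_month[month].append(views)
    return {month: median(views) for month, views in by_month.items()}


def views_above_median(rows):
    """Attach each article's views minus its publication month's median."""
    medians = monthly_medians(rows)
    return [(month, views, views - medians[month]) for month, views in rows]


for row in views_above_median(articles):
    print(row)
```

Note how the 20,000-view outlier in the toy August data leaves that month's median at 3,000, whereas it would drag the mean up to 8,000. That is the stability the median benchmark buys.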
The scatterplot below shows the top 40 tags by "average views above monthly median," limiting to only tags appearing in at least 100 articles:
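The ranking behind a chart like this could be computed roughly as follows. Again, this is a hedged Python sketch and not the original R code: the `rank_tags` function, the dictionary keys, and the toy rows are assumptions for illustration, and the "views above monthly median" measure is taken to be a plain difference.

```python
from collections import defaultdict
from statistics import mean, median


def rank_tags(articles, min_articles=100, top_n=40):
    """Rank tags by average views above the monthly median.

    articles: iterable of dicts with 'month', 'views', and 'tags' keys
    (a hypothetical record layout chosen for this sketch).
    """
    articles = list(articles)

    # Benchmark: median views of all articles published in each month.
    by_month = defaultdict(list)
    for a in articles:
        by_month[a["month"]].append(a["views"])
    medians = {m: median(v) for m, v in by_month.items()}

    # Collect each article's excess over its month's median, per tag.
    excess_by_tag = defaultdict(list)
    for a in articles:
        excess = a["views"] - medians[a["month"]]
        for tag in set(a["tags"]):  # count an article once per tag
            excess_by_tag[tag].append(excess)

    # Drop rarely used tags, then sort by average excess, highest first.
    ranked = [
        (tag, mean(vals), len(vals))
        for tag, vals in excess_by_tag.items()
        if len(vals) >= min_articles
    ]
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked[:top_n]


# Toy data, not real Deadspin figures; the threshold is lowered to 2 to fit.
toy = [
    {"month": "2016-01", "views": 100, "tags": ["Politics", "NFL"]},
    {"month": "2016-01", "views": 300, "tags": ["Politics"]},
    {"month": "2016-01", "views": 200, "tags": ["NFL"]},
    {"month": "2016-01", "views": 50, "tags": ["Dogs"]},
]
print(rank_tags(toy, min_articles=2))
```

The minimum-article filter matters because a tag used once on a single viral post would otherwise top the list on the strength of one data point.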
For the uninitiated, "Balls Deep" is a common Drew Magary column pseudonym. Magary writes several regular features for Deadspin, spans a wide audience as an accomplished novelist and a correspondent for GQ, and has been labelled the most "actually-read" author on the internet. Finally, Drew is a former Chopped champion and a terrible dancer:
Looking further at the top tags, it's easy to see how and why Deadspin has broadened from a sports-only blog to a much more general purpose site. This shift accelerated with the arrival of several former Gawker writers, and is reflected in the prominence of the "Election 2016," "Donald Trump," and "Politics" topics. Apparently all websites are converging to an "all politics all the time" equilibrium.
Lesson 5: The World Is Not Ready For Your Pop Culture, Soccer, Tennis, or NHL Takes
On the other end of the spectrum, there are some obvious choices for what doesn't go viral:
It appears that the Deadspin readership is not prepared to share pop culture articles with their friends on Facebook, nor detailed soccer analysis (note that Screamer is Deadspin's soccer sub-blog). To see takes on dogs ranked so low indicates major societal issues beyond the scope of this article.
Conclusions
- Viral content is pretty random (duh) and likely depends mainly on what your extended family is willing to share on Facebook.
- Negative stories get clicks, from topics like racism and domestic violence to Duke basketball.
- Drew Magary writing about politics all the time makes sense.
Bonus Charts!
Here are all Deadspin writers with at least 500 articles found, plotted by the date ranges of their posts:
Takeaways here include that Barry Petchesky is an absolute machine and that Deadspin could stand to prioritize gender diversity.
Finally, there is evidence that Deadspin has grown up over time. I used the readability package in R to measure the grade level of Deadspin headlines by year and found a somewhat increasing trend:
The methodology was successfully validated by checking the lowest ranked individual headline.
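For the curious, a headline grade-level calculation can be sketched as below. The post does not say which formula the R readability package applied, so this Python illustration assumes the Flesch-Kincaid grade level, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59, and uses a deliberately crude vowel-group syllable counter. The function names are hypothetical, and the scores will not match the package's output exactly.

```python
import re
from collections import defaultdict
from statistics import mean


def count_syllables(word):
    """Crude heuristic: count runs of vowels, with a floor of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fk_grade(text):
    """Flesch-Kincaid grade level of a piece of text.

    A headline with no end punctuation is treated as one sentence.
    """
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


def mean_grade_by_year(headlines):
    """headlines: iterable of (year, headline) pairs."""
    by_year = defaultdict(list)
    for year, text in headlines:
        by_year[year].append(fk_grade(text))
    return {year: mean(grades) for year, grades in by_year.items()}


# Made-up headlines, purely for illustration.
sample = [
    (2008, "The cat sat"),
    (2016, "Investigators scrutinize organizational accountability"),
]
print(mean_grade_by_year(sample))
```

Short headlines built from one-syllable words can score below zero on this scale, which is worth remembering when reading a "lowest ranked headline" result.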