Guess The Stock!

November 21, 2017 by Adam Walker

Latest R Shiny project...

Bill Simmons Doesn't Talk To Women Much

October 22, 2017 by Adam Walker

Quick post this week: taking a look at who's appeared on The Bill Simmons Podcast, a perennial top five sports pod (and one of my personal favorites). Data is from Simmons's Ringer era only - I didn't go back through the ESPN/Grantland days.

Sidebar: This post is quite critical of The Sports Guy. However, I read Simmons's columns religiously from 2005 to about 2015 or so. He's undoubtedly one of the most influential sportswriters of the last two decades. Furthermore, the 30 for 30 series was a phenomenal accomplishment, and despite his endless Celtics homerism Simmons stands as a legitimate NBA historian. While everyone has one now, podcasts were not always an obvious choice - the early episodes of "Eye Of The Sports Guy" could be a tough hang - yet Simmons recognized the format's potential as a compelling alternative to sports radio. Simmons's defining ESPN project, Grantland, was awesome, both in terms of content and the careers it helped launch, and more recently The Ringer has begun to carve its own niche too. While his shtick has perhaps grown stale, Bill Simmons has still had an amazing career.

First, here are all the guests to appear at least 3 times over the pod's 275+ episode run. Joe House leads the pack, reflecting a large role in the early post-Grantland pods, as well as his continued presence for discussions on the NBA, D.C., and America's culinary specialties. Proud gambling degenerate Cousin Sal is close behind, then there's a big gap until NFL commentator Mike Lombardi, resident Yankee fan (and Republican) JackO, plus many other familiar Simmons collaborators.

While it's no surprise to anyone who follows Simmons that buddies like House and JackO are frequently heard from, it is a little surprising how few women have appeared...

Out of 432 guest appearances across the 275 episodes tracked, only 18 (4.17%) were made by women. Nine women total have appeared compared to 135 men. Mallory Rubin has been on six times, Juliet Litman four, Sarah Tiana twice, then a single spot for each of Sally Jenkins, Abby Wambach, Charlize Theron, Diana Taurasi, Katie Baker, and Katie Nolan. Mike Lombardi has made more appearances than this entire list combined.
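As a quick check on the arithmetic above, here is a small Python sketch (not the code behind this post) that recomputes the totals from the counts listed in the paragraph. The variable names are illustrative only.

```python
# Appearance counts per guest, exactly as listed in the paragraph above.
appearances_by_women = {
    "Mallory Rubin": 6,
    "Juliet Litman": 4,
    "Sarah Tiana": 2,
    "Sally Jenkins": 1,
    "Abby Wambach": 1,
    "Charlize Theron": 1,
    "Diana Taurasi": 1,
    "Katie Baker": 1,
    "Katie Nolan": 1,
}
total_appearances = 432  # all guest appearances tracked in the post

women_total = sum(appearances_by_women.values())
share_pct = round(100 * women_total / total_appearances, 2)

print(len(appearances_by_women), women_total, share_pct)  # 9 18 4.17
```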

So why does Simmons have so few women on the pod? One reason would be that this is a sports podcast featuring athlete interviews, and the majority of famous athletes in America are men. However, loyal listeners know that Simmons doesn't actually talk to that many current players or coaches (the many Kevin Durant appearances aside).

What does Simmons talk about? Well, looking at the words most frequently appearing together in episode descriptions, it appears Simmons is pretty locked onto the NFL and NBA:


This makes sense from both the perspective of Simmons's personal interests (Celtics! Patriots!), along with the goal of maximizing podcast downloads: the NFL is America's most popular sport, and the NBA appeals to a younger crowd more likely to be into pods.

The download angle is worth considering. It's well-established that The Ringer makes most of its money from podcast ads. It would be interesting, then, to know if Simmons has found that episodes with female headliners (e.g. Abby Wambach, Diana Taurasi, Charlize Theron) perform worse. This would be disappointing, but would at least provide an economic basis for how Simmons chooses his guests.

I would guess there are a couple of reasons for Simmons's guest selection. First, Simmons likes talking to his old college buddies, and his old college buddies are men. Second, by talking to people he already knows pretty well, Simmons reduces prep time for podcasts (allowing more episodes per week) while also generating stable download numbers due to listener familiarity.

Still, for someone who has been fine featuring obscure male guests (Jason Stein anyone?), it would be nice to see a bit more diversity in the roster.

Creation Stories: Analyzing YouTube Data With R

October 10, 2017 by Adam Walker

Earlier this year, YouTube introduced Creators On The Rise, a showcase for up-and-coming stars on the platform. For those of us with somewhat modest YouTube followings, these creators are VERY intriguing due to their rapid growth on generally low budgets. How frequently do creators post videos, and when? Which strategies do they implement to create appealing thumbnails and titles? And most importantly, how on earth do these channels fill hours and hours of content each month? It's a bit of a puzzle.

To analyze the traits of these new media success stories I turned to tuber, a handy R package that facilitates access to YouTube's API. The package contains several easy functions to grab data on particular channels or videos. Just make sure to register via Google's developer console to get the necessary API ID and key.

With a little bit of wrangling I compiled a dataset of 92 channels previously featured as Creators On The Rise. Variables available included subscriber counts, likes, comments, plus metadata on 9,410 published channel videos. All data + code is available on my GitHub.

"With more than 1,000 creators crossing the 1,000 subscriber threshold every single day, new talent is constantly emerging."

YouTube introductory post

1. How Long Till I Become A YouTube Superstar?

The 92 creators represent an interesting mix of relatively fast success versus long-term grind. Channels like Deestroying and Hailey Reese amass subscribers with abandon, while 007craft and Amazing Grays take a more leisurely approach.


YouTube's selection criteria for a "Creator On The Rise" likely mean this chart represents an overly optimistic view of the time it takes to become a YouTube superstar. Plus it's worth noting that some creators may have had an existing following prior to creating their channel - Sophina The Diva, for example, has garnered 64K subscribers in a little over a month, but had the advantage of already being rather famous. Nevertheless, building a YouTube empire does not have to be a decades-long affair.

2. But Do I Need A Viral Hit?

It doesn't hurt. Let's return to the previously mentioned 007craft. 007craft (I'm not sure anyone knows his real name) attracted attention earlier this year as the guy who was living in a storage locker. His video on the experience has collected over 3.5M views, representing an outsized portion of his channel's 11.8M total.

Of course, creating a viral video is tough. 55 videos in the data have broken 1 million views, representing about 0.6% of the total. Still, to have a shot you probably want to keep your clip under 15-20 minutes:


"FEEDING THE DEVIL | Spiders and Centipede" can be found here for those interested. I thought insect-averse readers would be glad to avoid checking out the thumbnail, so here is the second place video instead:

Subsetting the data to only the highest viewed video from each channel, we find the 92 videos have a median view count of 553K. This is certainly a lofty bar, but your channel doesn't need a "Charlie bit my finger" level of virality to add subs.

3. What Should I Make My Videos About?

All the channels use tags to help people find their content. Parsing individual tags and looking for positive correlations (i.e. a given pair of tags often appear in the same video), we can get a general idea of the kinds of topics creators are discussing:


Most of the content is fairly down to earth. There are fitness channels, family-oriented affairs, make-up tutorials and more. And very little politics! Perhaps YouTube is consciously refraining from naming political channels to the "On The Rise" section? Or maybe political takes aren't all that compelling after all...
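For readers curious how the tag-pair correlation in this section can work, here is a minimal Python sketch. It is not the R code behind the chart: the videos and tags are made up, and the phi coefficient is an assumption about the correlation measure, chosen because it is a standard way to correlate two presence/absence indicators.

```python
from itertools import combinations
from math import sqrt

# Hypothetical videos, each with a set of tags (not from the real dataset).
videos = [
    {"fitness", "workout", "gym"},
    {"fitness", "workout"},
    {"makeup", "tutorial"},
    {"makeup", "tutorial", "beauty"},
    {"family", "vlog"},
    {"fitness", "gym"},
]

def phi(tag_a, tag_b, docs):
    """Phi coefficient between two tags' presence/absence across docs."""
    n11 = sum(1 for d in docs if tag_a in d and tag_b in d)
    n10 = sum(1 for d in docs if tag_a in d and tag_b not in d)
    n01 = sum(1 for d in docs if tag_a not in d and tag_b in d)
    n00 = len(docs) - n11 - n10 - n01
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Score every pair of tags, then keep only the positively correlated ones.
all_tags = sorted(set().union(*videos))
pairs = {(a, b): phi(a, b, videos) for a, b in combinations(all_tags, 2)}
positive = sorted(
    ((pair, r) for pair, r in pairs.items() if r > 0),
    key=lambda item: -item[1],
)

for (a, b), r in positive:
    print(f"{a} + {b}: {r:.2f}")
```

Pairs that always show up together (like "makeup" and "tutorial" in the toy data) score 1.0, while tags that never share a video score below zero and drop out.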

4. How Much Content Does A Top Content Creator Create?

You might wonder whether it's possible to build a large following without quitting your day job. Maybe! Most creators are posting steadily but certainly not daily. The data below shows publication frequency over the past 52 weeks for all the channels created prior to the start of the period:


CatPusic might be my favorite of all the channels in the data.

In terms of timing, Friday is the most popular day of the week for publishing new videos:

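The day-of-week tally in this section boils down to parsing each video's publish timestamp and counting weekdays. Here is a rough Python sketch of that idea, not the original R code. The timestamps are invented, and the format string assumes ISO 8601 UTC values of the kind the YouTube API returns.

```python
from collections import Counter
from datetime import datetime

# Hypothetical publish timestamps (UTC, ISO 8601); not real channel data.
published_at = [
    "2017-10-06T15:00:00.000Z",  # Friday
    "2017-10-06T21:30:00.000Z",  # Friday
    "2017-10-07T12:00:00.000Z",  # Saturday
    "2017-10-10T18:45:00.000Z",  # Tuesday
]

# Fixed English names, so the result does not depend on the system locale.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def weekday_name(timestamp):
    """Return the weekday name for an ISO 8601 UTC timestamp string."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return DAYS[parsed.weekday()]

uploads_by_day = Counter(weekday_name(ts) for ts in published_at)
print(uploads_by_day.most_common())
```

One caveat: the timestamps are in UTC, so a late-evening upload in the U.S. can land on the following day unless you convert to the creator's local time first.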

5. The Rich Life...What's That?

The attentive reader will have noticed an outlier in the second to last chart. The Rich Life follows a "homeschooling family of 7 that loves to share the good, the bad and maybe the occasional ugly." Of course, the idea of family as #brand is definitely nothing new, but 200K subs in a little over a year is impressive regardless.

Beyond its outstanding publication frequency, the channel's clip game is on point:


The titles alone entice you to click and find out what hilarious shenanigans have occurred, and the thumbnails are attention-grabbing and varied. Plus the sheer variety of hijinks is impressive! If you have recently been kicked in the head by a horse, suffered a break-in, evaded a tornado, and dealt with a police run-in...you may have what it takes to be a top YouTube creator.

6. Final Thoughts

Overall, the data available from the YouTube API is impressive.

Lots more to do on this topic:
Model out the views
Analyze the video titles and descriptions
Look at the thumbnails
etc.

Dataset/code is here….

Presidential Approval Ratings Don't Mean Much Early On

August 17, 2017 by Adam Walker

Trump is having a lousy week. You've probably heard. The President's Charlottesville response was deemed inadequate, his lawyer has been forwarding misguided General Lee comparisons, and now we've learned that a crown jewel of Presidential gatherings - the lauded Manufacturing Council - is no more. Who knew condemning Neo-Nazis could be so hard?

The recent stream of negative press hasn't helped the President's popularity. FiveThirtyEight has Trump's approval rating at a lowly 37.3%. The site's nifty historical comparisons also show that, among post-WW2 Presidents, only Gerald Ford had a similarly poor approval rating at this point in his administration. Trump's North Korea Twitter rhetoric certainly doesn't alleviate concerns about executive or national stability either.

However, while Trump's low rating is unusual given how early we are in his administration, it is not really an outlier in the context of overall Presidential approval:

Many Presidents have, for instance, dipped below 40%. Precipitous drops in popularity are the norm rather than the exception - although they do tend to occur in a President's second term. Trump's fall is particularly jarring compared to Barack Obama's comparatively serene eight years, an historic outlier that featured no true "bottoming out" a la Nixon, Carter, or either Bush.

Still, does Trump's miserable approval rating mean he won't be re-elected? To investigate, I plotted the approval of first term Presidents by the number of days until their re-election bid, splitting out winners and losers. The 1964 election run-up is shown for LBJ.

It's clear that incumbents usually win. Only Ford, Carter, and George H.W. Bush lost their re-election bids. Voters generally prefer the devil they know.

The other noteworthy trend is that early term Presidential approval is not a good predictor of election success. Clinton was exceptionally unpopular at times. And Truman's swings over the course of his first term make Trump's administration look positively serene. Trump is certainly in bad shape, but it's not inconceivable he turns it around.

Of course, I do realize a President must actually make it to the election to be re-elected.

P.S. It is worth mentioning that approval rating has some severe flaws: first, the question is rather vague - why not ask a respondent directly about the President's effect on their quality of life? Second, military action tends to give administrations a nice boost (at first). This is a sub-optimal incentive structure. The metric reminds me of batting averages in baseball: no student of the game takes them seriously, but since even the layperson knows the definition it is unlikely that Mendoza line mentions will ever fall completely out of favor.

If you enjoyed this post, consider signing up for my weekly newsletter on tech, sports, and more.

Text Mining BBC Headlines with R

July 30, 2017 by Adam Walker

Recently I discovered Text Mining with R: A Tidy Approach, a new guide by Julia Silge and David Robinson that synthesizes common text analysis tasks with the tidyverse concepts familiar to all Hadley Wickham adherents.

To test out the book's techniques, I scraped BBC headlines since 2014 using the Wayback Machine. After the usual data wrestling/wrangling process, I was left with a de-duped dataset of 2,885 headlines that had appeared in the BBC's top headline slot. From here, a simple application of tidytext's unnest_tokens gave me an appropriately "tidy" dataset of one word per row. I did concatenate some obvious bi-grams (e.g. North Korea) but otherwise stuck with individual words.

Jumping into some analysis, I leveraged Text Mining's code to summarize word frequencies. Here, we find some predictable terms in the top spots:

Headline Changes

While not surprising that the U.S. and Trump have dominated headlines, it is interesting to examine how the focus of BBC coverage has shifted over the years. One approach suggested in Text Mining is to score each word in a document by tf-idf, short for "term frequency–inverse document frequency." The goal is to find words that occur frequently in a particular document (e.g. all the headlines for a given year) but are not terribly common across an entire corpus (e.g. all headlines from 2014-2017). Applying tf-idf by year, we find that Gaza stories were prominent in 2014, the Greek debt crisis was on everybody's mind in 2015, while from 2016 onwards we have been living in TrumpWorld:
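To make the tf-idf idea concrete, here is a small Python sketch that treats each year as one document. It is not the code from the post: the word lists are invented, and the formula assumed is the common one, term frequency (count divided by document length) times the natural log of (number of documents divided by number of documents containing the word).

```python
from collections import Counter
from math import log

# Hypothetical word lists per year (each year acts as one "document").
words_by_year = {
    2014: ["gaza", "gaza", "ukraine", "us"],
    2015: ["greece", "greece", "debt", "us"],
    2016: ["trump", "trump", "us", "election"],
}

def tf_idf(docs):
    """Return {(doc, word): tf * idf} for every word in every document."""
    n_docs = len(docs)
    # Number of documents each word appears in at least once.
    doc_freq = Counter(w for words in docs.values() for w in set(words))
    scores = {}
    for doc, words in docs.items():
        for word, n in Counter(words).items():
            tf = n / len(words)
            idf = log(n_docs / doc_freq[word])
            scores[(doc, word)] = tf * idf
    return scores

scores = tf_idf(words_by_year)
top_by_year = {
    year: max((w for (y, w) in scores if y == year),
              key=lambda w: scores[(year, w)])
    for year in words_by_year
}
print(top_by_year)
```

Note that "us" appears in every year, so its idf is log(1) = 0 and it scores zero everywhere. That is exactly why tf-idf surfaces year-specific words rather than perennial ones.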

There are other options beyond tf-idf. In particular, one of Text Mining's case studies demonstrates a model-based technique using Twitter archives. Here, separate GLM binomial models are fit to each word's count vs. total word frequency across time. A positive slope for a given word's model indicates the word is appearing more often across time, while a negative slope demonstrates reduced frequency. Given the high volume of models, significance is assessed with adjusted p-values to avoid multiple comparison issues. I used a .01 adjusted p-value threshold to assess significance (along with a minimum of 50 total appearances). Below are the frequencies across time for all the words found to have significant slopes:

Unsurprisingly, words like Trump and Gaza appear again, but the GLM approach also identifies "Ukraine" as a significant decliner - a word missed using tf-idf scores.
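The filtering step (minimum appearances, then an adjusted p-value threshold) can be sketched without fitting any models. The Python below is illustrative only. The p-values and counts are made up, and the Holm method is an assumption: the post doesn't name the adjustment used, and Holm is simply the default of R's p.adjust.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment for a list of p-values."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Smallest p-value is multiplied by n, the next by n - 1, and so on.
        candidate = min(1.0, (n - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical per-word results: (slope p-value, total appearances).
word_results = {
    "trump": (0.0001, 120),
    "gaza": (0.002, 60),
    "ukraine": (0.004, 75),
    "weather": (0.03, 55),
    "rare": (0.00001, 12),
}
MIN_COUNT = 50  # minimum total appearances, as in the post
ALPHA = 0.01    # adjusted p-value threshold, as in the post

# Drop rare words first, then adjust the remaining p-values together.
eligible = {w: p for w, (p, n) in word_results.items() if n >= MIN_COUNT}
words = list(eligible)
adjusted = dict(zip(words, holm_adjust([eligible[w] for w in words])))
significant = sorted(w for w, p in adjusted.items() if p < ALPHA)

print(significant)  # ['gaza', 'trump', 'ukraine']
```

In this toy example "rare" has the smallest raw p-value but never reaches the adjustment step, because it falls short of the 50-appearance minimum.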

You might be wondering what the "opportunity cost" has been of all the U.S. election/Trump stories from the past 1.5 years. Using the BBC's regional classification for stories (parsed from headline URLs), we can see that stories from Europe, the Middle East, and Africa have absorbed the bulk of the reduction in coverage since 2016:

Sentiment Analysis

Switching to a different topic, Text Mining has a nice sentiment analysis section that demonstrates the sentiment dictionaries found in the book's associated tidytext package. Below I break out BBC headline word frequencies by positive and negative categories, as found in the included "bing" lexicon.

Note that using these sentiment dictionaries does require some care. For example, "trump" was listed as a positive word. This may or may not ring true depending on the reader's political persuasion, but for the purpose of analytic objectivity I thought it best to remove the word from either category.

In any case, we find that BBC headline coverage is dominated by one-off destructive events, from "attack" and "crash" to "strike" and "bomb." Terror is still very effective at garnering media coverage.
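The lexicon lookup described in this section, including the manual removal of "trump", amounts to a filtered join. Here is a minimal Python sketch of that logic. The lexicon below is a tiny hypothetical stand-in rather than the actual "bing" word list, and the headline words are invented.

```python
from collections import Counter

# Tiny hypothetical lexicon (word -> sentiment label), standing in for "bing".
LEXICON = {
    "attack": "negative",
    "crash": "negative",
    "bomb": "negative",
    "win": "positive",
    "peace": "positive",
    "trump": "positive",  # the entry the post calls out
}
# Words removed from both categories before counting.
EXCLUDE = {"trump"}

# Hypothetical headline words, one per token.
words = ["trump", "attack", "attack", "crash", "peace", "trump", "talks"]

# Keep only words the lexicon knows about, minus the exclusions.
sentiment_counts = Counter(
    (LEXICON[w], w) for w in words if w in LEXICON and w not in EXCLUDE
)

for (sentiment, word), n in sentiment_counts.most_common():
    print(sentiment, word, n)
```

Words missing from the lexicon ("talks" here) simply drop out, which is worth remembering when reading the totals: the counts cover only words the dictionary has an opinion about.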


Network Effects

I'd like to close with my favorite visual from Text Mining: word networks using igraph and ggraph. The network below visualizes connections between words appearing in the same headline (minimum 8 matches). Here, we can see the vast quantity of BBC coverage devoted to U.S. foreign policy, from Ukraine and Syria to interactions with Russia, Iran, and China. On the edges are some anomalous stories, such as the search for MH370 plus reports on Israel and the Gaza Strip.
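Under the hood, a word network like this starts from an edge list: count how often each pair of words shares a headline, then keep the pairs that clear a minimum. The Python sketch below shows only that counting step (the plotting is left out), with invented headlines and a lower threshold than the post's 8 so the toy data produces an edge.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tokenized headlines; not the real BBC data.
headline_words = [
    ["us", "russia", "syria"],
    ["us", "russia", "sanctions"],
    ["us", "russia"],
    ["mh370", "search"],
]
MIN_MATCHES = 3  # the post uses 8; lowered here for the toy data

pair_counts = Counter()
for words in headline_words:
    # Sort and de-duplicate so each pair is counted once per headline,
    # always in the same (alphabetical) order.
    for a, b in combinations(sorted(set(words)), 2):
        pair_counts[(a, b)] += 1

# Edge list: (word, word, weight) for pairs that co-occur often enough.
edges = [(a, b, n) for (a, b), n in pair_counts.items() if n >= MIN_MATCHES]
print(edges)  # [('russia', 'us', 3)]
```

An edge list in this shape is the usual input to a graph library, with the count serving as the edge weight.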

Find all code for this analysis here.


Book Review: "Rediscovering Americanism: And the Tyranny of Progressivism"

July 14, 2017 by Adam Walker

"Rediscovering Americanism: And the Tyranny of Progressivism" is the latest book by Mark R. Levin: author, radio host, lawyer, and former Reagan administration official. It is a current #1 New York Times bestseller.

Mark R. Levin is not happy. Core American values installed by the Founding Fathers are under threat. Belief in an immutable natural law is being swept aside for transient moral relativism, while trust in market capitalist economics erodes via calls for more and more government influence. The future is dim.

The cause of this upheaval is, as you may have guessed, progressivism. Levin depicts a progressive movement lusting for a powerful, centralized administrative government, to be run by self-interested (presumably coastal) elites. To advance their pernicious agenda, progressives fight on multiple flanks. First, Bernie-esque populism riles up a previously dormant base. Second, pseudo-experts continually emphasize the “complexity” of national affairs, conveniently necessitating a larger bureaucratic presence. Like many conservatives, Levin sees massive entitlement programs as bloated evidence of progressive ideological failing. He goes further, though, dismissing dire climate change forecasts as fear-mongering used to justify government intrusion on individual rights. One gets the sense that Levin’s argument would have had slightly more force had Hillary won in November - as he perhaps expected when writing - but no matter.

Despite this striking thesis, the majority of Levin's book actually reads like Political Philosophy 101. He takes us on a whirlwind tour of the thinkers that inspired the Founding Fathers, from Locke to Montesquieu. It is fascinating to read Thomas Jefferson's thoughts on the definition of liberty, regardless of one’s political persuasion, and Levin vividly depicts the dire economic conditions faced by the founders prior to the Constitutional Convention. While disagreeing with its conclusions, Levin nevertheless traces the roots of modern progressive thought in great detail, from Rousseau and Hegel to Herbert Croly and John Dewey. Oh, and Karl Marx appears of course. These individuals are not so much quoted as entirely excerpted. Verbatim quotes often run to several pages, and after reading such copious amounts of political philosophy the return of Levin's own prose can seem like merciful relief. The philosophers probably deserve a co-author credit.

Levin's argument is certainly not without merits. With an assist from Milton Friedman, Levin compellingly describes the tremendous benefits generated (at least in aggregate) from mostly unfettered market capitalism. And Levin makes a strong case that the Constitution's Commerce Clause has been extended far beyond what the Founders would have deemed acceptable. The author points out that FDR’s expansionary New Deal probably wasn't all that great. And pretty much everyone can agree that government programs waste a lot of money. Finally, Levin's takedown of Karl Marx's muddled ideas on the proletariat and the bourgeoisie is the philosophical equivalent of shooting fish in a barrel; we all know by now that communism doesn't scale.

What these scattered points fail to do, however, is convince the reader that the tyrannical progressive boogeyman is actually around the corner. After all, despite a fair number of Presidents who could be considered progressive, the Constitution’s separation of powers provisions are largely intact. Yes, too many executive orders are now issued, but nobody who has seen Trump's travel ban blocked by the courts or his attempts at healthcare reform stall in Congress could argue that we have reached central government tyranny just yet. Constitutional protections included in the Bill of Rights are supported by both sides; we simply differ on the details, particularly with regards to the Second Amendment.

Levin’s writing also contains a whiff of hypocrisy. When quoting Friedman, Levin admits that “big government” projects such as highways and dams have led to clear benefits. And yet highway construction involved a mass confiscation of private property clearly inconsistent with Levin’s stated principles. It seems like Levin is doing some sort of implicit ROI calculation to despise Social Security yet accept America’s excellent roads. In addition, Levin excoriates progressives for pushing a smug “we know best” attitude to policy choices, yet confidently declares that popular entitlement programs will end with a “devastating collapse.” Maybe so, but it hasn’t happened yet, and to call for the end of welfare supported by so many reeks of the elitist attitude that Levin rails against. Furthermore, how many of us really want to return to an era before entitlement programs, before fair labor laws, and before civil rights? Levin seems unaware that his views so frequently leave him on the wrong side of history.

Rediscovering Americanism’s strident climate change denials are also instructive. Levin clearly recognizes that climate change is a major threat to his pro-individual, anti-centralized government worldview, in that it is the ultimate tragedy of the commons problem. Most observers agree that government intervention is required. As sea levels rise, no private company will have the incentive or ability to keep Florida above water (assuming it will even be possible at all). But this justification of aggressive government action is intolerable to Levin, so he has no choice but to attack climate science viciously.

Still, Levin's arguments should not be completely dismissed, in particular his defense of individual rights and free market economics. It will always be important to consider the acceptable limits of government authority. Creeping executive power should be repulsed, from Obama’s drone strikes to Trump’s immigration ban. And as the left becomes more enamored with ideas like a universal basic income, it will be important to consider the effect an increased welfare state will have on the entrepreneurial spirit that has made America an innovation leader.

But Levin does not flesh this thesis out fully. He fails to reconcile his view that individuals can decide for themselves with the reality that most of these people support broad entitlement programs. The book ultimately presents a false choice between a return to the 1700s and an all-encompassing Marxist super-state. Any reasonable observer can see that there is a middle path: one where government levies only moderate taxes to aid entrepreneurism while also providing some reasonable level of healthcare to all. Just as Hamilton and Jefferson compromised, conservatives and progressives can too.
