VOTING WITH YOUR TWEET: An experiment in political forecasting
Twitter is thought to be a wide-open medium for communication, so as an experiment we decided to see whether vibrant, two-sided conversations are developing in congressional races.
Interestingly, the data suggest that communication is lopsided in favor of incumbents. Incumbents receive more tweets, often vastly more, than their challengers. This occurs about 80% of the time, in both Democratic and Republican districts.
The trend is illustrated in the charts below, where tweet volumes in Republican (red) and Democratic (blue) districts tend to diverge from the black line, which marks where dots would appear if both candidates in a district had the same number of tweets.
Districts with no incumbent party (green dots) were most likely to be evenly split. These are usually new districts created by redistricting.
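The "more tweets means incumbent" rule of thumb described above can be written down in a few lines. This is a minimal sketch, not the analysis behind the charts: the `District` record, the function names, and the sample counts are all made up for illustration, and ties are arbitrarily scored as Democratic.

```python
from typing import NamedTuple


class District(NamedTuple):
    """Hypothetical record: tweet counts per candidate and the incumbent party."""
    rep_tweets: int
    dem_tweets: int
    incumbent: str  # "R" or "D"


def predict_by_volume(d: District) -> str:
    """Guess the incumbent party as the party of the candidate with more tweets.

    Ties fall to "D" here, which is an arbitrary choice for the sketch.
    """
    return "R" if d.rep_tweets > d.dem_tweets else "D"


def rule_accuracy(districts: list[District]) -> float:
    """Fraction of districts where the more-tweets rule names the true incumbent."""
    hits = sum(predict_by_volume(d) == d.incumbent for d in districts)
    return hits / len(districts)


# Made-up counts for illustration only; not the article's data.
sample = [
    District(1200, 150, "R"),
    District(90, 800, "D"),
    District(300, 2500, "D"),
    District(40, 35, "D"),    # the challenger out-tweets the incumbent here
    District(5000, 600, "R"),
]

print(rule_accuracy(sample))  # 0.8
```

On this toy sample the rule gets four of five districts right. The numbers were picked to echo the roughly 80% figure above, not to reproduce it.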
A classification experiment
But can we do better than this simple, if pretty accurate, rubric? For instance, we might ask whether, given the data we have, we could accurately predict the incumbent party in a district we’d never seen before, using only the Twitter volumes for the two candidates.
For this classification problem, we can use an algorithm (in this case a support vector machine) to build a map predicting where we think Republican and Democratic districts will appear. We “train” the algorithm on 80% of the districts for which we have data, then test its accuracy on the remaining 20%. This approach helps guard against what statisticians call overfitting: generating a model that just reproduces the data it’s built on, rather than representing underlying trends in that data.
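To make the train-and-test idea concrete, here is a rough sketch of an 80/20 hold-out split. It makes several assumptions. The data are synthetic. A one-nearest-neighbour classifier stands in for the support vector machine so the example needs only the Python standard library. All function names are hypothetical. Nearest-neighbour is a useful stand-in because it memorises its training set, which is exactly the behaviour a held-out test is meant to expose.

```python
import random


def make_districts(n: int, seed: int = 0) -> list[tuple[float, float, str]]:
    """Synthetic (log_rep_tweets, log_dem_tweets, incumbent) rows; not real data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        party = rng.choice("RD")
        # The incumbent party's candidate gets a higher average log-volume, plus noise.
        rep = rng.gauss(3.0 if party == "R" else 2.0, 1.0)
        dem = rng.gauss(3.0 if party == "D" else 2.0, 1.0)
        rows.append((rep, dem, party))
    return rows


def split(rows, train_frac=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into training and held-out test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def predict_1nn(train, rep, dem):
    """Return the label of the closest training district (squared Euclidean distance)."""
    nearest = min(train, key=lambda r: (r[0] - rep) ** 2 + (r[1] - dem) ** 2)
    return nearest[2]


def accuracy(train, rows):
    """Share of rows whose predicted label matches the true label."""
    return sum(predict_1nn(train, r[0], r[1]) == r[2] for r in rows) / len(rows)


train, test = split(make_districts(100))
train_acc = accuracy(train, train)  # memorised data: each point is its own nearest neighbour
test_acc = accuracy(train, test)    # held-out districts give the more honest estimate
print(len(train), len(test), train_acc, test_acc)
```

Accuracy on the training districts is perfect by construction, since each point is its own nearest neighbour, so only the held-out figure says anything about districts the model has not seen.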
In the map that results, we see shaded areas reflecting what the algorithm thinks are “Republican” incumbent regions and “Democrat” incumbent regions. The difficulty of categorizing districts with no incumbent party is clear: we see no green regions where the algorithm expects no-incumbent districts to fall.
Now we can overlay the data to check the accuracy of our classification. For districts with incumbents, we see fairly few misclassifications, i.e., red points in blue regions or vice versa. The algorithm correctly classifies about 94 percent of the seats where Republicans are incumbents and about 80 percent of the seats where Democrats are incumbents.*
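Per-party figures like these are computed separately for each true label: of the districts that really have a Republican incumbent, what share did the model call Republican, and likewise for Democrats. A small sketch follows, with a hypothetical function name and made-up labels rather than the model's real output.

```python
from collections import Counter


def per_class_accuracy(actual: list[str], predicted: list[str]) -> dict[str, float]:
    """For each true label, the share of its districts that were predicted correctly."""
    totals = Counter(actual)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical held-out labels and model output, for illustration only.
actual    = ["R", "R", "R", "R", "D", "D", "D", "D", "D"]
predicted = ["R", "R", "R", "D", "D", "D", "D", "D", "R"]

print(per_class_accuracy(actual, predicted))  # {'R': 0.75, 'D': 0.8}
```

Reporting the two rates separately matters when one party has more incumbents than the other, because a single overall accuracy could hide weak performance on the smaller group.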
Regardless of how we model the data, though, incumbents routinely receive more attention from Twitter users than their challengers. Political scientists know that incumbents win re-election the majority of the time, suggesting that incumbency confers some advantages.** Whether those advantages include more or better media coverage isn’t clear.†
Twitter is interesting since, unlike many media channels, it’s thought to be open to all entrants. In fact, discussions of the Tea Party emergence after 2008 sometimes credit Twitter and similar technologies for helping the Tea Party go around traditional Republican Party organizing channels.††
But our data suggest that most of the Twitter conversation reflects the broad advantages — whatever they may be — enjoyed by incumbents in the US electoral system.
- * The model made use of the svm implementation in the e1071 package for R, using a Radial Basis Function kernel. The model was tuned by training on a range of cost and gamma values using 10-fold cross-validation. The accuracy tests used the remaining 20% of held-out districts.
- ** Incumbency advantage is a long-studied problem. See this paper by Devin Caughey and Jasjeet Sekhon for a thorough discussion of what political scientists do and don’t know about it.
- † Stephen Ansolabehere, Erik Snowberg, and James Snyder discuss media effects in this recent paper but find little evidence of systemic media advantages for the incumbent.
- †† See this paper by Vanessa Williamson, Theda Skocpol and John Coggin for more background on the role of Twitter in the initial development of the Tea Party.