Last week, together with other members of the data science team at Cubeyou, we finalized a new version of our interest algorithm. The algorithm analyzes posts, likes, interactions and reactions of several million social media users across numerous pages of brands, media, organizations and famous people, and produces a distribution of interests for each user.
Every statistical method needs a validation procedure and our algorithm is no exception, so we thought: what better way than testing it against the outcome of the recent US presidential election?
We decided to investigate whether our new algorithm could succeed where all of the media, survey companies, polling organizations and other mainstream forecasting models had spectacularly failed: Could it have predicted Trump’s election?
After the fact, the media scrambled to find an explanation for how they could have been so wrong:
“Every big-data, number-crunching Web site, from Nate Silver’s FiveThirtyEight to The New York Times’ Upshot; every poll, from Fox to Bloomberg to Rasmussen, had predictions that were so off that it now seems surreal.”
- Vanity Fair, “How the Polls Gave Us President Trump”
How and why did the polling system fail? That was the question. Veteran Democratic pollster Geoff Garin posited that surveys under-sampled Trump’s core audience of non-college-educated whites. Others believed “hidden Trump voters” wouldn’t admit, even in polls, that they intended to vote for Donald Trump. Polling, after all, depends on the idea that people will be honest with complete strangers as long as they are promised anonymity. Republican pollster Frank Luntz predicted in June that the vote would be determined by “none of the above” voters, who didn’t fit into the categories polls typically use.
Jonathan Barnett, a Republican National Committeeman and Trump supporter, told Politico.com: “The pollsters have lost a lot of credibility and won’t be believed on anything soon... The way they poll doesn’t work anymore.”
A delayed prediction
To mimic the conditions of a true prediction, we considered only the social media interactions up to the first week of November, when the political campaign officially ended. We then ran the algorithm and uploaded the results onto our platform, Cubeyou Plus, for a quick and insightful visualization.
We ran two searches, one on Donald Trump’s audience and one on Hillary Clinton’s, restricted to users of legal voting age, and looked at the location demographics of the results to isolate the popularity of the two candidates state by state.
Popularity on Cubeyou Plus measures how popular a given interest is among the chosen base, on a scale of 0 to 200, with 100 representing average popularity.
In each state, we picked the candidate with the higher popularity and assigned that candidate the state’s electors.
What we found was astonishing: Trump 313, Clinton 225.
Using social media data together with our new algorithm, we correctly predicted the outcome in 44 states, with 45 electors wrongly assigned out of the total 538.
Moreover, the popularity index difference in the states where our prediction failed was small and within the statistical error, as shown by the figures.
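For readers who want to see the tallying step spelled out, here is a minimal sketch of the winner-take-all count described above, plus a check for states where the popularity gap is small. It is not our production code. The names `StateScores`, `tally_electors` and `close_calls`, the example states, every number in the example, and the `margin` threshold are all hypothetical and chosen purely for illustration. The sketch also assumes every state awards all of its electors to a single candidate, which ignores the split allocation used by Maine and Nebraska.

```python
from typing import Dict, List, NamedTuple


class StateScores(NamedTuple):
    """Hypothetical container for one state's inputs."""
    electors: int                  # electors awarded by the state
    popularity: Dict[str, float]   # candidate -> popularity index (0 to 200)


def tally_electors(states: Dict[str, StateScores]) -> Dict[str, int]:
    """Give each state's electors to its most popular candidate."""
    totals: Dict[str, int] = {}
    for scores in states.values():
        # Candidate with the highest index wins the state
        # (on an exact tie, the first candidate listed wins).
        winner = max(scores.popularity, key=scores.popularity.get)
        totals[winner] = totals.get(winner, 0) + scores.electors
    return totals


def close_calls(states: Dict[str, StateScores], margin: float) -> List[str]:
    """List states where the top two indices differ by less than `margin`."""
    close = []
    for name, scores in states.items():
        ranked = sorted(scores.popularity.values(), reverse=True)
        if len(ranked) > 1 and ranked[0] - ranked[1] < margin:
            close.append(name)
    return close


# Made-up example data: not real states, not real results.
example_states = {
    "State A": StateScores(10, {"X": 120.0, "Y": 95.0}),
    "State B": StateScores(20, {"X": 98.0, "Y": 101.0}),
    "State C": StateScores(5, {"X": 110.0, "Y": 90.0}),
}

print(tally_electors(example_states))            # {'X': 15, 'Y': 20}
print(close_calls(example_states, margin=5.0))   # ['State B']
```

In this toy example, State B is the kind of state where a wrong call would be unsurprising: the gap between the two candidates is smaller than the assumed margin.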
The ancient Greeks had a story about a prophetess, Cassandra, who was cursed to see into the future but be unable to change it, because no one would believe her.
Unlike cursed Cassandra, we still have a lot to say about the results of the 2016 election. Our platform allows us to dig deeper than the mere voter count. Analyzing Trump’s and Clinton’s audiences, we can discover demographic, psychographic and interest correlations among both candidates’ voters. These are valuable insights, not just for marketers but for American politics itself.
For example, we found details like:
Trump was more popular among those whose household income fell below $40K, while Clinton was mainly supported by those who earned between $70K and $200K annually.
Yet we observed that above the $200K threshold, Trump again drew more support than his Democratic opponent.
In terms of gender, we noticed a split that surprised few. Trump’s unflattering comments about women during his campaign did not pay off among the female electorate, which preferred to support the first woman nominated for president by a major US party.
What if it had been Trump v. Bernie?
We also had fun trying to predict what would have happened if the Democratic candidate to run against Trump had been Bernie Sanders.
Analyzing Bernie Sanders’ audience, we discovered that he seemed to gather more support than Clinton in almost all of the states the former Secretary of State managed to win, and that he would even have snatched some important states, such as Michigan and Wisconsin, away from the Republicans. Our analysis, however, also revealed that the big swing states would still have remained firmly in the hands of Donald Trump and that, even though the margin would likely have been small, Sanders wouldn’t have made it to the White House.
The final count for this prediction reads: Trump 293, Sanders 245.
We set out only to develop a superior algorithm for gaining deeper and more accurate insights into the minds of consumers, but we seem to have inadvertently created a new way to poll in politics, at a time when an effective political prediction mechanism is sorely needed. Pollsters and politicians, take note: our insights team is waiting for your calls.