Trump victory: Slap in the face for big data or desperate cry for more?

How have election pollsters proved so spectacularly wrong again?

Stats are the lifeblood of everything, we’re told. Analytics are the new creed we’re all meant to believe in (and even semi-worship). But overwhelming evidence shows they’re not always that trustworthy. And the fundamental kicker might have come this morning, when all those big data predictions on the US election proved so dramatically wrong. Trump didn’t just edge past the pundits to the US presidency… his victory could be described as a landslide.

Political predictions have been quite wrong for some time now. SurveyMonkey was the only pollster to call the 2015 UK election correctly. But it still got Brexit wrong and – despite polling around 28,000 individuals a week over the last month or so – the US election too. In fact, the only one to get it right this time was the USC/L.A. Times Daybreak tracking poll, which predicted a small lead for Trump (if nothing more).


Why were so many pollsters wrong?

A very decent analysis in the L.A. Times raised a couple of interesting points around why the polls might have been off. The first came from the Trump supporters surveyed, who said that while they were slightly more comfortable talking to family members and acquaintances about their choice of candidate, they were notably less comfortable sharing this information with telephone pollsters. This fits with the much-touted idea that non-vigilante Trump fans were “ashamed” to broadcast their views in advance of the election.

On a semi-related note, GQ Magazine offered the opinion that “many Americans weren’t ready to vote for a female president, and that their discomfort was something they weren’t prepared to admit to others or to themselves until they reached the voting booth.”

But, this aside, the second interesting thing tackled by the L.A. Times was the fundamental notion of statistical weighting itself. Because however big the data set, it always requires some form of modelling, and it is all but impossible to tell whether that imposed value judgement is actually correct until Election Day itself.

As Matt Jones, Analytics Strategist at data science consultancy Tessella, tells IDG Connect: “Traditional statistical analysis of polling data and surveys will only be representative of those that bothered to take part, and that section of the voting population is not representative.”
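To make the weighting point concrete, here is a minimal sketch of how the same set of responses can point to different outcomes depending on the turnout model imposed on it. It is not any pollster's actual method: the groups, the sample, the turnout shares and the `weighted_share` helper are all hypothetical, invented purely for illustration.

```python
from collections import defaultdict


def weighted_share(responses, assumed_turnout):
    """Estimate support for a candidate by reweighting each group of
    respondents to its ASSUMED share of the electorate.

    responses: list of (group, supports_candidate) tuples.
    assumed_turnout: dict of group -> assumed share of voters (sums to 1).
    """
    counts = defaultdict(int)
    supporters = defaultdict(int)
    for group, supports in responses:
        counts[group] += 1
        supporters[group] += int(supports)

    estimate = 0.0
    for group, share in assumed_turnout.items():
        if counts.get(group, 0) == 0:
            raise ValueError(f"no respondents in group {group!r}")
        # Support rate inside the group, scaled by the group's assumed weight.
        estimate += share * supporters[group] / counts[group]
    return estimate


# Hypothetical sample: group_a is far more willing to answer the phone.
sample = (
    [("group_a", True)] * 30 + [("group_a", False)] * 50
    + [("group_b", True)] * 12 + [("group_b", False)] * 8
)

# Unweighted share of respondents backing the candidate.
raw_share = sum(supports for _, supports in sample) / len(sample)

# Two hypothetical turnout models; neither can be checked before polling day.
model_a = {"group_a": 0.50, "group_b": 0.50}
model_b = {"group_a": 0.35, "group_b": 0.65}

print(f"raw:     {raw_share:.3f}")
print(f"model A: {weighted_share(sample, model_a):.3f}")
print(f"model B: {weighted_share(sample, model_b):.3f}")
```

With these made-up numbers, the raw sample and the first turnout model both put the candidate below 50%, while the second puts them above it. The responses never change; only the assumption about who turns out does.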

This is still only a fraction of a more nuanced picture though. As Jonathan Forbes, chief architect, analytics at Aquila Insight, puts it: “I think it’s a slightly incorrect premise to propose that the pundits are getting this so wrong. There’s always scope for an underdog to pull off a victory, and I’m not aware of a poll that had Donald Trump with 0% chance of winning.

“Something as big as an election will always be open to data sourcing issues,” he adds.

However, he concedes “there are a couple of key issues with traditional predictive methodologies”. These are, firstly, that prediction relies on what happened in the past being reasonably similar to what will happen in the future. And secondly, that most common traditional methodologies don't really get under the skin of what motivates voters.

“It's a bit like trying to forecast sales by asking customers whether they'll buy something - they'll tell you what they should do, but their actual behaviour will be dictated more by the weather, the money in their pocket and what they fancy for their tea,” he explains.


So, can we address these glaring pollster problems?

Forbes believes both issues can be addressed “by taking a more granular, drivers-based approach,” but warns that this takes time and money. “And the media cycle won't allow for either.”

Jones of Tessella, on the other hand, suggests the solution might be even bigger data. He feels this could be achieved through the application of machine learning to social media streams.

“A human review of exceptional events will be required to inform an accurate weighting model to increase confidence in prediction accuracy,” he adds.

“In general terms, AI, machine learning and predictive analytics perform best at analysing large data sets and pinpointing relationships and correlations that may not be otherwise recognised. What they cannot do, with a great level of accuracy, is make sense of unexpected events [e.g., FBI reopening investigation into Clinton emails] or make a judgement call on the subtleties of human, and indeed crowd, behaviours.”


What about social media influence?

Through the last couple of elections I’ve been spammed pretty categorically with two distinct sets of research around politics. The first looks at the high level of social media support – often cleverly assisted by targeted promotional bots – associated with individuals like Donald Trump and Nigel Farage. The second is based on the carefully conceived viewpoints of experts drawing on a number of sources.

Simon Sear, Practice Leader at SPARCK, the strategic design consultancy for BJSS, gives a lot more weight to social media data than carefully constructed polls.

“When people use social media, they become less guarded about their true social and political affiliations,” he says. “Contrast this with having to admit embarrassing sentiment and intentions to a potentially judgemental human pollster, or being bombarded with various questions on online surveys. Who would reasonably want to subject themselves to being labelled an ‘uneducated’ Trump supporter, or a ‘racist’ Brexiteer?”

He believes “it is time for pollsters to learn from this and to develop next generation tools which use big data and data science to measure social sentiment across the electorate.”


And how about the escalating role of digital marketing?

It is certainly true that manipulative social media marketing does seem to be translating into a lead in real life, and that more right-wing candidates seem to be best at exploiting it.

Research from email marketing company Mailjet analysed marketing communications by Trump and Clinton for a two-month period ahead of the election. While both missed opportunities, it showed Trump had a far superior strategy, with better personalisation and clearer calls to action.

“Our research found that Trump had adopted a stronger email marketing strategy from the start, with clear and concise messaging from the subject line to the call-to-action at the end,” Josie Scotchmer, an email marketing specialist at Mailjet, tells us.

“However, whilst his campaign provoked a sense of collaboration as he asked for opinions and feedback regularly, his content also evoked a strong sense of fear. His win suggests that populist, more right-wing candidates do better when it comes to propaganda because their messages are strong and evocative. For example, Trump constantly reminded his email recipients that Clinton could not be trusted to do the job, and 40% of Trump emails referenced the number of days left until the election. By instilling a sense of panic, American citizens were stirred up to take action and go out and vote for him.”

This all fits in with the wider, much-talked-about “confirmation bias” associated with more and more people accessing their news via social media rather than independent sources. Social media presents a level playing field for all information – no matter how nonsensical – and there is no simple way to analyse the impact of this.


