Graph databases strive to tackle fake news

Could visualisation tools help to see us out of the jungle of fake news?

The problem with the fake news phenomenon is that it’s vast and complex and nobody agrees on a definition. Which makes it pretty much like any technological invention except that - unlike cloud computing - it’s impossible to get independent analysis. Everyone has their own personal take on it, and the producers are never consistent.

The nearest we have to a consistent definition is that it’s something that’s fabricated and promoted ruthlessly. Having said that, British tabloids have a long tradition of running both fake news and professional journalism. At its peak, The Sunday Sport was notorious for stories about World War II bombers being found on the moon and anti-social behaviour by space aliens.

Technology may have exacerbated fake news though, because it removed some of the checks and balances. So, how can we combat something if we can’t even identify it, let alone monitor and measure and manage it? These days technology has put all the power in the hands of the fakers. It’s made publishing cheap and - as a result of the intense competition that bred - made sub-editors and fact checkers into unsustainable overheads.


So technology, by multiplying volumes and shrinking veracity, has completely faked up the news industry. Now, however, technology could be its salvation. Graph database maker Neo Technology is one of the companies helping make amends for the journalistic ruin caused by digital addiction.

Neo was one of the enabling companies that helped journalists to track their way through the three-terabyte jungle of data known as the Panama Papers. It takes a long time to search 4.8 million emails, three million database entries, 2.2 million PDFs, 1.2 million images, 320,000 text files and 2,242 randomly formatted files. Without Neo4j, the graph database tool, the elusive trails of furtive finance could never have been followed.

“At its highest level, investigative journalism is synonymous with data science. In that sense, there will still be an element of doorstepping and asking people questions. But mostly, journalism will be set in the world of digital data,” explains Emil Eifrem, CEO of Neo Technology.


How can graph databases help curb fake news?

Google and Facebook both have programmes in the making that combine user feedback with behaviour analytics and third-party fact-checking sites to flag fake news. But do you trust the owners of YouTube, Facebook and co not to be politically partisan, given their known affiliations? Not everyone does.

Graph visualisation tools like Graph Connect can help social networks stop the spread of hoax stories. Neo Technology has even set up a Fellowship to encourage people to start filtering out the fakes.

“We have to make the distinction between journalistic error, which always happens, and blatantly wrong news,” says Eifrem. “There isn’t a moral equivalence between a mistaken claim in an otherwise well researched piece in the Washington Post, and a complete fabrication about the Pope endorsing Donald Trump, in an invented paper called the Denver Guardian.”

Graph technology works on several levels that Facebook doesn’t. For example, social networks do not verify the identity of their users through, say, credit card details or ID checks. They don’t block repeat offenders either, so much of the data online is unverified.

The problem is that fake news is created and shared by real people. If it were created by bots, it would be easy to spot, because bots follow a rigid pattern of behaviour that analysis can pick out.

Algorithmic detection alone is not enough, since its inflexibility creates both false negatives and false positives: fake news slips through while genuine news is wrongly flagged. User flags are both time-consuming and open to abuse.

As with fraud detection, the key to detecting fake news is connections – between accounts, posts, flags and websites. By visualising those connections as a graph, we can understand patterns that indicate fake content.

Sites like Facebook have all kinds of data points available, relating to the account (its history, age, behaviour and connections), the devices, the content and the author. By collating these variables, a graph model can create nodes for three items: accounts, posts and articles. Meanwhile IP addresses can also be monitored to see where posts are coming from – and be added to the model.
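To make that model concrete, here is a minimal sketch in plain Python rather than in a graph database. Everything in it is an assumption made for illustration: the account names, URLs and IP addresses are invented, and the rule of flagging an article when three or more accounts post it from the same IP address is an arbitrary threshold, not a method described by Neo Technology or any social network.

```python
from collections import defaultdict

# Hypothetical records: (account, post, article URL, IP address).
RECORDS = [
    ("acct_a", "post_1", "example.com/story", "10.0.0.1"),
    ("acct_b", "post_2", "example.com/story", "10.0.0.1"),
    ("acct_c", "post_3", "example.com/story", "10.0.0.1"),
    ("acct_d", "post_4", "example.org/report", "10.0.0.7"),
    ("acct_e", "post_5", "example.org/report", "10.0.0.9"),
]


def build_graph(records):
    """Return an undirected adjacency map over (type, name) nodes."""
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for account, post, article, ip in records:
        link(("account", account), ("post", post))
        link(("post", post), ("article", article))
        link(("post", post), ("ip", ip))
    return graph


def accounts_per_article_and_ip(graph):
    """Group accounts by the (article, IP) pair their posts connect to."""
    clusters = defaultdict(set)
    for node, neighbours in graph.items():
        if node[0] != "post":
            continue
        accounts = [name for kind, name in neighbours if kind == "account"]
        articles = [name for kind, name in neighbours if kind == "article"]
        ips = [name for kind, name in neighbours if kind == "ip"]
        for article in articles:
            for ip in ips:
                clusters[(article, ip)].update(accounts)
    return clusters


def suspicious_clusters(records, min_accounts=3):
    """Keep only (article, IP) pairs shared by at least min_accounts accounts."""
    clusters = accounts_per_article_and_ip(build_graph(records))
    return {
        key: sorted(accounts)
        for key, accounts in clusters.items()
        if len(accounts) >= min_accounts
    }


if __name__ == "__main__":
    for (article, ip), accounts in suspicious_clusters(RECORDS).items():
        print(article, ip, accounts)
```

In a real graph database the same question would be a short pattern-matching query over the stored nodes and relationships. The point of the sketch is only that the signal comes from the connections, not from any single post.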

Graph visualisation highlights the complex connections between social media data. But identifying the symptoms of fakery is highly subjective. One man’s anomaly is another man’s normal.

Pioneers in this field are Leila Haddou and David Blood, two investigative journalists for the Financial Times, who used graph databases to find patterns of mass manipulation during the recent Dutch general election, in March 2017. This is a relatively new discipline for journalists, and finding a way into a story, when there are vast quantities of complex data, is a process people learn on the job, according to Haddou. The two main components they learned to look for were evidence of the production of fake news and the automation of delivery.

The delivery part is relatively easy to spot. Twitter bots, being a lot more prolific and consistent than humans, could be identified by their unusually high activity. Haddou and Blood looked specifically for automated cheerleaders of Geert Wilders, the far-right candidate in the Dutch election.
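The idea of "more prolific and consistent than humans" can be sketched as two numbers per account: how often it posts, and how evenly spaced the posts are. The code below is an illustration under stated assumptions, not the method Haddou and Blood used. The function names are hypothetical, timestamps are assumed to be in seconds, and the thresholds (20 posts an hour, gap variation of 0.2) are arbitrary.

```python
from statistics import mean, pstdev


def posting_profile(timestamps):
    """Return (posts_per_hour, gap_variation) for timestamps in seconds.

    gap_variation is the standard deviation of the gaps between posts
    divided by their mean: close to 0 for clockwork posting, larger for
    erratic posting. It is None when there are too few posts to judge.
    """
    ts = sorted(timestamps)
    if len(ts) < 3:
        return 0.0, None
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    span_hours = (ts[-1] - ts[0]) / 3600
    rate = len(ts) / span_hours if span_hours > 0 else float("inf")
    average_gap = mean(gaps)
    variation = pstdev(gaps) / average_gap if average_gap > 0 else 0.0
    return rate, variation


def looks_automated(timestamps, min_rate=20.0, max_variation=0.2):
    """Flag accounts that post both very often and very regularly."""
    rate, variation = posting_profile(timestamps)
    return variation is not None and rate >= min_rate and variation <= max_variation


if __name__ == "__main__":
    every_minute = [i * 60 for i in range(120)]           # clockwork posting
    occasional = [0, 400, 5000, 5100, 20000, 86400]       # irregular posting
    print(looks_automated(every_minute), looks_automated(occasional))
```

A check like this only narrows the field. A prolific human or a carefully randomised bot would sit on the wrong side of any fixed threshold, which is why the journalists also looked at the content being shared.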

“We found the numbers of followers spiked whenever there was an event,” says Blood. The day when Wilders was convicted of hate speech and the day of the Christmas market attack in Berlin saw huge increases in activity. The nature of the content was another variable used to gauge botnet behaviour. Wilders’ supporters were deemed to offer less mainstream content and many of the news sharers were of Russian origin, with stories from Russia Today and Sputnik heavily featured.

Is this decisive evidence though? Blood admits that much information is still open to interpretation. The fact that Wilders had large numbers of US-based supporters could be significant, however. Americans aren’t noted for their interest in European politics, are they? “Quite,” says Blood.

Still, there are checks and balances to be made as we begin to hack our way through the reams of digital ‘churnalism’. Fakers, be they right-wing columnists or left-wing conspiracy obsessives, are poisoning the well of knowledge for all of us.

Neo Technology’s Accelerator programme, which encourages people to get into data journalism, is non-partisan and open to anyone. We need to take back control of this invasive culture; otherwise, as The Sunday Sport once unwittingly predicted, these aliens to the truth will turn us all into olives.