The next wave of disruption: Graph-based machine learning

We look at the pros and cons of machine learning and graph technology and how the two are now working together

Machine learning (ML) is getting a lot of attention at the moment. This is partly because a slew of new companies are emerging that use it in innovative ways, and partly because it is easily subsumed into the fuss and furore about AI and the rise of evil robot intelligence. Graph technology, on the other hand, takes more of a back seat and yet, in a lot of ways, also sits at the forefront of the big data and analytics movement.

“We firmly believe that it's at the intersection of machine learning and graph technology where the next evolution lies and where new disruptive companies are emerging,” says Ash Damle, Founder and CEO at Lumiata, which helps healthcare organisations make predictions.

“It's only recently that companies can use graph at true scale and, now, by integrating with ML, we're moving much more into a core understanding of artificial intelligence, deep neural networks and image recognition.”


So, in the simplest terms, what are these two technologies?

At the most basic level, machine learning uses large quantities of data to make predictions about future events, while graph technology is more concerned with the relationships between different data points.

Claus Jepsen, Chief Architect, R&D at Unit4, which provides enterprise applications, summarises:

“Machine Learning is really the umbrella and graph technology is a way of representing data when using machine learning.”

Peter Duffy, CTO of capacity-planning-as-a-service provider Sumerian, adds that this means: “There is huge potential for businesses to take advantage of both.”

David Thompson, Sr. Director of Product Management at LightCyber, further clarifies: “Graph technology can be considered a type or technique of machine learning, or, at a minimum, aspects of graph technology have strong application to machine learning.”


What about a few examples?

A company like Darktrace provides a good example of machine learning. It scans employee activity behind the enterprise firewall to profile what things should look like on a normal day-to-day basis, which makes it especially easy to flag suspicious behaviour.
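To make the idea of profiling "normal" activity concrete, here is a minimal sketch in Python. It is not Darktrace's method, which the article does not describe; it simply assumes a single hypothetical metric (files read per day by one employee) and flags a day that sits far outside the historical baseline. The function name, the threshold and the figures are all illustrative assumptions.

```python
from statistics import mean, stdev


def flag_unusual(history, today, threshold=3.0):
    """Return True when today's value deviates from the historical baseline
    by more than `threshold` sample standard deviations."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all counts as unusual.
        return today != baseline
    return abs(today - baseline) / spread > threshold


# Hypothetical daily counts of files read by one employee over a week.
daily_file_reads = [42, 38, 45, 40, 41, 39, 44]

is_normal_day_flagged = flag_unusual(daily_file_reads, 43)
is_spike_flagged = flag_unusual(daily_file_reads, 400)
```

A production system would learn from many signals at once rather than one count, but the principle is the same: learn the baseline first, then score new events against it.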

Simon Crosby, co-founder and CTO at security firm Bromium, suggests that “automatically assembling a graph of connectivity between data points is a powerful addition to learning.”

He cites LinkedIn, which “does a pretty good job of suggesting contacts that you may know”, and notes that “Microsoft Office Graph automatically builds your social graph from your email connections”.

“From a graph - even if bits of it are imprecise and incomplete – one can infer causal relationships of great value,” he adds. “Graph tools are incredibly powerful at sifting through data to identify relationships and their relevance in a particular context.”
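One simple way a connectivity graph can suggest "contacts that you may know" is to count shared connections. The sketch below is an assumption for illustration only; it is not how LinkedIn or Microsoft Office Graph work, and the names and data are hypothetical. It ranks people who are not yet connected to a user by the number of contacts they have in common.

```python
from collections import Counter


def suggest_contacts(graph, person, limit=3):
    """Rank people who are not yet connected to `person` by the number of
    contacts they share with them (ties broken alphabetically)."""
    direct = graph.get(person, set())
    shared = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            if candidate != person and candidate not in direct:
                shared[candidate] += 1
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


# Hypothetical social graph: each person maps to their direct contacts.
connections = {
    "ana": {"ben", "cho"},
    "ben": {"ana", "dee", "eli"},
    "cho": {"ana", "dee"},
    "dee": {"ben", "cho"},
    "eli": {"ben"},
}

suggestions = suggest_contacts(connections, "ana")
```

Here "dee" ranks above "eli" because she shares two contacts with "ana" rather than one.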

Gerry Carr, CMO at AI and ML start-up Ravelin, explains how he uses both approaches to tackle fraud detection for online businesses. “So let’s say we find a fraudster in a merchant’s data using the predictive [machine learning] algorithms - with graph networks we can quickly analyse connections that this user has based on a shared email, phone or a card.

“This allows our merchants to conclude that these cards, accounts or devices are very likely to have been created with a fraudulent intent. You can use the output of your graph network to feed into your machine learning system as another signal,” he adds.

“This means if someone isn't yet a fraudster but has a lot of devices and credit cards in their network, that might be a signal that is predictive of fraud, but you wouldn't be able to use that in your machine learning system if you didn't have a graph network.”
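The pattern Carr describes, in which the output of the graph feeds the machine learning system as an extra signal, can be sketched roughly as follows. This is an illustrative assumption rather than Ravelin's implementation: the record format, the function names and the three features are hypothetical. Accounts are linked through the identifiers they share (email, phone or card), and a breadth-first search finds every account in the same network.

```python
from collections import defaultdict, deque


def build_graph(records):
    """Link each account to the identifiers (email, phone, card) it has used."""
    graph = defaultdict(set)
    for account, identifier in records:
        graph[("account", account)].add(("id", identifier))
        graph[("id", identifier)].add(("account", account))
    return graph


def linked_accounts(graph, account):
    """Breadth-first search for every other account reachable through
    shared identifiers."""
    start = ("account", account)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return {name for kind, name in seen if kind == "account"} - {account}


def graph_features(graph, account, known_fraudsters):
    """Turn the account's network into numeric signals for an ML model."""
    linked = linked_accounts(graph, account)
    return {
        "identifier_count": len(graph[("account", account)]),
        "linked_account_count": len(linked),
        "linked_fraudster_count": len(linked & known_fraudsters),
    }


# Hypothetical (account, identifier) pairs from a merchant's data.
records = [
    ("u1", "card:1111"),
    ("u1", "phone:555"),
    ("u2", "card:1111"),
    ("u2", "email:x@example.com"),
    ("u3", "email:x@example.com"),
    ("u4", "card:9999"),
]
graph = build_graph(records)
u3_features = graph_features(graph, "u3", known_fraudsters={"u1"})
u4_features = graph_features(graph, "u4", known_fraudsters={"u1"})
```

Account "u3" has never shared anything directly with the known fraudster "u1", yet the graph links them through "u2", so its `linked_fraudster_count` is 1. That number could then be passed to a predictive model alongside its other inputs.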


How about a bit more information on the relationship between the two?

“In essence, Machine Learning (ML) and graph technology are two different methodologies that can be applied to achieve the same overall goals - allowing enterprise organisations a way to analyse and represent large volumes of data to provide meaningful insight,” says Damle of Lumiata.

“Each can play a part, depending on what an organisation wants to achieve. ML, for example, is the process of learning an algorithm from numerous examples, and it comprises many different methods of doing so. Some ML methods use ‘graphs’ to represent the learnings while others don't.”

Jim Webber, chief scientist at Neo Technology – which describes itself as “the world’s leading graph database” – adds: “Machine learning is about analysing data to ‘learn’ a model or using an algorithm that can be applied to make predictions on new data sets. Machine learning is not tied to a particular representation of data.

“Machine learning algorithms help data scientists discover meaning in data sets, and these insights can be expressed as relationships between nodes in a graph. Graph databases enable efficient storage and traversal of information about relationships. Therefore, graph data can either be the input or the output of machine learning processing.”

He adds that machine learning can also benefit from graph database storage. “This can significantly speed up machine learning algorithm processing, enabling ‘near real time’ analyses that would be too slow using non-graph storage like relational databases,” he explains.
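A toy contrast may help show why storing relationships explicitly can make traversal cheaper. The sketch below is an assumption for illustration, not a benchmark and not a description of any particular product: real relational databases use indexes, and real graph databases differ in how they store data. It answers the same question ("what is two hops from here?") first by scanning a flat table of edges and then by following a pre-built adjacency map.

```python
from collections import defaultdict

# Hypothetical relationships stored as flat (source, target) rows,
# much as they might sit in a relational table.
edge_rows = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]


def two_hops_by_scanning(rows, start):
    """Scan the full table once per hop, as a naive unindexed join would."""
    first_hop = {target for source, target in rows if source == start}
    return {target for source, target in rows if source in first_hop}


def build_adjacency(rows):
    """Store each node's outgoing relationships next to the node itself."""
    adjacency = defaultdict(set)
    for source, target in rows:
        adjacency[source].add(target)
    return adjacency


def two_hops_by_adjacency(adjacency, start):
    """Follow stored relationships directly; work grows with the number of
    neighbours visited, not with the size of the whole data set."""
    return {
        target
        for middle in adjacency.get(start, ())
        for target in adjacency.get(middle, ())
    }


adjacency = build_adjacency(edge_rows)
scan_result = two_hops_by_scanning(edge_rows, "a")
adjacency_result = two_hops_by_adjacency(adjacency, "a")
```

Both approaches return the same answer; the difference is how much data each one has to touch to get there.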

Matthias Broecheler of DataStax – which describes itself as “the only scalable real-time graph database” – also highlights the benefits of graph databases over other ways of representing data:

“Companies are creating data sets that are made up of increasingly connected and complex structured data gathered from multiple sources and they are starting to extract value from it. However, they are finding existing database technologies to be poorly suited to handle this type of data and are looking for better alternatives. Graph databases come into their own here.

“There are a variety of use cases where a graph database is a better fit than other database management systems including relational or general NoSQL database systems.” He lists these as including master data management, recommendation and personalisation, security and fraud detection, along with IoT and networking.


What are the pros and cons of each technique?

Webber of Neo Technology says “we see any modern machine learning initiative involving both graph analytics and ‘traditional’ tabular based algorithms in a complementary fashion”.

“Graph-based machine learning effectively opens the door to a whole new category of algorithmic implementations that benefit from the fact that relationships are explicit and treated as first class citizens. So all ‘standard’ machine learning algorithms can be run on graph data, but the differentiating factor is the computational complexity of those algorithms that take advantage of the relationships being explicit,” he adds.

Carr of Ravelin wraps it all up nicely with a very handy checklist:


The pros and cons of machine learning:

Pros:

  • It is highly adaptive and self-learning, meaning minimal maintenance is required compared to other approaches
  • It works extremely well in high-volume and peak-scale environments; indeed, models improve with more data
  • It provides instant and constantly evolving scores based on new events, so they are always current
  • The predictive capabilities mean fraudulent orders can be declined pre-checkout to avoid chargebacks

Cons:

  • The models need data to provide accurate results and can therefore be slow to start producing results
  • Most ML models are hard to inspect, which means the reason for a decline can be difficult to decipher (black box syndrome)
  • Machine learning models do not see connections between entities that may be obvious to a human eye
  • Models can be slow to adapt to a new fraud vector as they build up evidence


The pros and cons of graph networks:

Pros:

  • It is easy to visualise and so gives a rich insight into what is happening in a network
  • It is easy to find connections that are difficult or impossible to find with any other technique

Cons:

  • “It is hard to say without the context of what you are using it to do. In fraud it’s not enough to use on its own, as you need the machine learning algorithms to actually stop the fraud attempts and the graph networks to extrapolate the connections based on those fraudulent actions.
  • “A general drawback is that, depending on the technique used, the processing required can be either too slow or too expensive to use. Also trying to find too many connections invalidates the findings as you will find too many and not be able to see the trees from the forest.”


