Advanced data science, machine learning and the power of knowledge graphs: What can we expect from this combination?

Knowledge graphs are a powerful way to help data scientists crack hard data problems, but they aren’t as widely known as they could be. Graph database expert Maya Natarajan explains why that’s changing.


This is a contributed article by Maya Natarajan, Sr. Director Product Marketing, Neo4j.

From bridging data silos and building data fabrics to accelerating machine learning (ML) and artificial intelligence (AI) adoption, knowledge graphs are foundational and allow businesses to go beyond digital transformation. Defined by The Turing Institute, the UK's national institute for data science and AI, as the best way to “encode knowledge to use at scale in open, evolving, decentralised systems,” knowledge graphs are a perfect foundation for advanced data science initiatives. So why aren’t they better known and exploited?

This is a problem. Business leaders know the value of their data and are keenly aware that it holds the answers to their most pressing business questions. However, the insights they need to improve decision-making and enhance business performance aren’t easy to elicit. Hence the widespread interest in machine learning.

Knowledge graphs can help an organisation get machine learning out of the lab and into useful production. That’s because a knowledge graph is a special, non-disruptive insight layer that sits on top of the organisation’s complex data resources. It drives intelligence into data to significantly enhance its value, but without changing any of the existing data infrastructure. Let’s look at how.

Drive action by providing assurance or insight

Knowledge graphs make existing technologies better by providing better data management, better predictions, and better innovation, in part because they fuel AI and machine learning. In practice, knowledge graph use cases divide into two groupings: actioning and decisioning. The actioning graph’s aim is to drive action by providing assurance or insight. Data actioning graphs automate processes for better outcomes by providing data assurance, discovery, and insight, and include examples like data lineage, data provenance, data governance, compliance, and risk management.

A great example of a data actioning graph is a knowledge graph that tracks objects in space, both functional and broken equipment. The ASTRIAGraph project monitors the Earth’s orbit for space objects, including functioning hardware and ‘space junk’, striving for safety, security, and sustainability. Using a knowledge graph, the team can categorise a large volume of disparate space domain data to locate and track objects ranging from the size of a mobile phone to the largest satellite. ASTRIAGraph predicts their trajectories, minimises risk, and provides complete visibility. With the goal of maximising decision intelligence, ASTRIAGraph curates information and creates models of the space domain and environment.

Paving the road to AI

The real magic of knowledge graphs comes into play as you use them to support AI and machine learning, uncovering patterns and anomalies. A decisioning knowledge graph surfaces data trends to augment analytics, machine learning, and data science initiatives. With all of this, it’s not surprising that Gartner recently stated, “Up to 50% of Gartner inquiries on the topic of AI involve discussion of the use of graph technology.”

We know this from speaking to customers. Moving from an actioning graph to sophisticated decisioning graphs fuelling AI and machine learning is a typical graph technology journey for many data science teams we work with, with knowledge graphs at the centre.

Aiding the ML journey at every step

From data sourcing to training machine learning models to analysing predictions and applying results, knowledge graphs enhance every step of the machine learning process.

In the initial step of data sourcing, knowledge graphs can be used for data lineage to track data that feeds machine learning. In the next phase of training a machine learning model, knowledge graphs allow for graph feature engineering using simple graph queries or more complex graph algorithms, like centrality, community detection, and the like. The results of such algorithms can be written back to the knowledge graph, further enriching it.
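As a rough illustration of graph feature engineering, the sketch below computes degree centrality and a very simple community label for each node, then stores both as node properties. It is a toy example in plain Python, not a graph database workflow: the edge list, node names, and property names are all hypothetical, and connected components stand in for a real community detection algorithm.

```python
from collections import defaultdict, deque

# Hypothetical toy graph: people linked by some shared relationship.
EDGES = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("dave", "erin"),
]


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set(neighbours)}."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    denominator = max(len(adjacency) - 1, 1)
    return {node: len(nbrs) / denominator for node, nbrs in adjacency.items()}


def connected_components(adjacency):
    """Label each node with a component id (a crude stand-in for communities)."""
    labels = {}
    next_id = 0
    for start in sorted(adjacency):
        if start in labels:
            continue
        labels[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in labels:
                    labels[nbr] = next_id
                    queue.append(nbr)
        next_id += 1
    return labels


adjacency = build_adjacency(EDGES)
centrality = degree_centrality(adjacency)
community = connected_components(adjacency)

# "Write back": keep the computed features alongside each node,
# ready to be used as inputs to a machine learning model.
node_properties = {
    node: {"degree_centrality": centrality[node], "community": community[node]}
    for node in adjacency
}
```

In a production setting these features would more likely come from a graph database's built-in algorithms, but the idea is the same: the structure of the graph becomes columns a model can learn from.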

The next step forward in sophistication is the use of graph embeddings. Graph embeddings offer a way of encoding the nodes and the relationships in a knowledge graph into a structure that's suitable for machine learning. Effectively, embeddings turn your knowledge graph into numbers and ‘learn’ all its features. Relationships are highly predictive of behaviour, so using connected, contextualised features maximises the predictive power of machine learning models.
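To make the idea of turning a graph into numbers more concrete, here is a deliberately simplified sketch. It does not train a real embedding model; instead it represents each node by where a short random walk starting from that node tends to land, which is a hand-rolled proximity vector rather than a learned, low-dimensional embedding. The toy graph, the function names, and the choice of three walk steps are all assumptions made for illustration.

```python
import math

# Hypothetical toy graph: two tight groups (a, b, c) and (d, e, f)
# joined by a single bridge between c and d.
EDGES = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
]


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set(neighbours)}."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def walk_vectors(adjacency, steps=3):
    """Map each node to the average position distribution of a random
    walker over `steps` steps, starting from that node."""
    nodes = sorted(adjacency)
    index = {node: i for i, node in enumerate(nodes)}
    vectors = {}
    for start in nodes:
        current = [0.0] * len(nodes)
        current[index[start]] = 1.0
        total = [0.0] * len(nodes)
        for _ in range(steps):
            following = [0.0] * len(nodes)
            for node in nodes:
                mass = current[index[node]]
                if mass == 0.0:
                    continue
                # The walker moves to each neighbour with equal probability.
                share = mass / len(adjacency[node])
                for nbr in adjacency[node]:
                    following[index[nbr]] += share
            current = following
            total = [t + c for t, c in zip(total, current)]
        vectors[start] = [t / steps for t in total]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


vectors = walk_vectors(build_adjacency(EDGES))
same_group = cosine(vectors["a"], vectors["b"])
across_groups = cosine(vectors["a"], vectors["f"])
```

Nodes in the same tightly connected group end up with more similar vectors than nodes on opposite sides of the bridge, which is the property that makes such representations useful as model inputs. Real embedding techniques learn much more compact vectors, but they build on the same intuition.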

Once a machine learning model has been developed, knowledge graphs can be used for investigations and counterfactual analyses by data scientists to understand if a model is useful and making accurate predictions.

Knowledge graphs in action

Let’s look at knowledge graphs in action. UBS, for example, built a detailed data lineage and governance tool that offers deep transparency into the data flows that feed its risk reporting mechanisms to meet finance compliance regulations.
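To give a feel for what a lineage query involves, the sketch below walks a small lineage graph backwards to find everything that feeds a given report. It is a toy illustration only and does not represent UBS's tool: the dataset names, the edge list, and the `upstream_sources` helper are all hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: (source, target) means
# "target is derived from source".
LINEAGE = [
    ("trades_db", "positions_table"),
    ("market_feed", "positions_table"),
    ("positions_table", "exposure_calc"),
    ("counterparty_db", "exposure_calc"),
    ("exposure_calc", "risk_report"),
]


def upstream_sources(edges, target):
    """Return every dataset that directly or indirectly feeds `target`."""
    parents = {}
    for source, derived in edges:
        parents.setdefault(derived, set()).add(source)

    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


risk_report_inputs = upstream_sources(LINEAGE, "risk_report")
```

Because lineage is stored as relationships, answering "what does this report depend on?" is a simple traversal rather than a chain of table joins.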

Another example is NASA, which has decades of mission experience that wasn’t well catalogued. NASA built a knowledge graph-enhanced application to comb through millions of documents, reports, project data, lessons learned, scientific research, medical analysis, geospatial data, and IT logs. As a result, an old breakthrough from the Apollo era in the 1960s solved a problematic issue in its 21st-century Orion class of crewed spaceships. That find saved a million dollars of taxpayer money by heading off the need for two years of work reinventing the wheel.

And in the life sciences, one large global pharmaceutical company is working with knowledge graphs to help clinicians know when best to intervene in complex diseases. Its data science team used graph algorithms to find patients who had specific journey types and patterns, and to find others with similar experiences. This insight is used to train its machine learning model, analyse predictions, and bring back results to help clinicians make better decisions. And we’re talking about scale: this company’s knowledge graph holds three years of visits, tests, and patient diagnoses across tens of billions of records.

By using the power of knowledge graphs, AI and machine learning models are better able to represent relationships. That means the organisations using them can find more accurate interpretations of complex data, put context back into data, and train AI to be a trustworthy partner.

It’s a powerful trend that we see more and more in data science. No wonder the C-suite is waking up to this innovation.

Maya Natarajan is Sr. Director Product Marketing at native graph database leader Neo4j