
Top Tips: Using semantics to make smarter decisions

Adrian Carr is Vice President EMEA at Enterprise NoSQL database vendor MarkLogic. Adrian joined MarkLogic in 2012. Prior to this, he was VP Enterprise for Juniper Networks in Europe. Adrian also worked as VP EMEA and Australasia at Chicago-based analytics software company SPSS, where he grew the business to represent nearly half of global revenues prior to its acquisition by IBM. This followed two years at Mercator Software prior to its acquisition by IBM. Adrian's career began in IT services companies, with 11 years at EDS before time at ATOS Origin and Cambridge Technology Partners.

Adrian looks at semantic technologies and shares his top tips.

Enterprises today have access to vast amounts of information from many sources and in many formats. Those organizations that can make sense of all this information to discover, understand, and make better decisions will win the information wars.

Fortunately, semantic technologies are proving to be the white knight that provides context and intelligently liberates this information for the benefit of us all.

Semantic technologies are based on RDF (Resource Description Framework), the W3C standard for representing facts and relationships, and SPARQL (the W3C’s standardised query language for RDF). Each fact is stored as a triple made up of a subject, a predicate, and an object. Natural language processing technologies also play a role, letting you pull entities and events out of free-flowing text. Using these two facets of semantics – facts and documents together – we can represent any kind of information in context.
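To make the idea of a triple concrete, here is a minimal sketch in plain Python rather than a real triple store. The `ex:` identifiers and the `match` helper are made up for illustration; a production system would use RDF IRIs and SPARQL instead.

```python
# Hypothetical facts, each written as a (subject, predicate, object) triple.
TRIPLES = [
    ("ex:AcmeHoldings", "ex:owns", "ex:AcmeLabs"),
    ("ex:AcmeLabs", "ex:locatedIn", "ex:London"),
    ("ex:AcmeLabs", "ex:industry", "ex:Software"),
]


def match(triples, s=None, p=None, o=None):
    """Yield the triples that fit the pattern; None acts as a wildcard."""
    for subject, predicate, obj in triples:
        if s is not None and subject != s:
            continue
        if p is not None and predicate != p:
            continue
        if o is not None and obj != o:
            continue
        yield (subject, predicate, obj)


# "What do we know about Acme Labs?"
facts_about_labs = list(match(TRIPLES, s="ex:AcmeLabs"))

# "Who owns what?"
ownership = list(match(TRIPLES, p="ex:owns"))
```

The wildcard pattern plays roughly the role that a variable such as `?o` plays in a SPARQL query.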

This ability to assemble a logical piece of content made up of inter-related facts is extremely powerful and dynamic - and ultimately very valuable.

Here are some tips for organisations that want to get started with semantics.

Start with the right underlying technology - First you'll need a robust, scalable triple store – something capable of storing and querying billions of facts and relationships as RDF triples. Because it's not practical to do absolutely everything with triples, your triple store needs other capabilities too: you'll want to be able to store and query text-based documents and flexible structured objects. And you'll want triples and documents to work seamlessly together, so that you can use triples to link documents together and describe facts about the documents. You'll also want to embed triples in documents so that the triples keep their context.

As a result, you will need a hybrid document and triple store NoSQL database that supports linking documents (articles, images, contracts, trades) with entities (described in RDF triples). A document NoSQL database that can store these aggregate document structures - but importantly still use RDF as the 'glue' to link information together - is the best approach for easy coding, data management, security, and performance.
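One way to picture triples embedded in a document is sketched here in plain Python. The document layout, the `triples` key, and every identifier are assumptions made for illustration, not the format of any particular database.

```python
# Hypothetical contract document. Storing facts under a "triples" key is an
# assumed convention for this sketch, not a real product's schema.
contract = {
    "uri": "/contracts/c-001.json",
    "title": "Supply agreement",
    "body": "Acme Labs agrees to supply components to Example Corp.",
    "triples": [
        {"subject": "/contracts/c-001.json",
         "predicate": "ex:signedBy",
         "object": "ex:AcmeLabs"},
        {"subject": "ex:AcmeLabs",
         "predicate": "ex:subsidiaryOf",
         "object": "ex:AcmeHoldings"},
    ],
}


def documents_linked_to(docs, entity):
    """Return the URIs of documents whose embedded triples mention the entity."""
    return [
        doc["uri"]
        for doc in docs
        if any(entity in (t["subject"], t["object"])
               for t in doc.get("triples", []))
    ]


linked = documents_linked_to([contract], "ex:AcmeHoldings")
```

Because the triples travel with the document, the fact that Acme Labs signed the contract keeps its context: it is stored, secured, and deleted together with the contract text.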

Harness the power of billions of facts - You don’t have to create all these facts and relationships on your own! As well as creating your own triples, you can easily ingest freely available triple data sets. Known as Linked Open Data (LOD), these triples are maintained and shared by authoritative bodies such as Wikipedia, Companies House, the BBC, and Open Government Initiatives around the world. You can start out with a pool of billions of facts to add richness and context to your own data.

For example, if you're an investment analyst researching a company you can get the address and industry code (what they do) from Companies House; the number of employees from DBpedia; and the ownership status from the SEC. Using triples that are publicly available you can significantly reduce the time spent on discovering relationships between, for example, holding companies, their subsidiaries, and any recent acquisitions and divestments.
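The analyst example can be sketched as follows, assuming three small sets of triples that stand in for downloads from a company registry, an encyclopaedia-derived data set, and a regulator. All names and values are invented; only the pattern of merging sources and following ownership links is the point.

```python
# Invented triples standing in for three public sources.
registry_facts = {
    ("ex:AcmeLabs", "ex:registeredAddress", "1 Example Street, London"),
    ("ex:AcmeLabs", "ex:industryCode", "00000"),
}
encyclopedia_facts = {
    ("ex:AcmeLabs", "ex:numberOfEmployees", "250"),
}
ownership_facts = {
    ("ex:AcmeHoldings", "ex:owns", "ex:AcmeLabs"),
    ("ex:AcmeLabs", "ex:owns", "ex:AcmeRobotics"),
}

# Triples from different sources merge by simple set union.
graph = registry_facts | encyclopedia_facts | ownership_facts


def all_subsidiaries(graph, parent):
    """Follow ex:owns links transitively, starting from the parent company."""
    found = set()
    frontier = [parent]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == "ex:owns" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def profile(graph, company):
    """Collect the non-ownership facts known about one company."""
    return {p: o for s, p, o in graph if s == company and p != "ex:owns"}


subsidiaries = all_subsidiaries(graph, "ex:AcmeHoldings")
labs_profile = profile(graph, "ex:AcmeLabs")
```

In a real triple store the transitive walk would usually be a single SPARQL property path query rather than hand-written code.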

If you are creating your own triples, you'll want to create them in an orderly way using a standard vocabulary. Again, help is at hand. You can start by using publicly available ontologies such as FOAF (Friend of a Friend) to describe people, and FIBO (the Financial Industry Business Ontology) to describe things like business entities. Just like the data, these ontologies are easy to extend and combine with your own.
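As a rough illustration of combining a standard vocabulary with your own terms, the sketch here writes a few triples in the N-Triples text format using only the Python standard library. The FOAF terms (`Person`, `name`, `knows`) are real; the `http://example.org/` namespace, the people, and the `coversSector` property are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical in-house namespace


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


triples = [
    (iri(EX + "alice"), iri(RDF_TYPE), iri(FOAF + "Person")),
    (iri(EX + "alice"), iri(FOAF + "name"), literal("Alice Example")),
    (iri(EX + "alice"), iri(FOAF + "knows"), iri(EX + "bob")),
    # An in-house extension sitting alongside the standard FOAF terms.
    (iri(EX + "alice"), iri(EX + "coversSector"), literal("Banking")),
]

# One triple per line, each terminated by " ."
ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

Any tool that understands FOAF can read the first three lines unchanged, while the fourth carries the organisation-specific detail.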

Use standard interfaces - It’s important, especially as you start out, to use standard interfaces to your triple store. The W3C defines a standard query language (SPARQL) and an HTTP-based protocol (the SPARQL Protocol) for querying triples, as well as standards for updating triples (SPARQL Update) and for managing whole semantic graphs over HTTP (the Graph Store HTTP Protocol). If you use these standards from the outset, you'll find you can work with most of the tools out there, and of course you will find lots of materials (books, tutorials etc.) to help you.
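Under the W3C SPARQL Protocol, a query can be sent as an HTTP GET with the query text in a `query` parameter. The sketch here builds such a request with the Python standard library but does not send it; the endpoint URL is a placeholder, and your store's actual address and authentication will differ.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; substitute the address of your own triple store.
ENDPOINT = "http://localhost:8000/sparql"

# A deliberately simple query: return any ten triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_query_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format.
    headers = {"Accept": "application/sparql-results+json"}
    return Request(url, headers=headers)


request = build_query_request(ENDPOINT, QUERY)
```

Passing `request` to `urllib.request.urlopen` would send it to a running endpoint. Because the request follows the standard, the same code should work against any store that implements the protocol.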

Once you're familiar with the standard semantic technologies, you can look into the proprietary extensions your store offers, because they may well give you better performance and more power. In particular, look at combining triples with documents and data.

Start small - but make sure you can scale - Don’t be tempted to do too much too fast. It’s much better to start with ten triples and a simple SPARQL query and test that out first. Then build out your data set with, say, downloads from DBpedia. It’s a good idea to experiment using a mock project and explore what’s out there on the web and inside your enterprise to get a handle on how powerful semantics can be before diving in too deeply.

It's also important to use software that can scale from the outset. If you choose the right database and tools, you can use the same software for the first experiments on your laptop and also for building real-world applications with semantics capabilities that can be used throughout your organisation.

Combine semantics with other technologies for even better results - Once you have got to grips with semantic technologies and appreciated their many benefits, it can be tempting to try to do absolutely everything with triples. If you've chosen a database that can handle triples and documents and data, you don’t need to do that. To get the most out of semantic technologies, you should think of them as one more tool in your toolkit. After all, you don't choose a screwdriver for every DIY job.

Semantics works particularly well with search. Imagine writing a search application that has access to billions of facts. You can expand search terms to give a more intelligent search result – for example, when searching for standards that apply to "cardiac catheter" you will also discover standards that apply to "devices that stimulate nerves" and "implantable devices". Then you can supplement the results with facts about the terms you're looking for, or about the results themselves - just as Google does with its Knowledge Graph.
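Search-term expansion can be sketched in a few lines of Python. The concept relationships, the use of `skos:broader` and `skos:related` as plain strings, and the standards identifiers are all illustrative assumptions rather than a real medical-device vocabulary.

```python
# Illustrative concept relationships, written as triples.
CONCEPT_TRIPLES = [
    ("cardiac catheter", "skos:broader", "implantable devices"),
    ("cardiac catheter", "skos:related", "devices that stimulate nerves"),
]

# Hypothetical search index: concept -> identifiers of matching standards.
STANDARDS = {
    "cardiac catheter": ["STD-A"],
    "implantable devices": ["STD-B"],
    "devices that stimulate nerves": ["STD-C"],
}


def expand(term, triples):
    """Return the term plus any directly broader or related concepts."""
    terms = [term]
    for s, p, o in triples:
        if s == term and p in ("skos:broader", "skos:related") and o not in terms:
            terms.append(o)
    return terms


def search(term):
    """Look up standards for the term and for every expanded concept."""
    results = []
    for concept in expand(term, CONCEPT_TRIPLES):
        results.extend(STANDARDS.get(concept, []))
    return results


hits = search("cardiac catheter")
```

A plain keyword search would return only the first standard; the triples are what bring in the other two.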

You can also refine the search with a geospatial query, and even with bitemporal criteria. So you can ask questions like "Which companies (or their subsidiaries) that were exposed to risk by the Lehman collapse were mentioned in emails from Board members that also mentioned a word for 'sell'?", "Which of those emails were sent at least a week before the collapse?" and "Which of those did we know about a week before the collapse?"
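The bitemporal part of those questions comes down to keeping two dates per record: when something happened and when the system learned of it. The sketch here assumes a made-up `Email` record and takes 15 September 2008, the date Lehman Brothers filed for bankruptcy, as the collapse date; the emails themselves are invented.

```python
from dataclasses import dataclass
from datetime import date, timedelta

COLLAPSE = date(2008, 9, 15)           # Lehman Brothers bankruptcy filing
CUTOFF = COLLAPSE - timedelta(days=7)  # "at least a week before"


@dataclass
class Email:
    subject: str
    sent_on: date      # valid time: when the email was actually sent
    recorded_on: date  # system time: when our database first held it


# Invented sample data.
emails = [
    Email("Sell exposure now", date(2008, 9, 1), date(2008, 9, 2)),
    Email("Time to sell?", date(2008, 9, 3), date(2008, 10, 1)),
    Email("Sell positions", date(2008, 9, 12), date(2008, 9, 12)),
]

# Sent at least a week before the collapse (valid-time filter).
sent_before = [e for e in emails if e.sent_on <= CUTOFF]

# Of those, the ones we already knew about by then (system-time filter).
known_before = [e for e in sent_before if e.recorded_on <= CUTOFF]
```

The second email shows why both dates matter: it was sent early enough, but the organisation only recorded it after the collapse.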

The world of semantics is transforming both public and private sector organisations. What we are seeing at the moment is simply the tip of the iceberg. Semantic technologies, underpinned by a hybrid NoSQL database combining semantics, documents, and data, give organisations more context and value from their data, and make it easier to make sense of information from many sources in many formats. That leads to faster, better decisions – and ultimately increased revenues.


IDG Connect