Data Mining

What awaits discovery within 'dark data'?

This is a contributed piece by Bob Laurent, Vice President of Product Marketing at Alteryx Inc.

It’s a well-worn cliché, but we live in a world of data that only grows richer, more layered and far deeper every second of every day.

As a society, and as businesses, we used to have a good handle on our data. We knew what it was, where it was kept, and we used it in a very formal way, treating it as something very precious. You could argue that a lot of it was always ‘dark’, as it was locked in files and accessed only by those with physical proximity as well as permission. Within organisations, data processing used to rely on very structured, defined data sets, but the rise of social data, the Internet of Things (IoT), machine learning and connected devices has introduced a seemingly unlimited supply of unstructured data. It comes streaming in from multiple sources – cloud data, device-driven data, social data, financial data, and everything in between.

Rather like fossils that might be found when excavating for oil, there’s some evolution at play in how we use data, the ‘new oil’. There are the incremental improvements that come from producing a 0.25 per cent better sales campaign using more accurate data, and there are the massive leaps forward on which the dream of the big data revolution was partly founded. Those are the big leaps whereby new markets are discovered, major new insights gleaned, and perhaps even new fortunes made. This, in those fossil terms, is the difference between gradualism and punctuated equilibrium: small changes versus a sudden one.

But what does this mean in the world of data analytics and business intelligence? How does a business or public sector organisation really make a big leap using the raw material of data to uncover an insight that might change the game for their organisation?

Looking at the number of connected devices coming out of CES this year as an example (with every conceivable device or thing connected and made smart, including clothing and jewellery, and new generations of phones, TVs, tablets and so on), it’s clear that the data landscape is only becoming more complex for businesses. The categories of data, and their relationship to consumers, may not be clear at first, but relationships and insights will emerge from them at an ever-increasing rate.


The passed-over data

Making sense of it all will be imperative for organisations looking to stay competitive and make smarter analytical decisions. Yet, according to IDC, 90 per cent of an organisation’s unstructured data is never analysed. This unanalysed material is referred to as ‘dark data’. Gartner defines dark data as the information assets that organisations collect, process and store in regular activity, but fail to use for other profitable purposes.

And like the secrets that we don’t know we’re looking at in the fossil record, what if this dark data holds the key to saving millions of pounds, connecting with customers in new ways, preventing data breaches, or more?

It’s a problem as complex as the society that birthed it. The challenge was previously tackled with specialist business analysts – but generally in those days the data analysed was not so dark – it was collected with purpose, and insights were not hidden under such volumes.


Nowadays, with the data from all these new devices and online services added to the mix, there could be any number of insights to be gleaned – but the likelihood is that they will surface more often when the line-of-business ‘doers’ are the ones doing the analysis. Those in the know, who live and breathe the world of the consumers, the customers, the data itself, will be able to fast-track insights.

To that end, it will be exploration and self-service by every member of a business that will help connect the disparate dots of data into a ‘pointillist’ masterpiece. We need data experts and data curators to ensure governance and compliance, but for insights that may not mean anything to an outsider, we need regular people interacting with their business data so that they can find a big insight (or a still-valuable small one) for themselves. These line-of-business, self-service analysts will be the ones able to see the light in the dark data, which is made up of all kinds of content: old customer information, the log files of machinery, account access metadata, raw survey data beyond what the business first made use of, and even email exchanges and presentations.

Additionally, a ‘data catalogue’ can help turn formerly dark data into explored data by providing a roadmap. The goal is a catalogue designed for end-users, with curated tags and comments. A quality solution will preserve information regarding the source and lineage of data, so that the data on the data becomes a rich source of insight and quality control.
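To make the idea of a catalogue a little more concrete, here is a minimal sketch in Python of what an entry with tags, comments, source and lineage might look like. Every name in it (CatalogueEntry, DataCatalogue, the example datasets) is hypothetical and chosen purely for illustration; it does not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """One dataset in a hypothetical data catalogue."""
    name: str
    source: str                                        # where the data came from
    lineage: list[str] = field(default_factory=list)   # direct upstream datasets
    tags: set[str] = field(default_factory=set)        # curated tags
    comments: list[str] = field(default_factory=list)  # end-user notes


class DataCatalogue:
    """Illustrative in-memory catalogue; not a real product's API."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogueEntry] = {}

    def register(self, entry: CatalogueEntry) -> None:
        self._entries[entry.name] = entry

    def find_by_tag(self, tag: str) -> list[str]:
        """Names of all datasets carrying the given tag, alphabetically."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)

    def trace_lineage(self, name: str) -> list[str]:
        """Every upstream dataset of `name`, nearest first, without repeats."""
        seen: list[str] = []
        queue = list(self._entries[name].lineage)
        while queue:
            parent = queue.pop(0)
            if parent in seen:
                continue
            seen.append(parent)
            if parent in self._entries:
                queue.extend(self._entries[parent].lineage)
        return seen


# Example datasets (invented for illustration).
catalogue = DataCatalogue()
catalogue.register(CatalogueEntry(
    name="machine_logs_raw",
    source="factory floor sensors",
    tags={"logs"},
))
catalogue.register(CatalogueEntry(
    name="machine_downtime",
    source="derived from log files",
    lineage=["machine_logs_raw"],
    tags={"logs", "maintenance"},
    comments=["Useful for spotting recurring faults."],
))

print(catalogue.find_by_tag("logs"))
print(catalogue.trace_lineage("machine_downtime"))
```

The point of the sketch is simply that once source and lineage are recorded alongside tags and comments, a business user can search by topic and then check where a dataset came from before trusting it.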

Effectively, dark data is the spoil heap into which we may have been throwing pearls we didn’t realise, or couldn’t imagine, were there. Yet now we value data differently, we understand it better, and we can more easily find the insights we want from it. A large part of this has come from upskilling and raising awareness of the data age we live in. Yet we can do more – widening access to data within organisations, and assisting those who may not know how to interrogate it and apply it to their roles, is the final step in a true democratising of data.

