Data Mining

Self-Service Data Discovery: Worth the Hype?

One of the hottest trends in the BI industry over the last three years has been self-service data discovery. It's been showcased as the solution to a vast range of problems, from analysing large data volumes all the way through to eliminating the IT backlog.

Although the DIY approach to data discovery looks inviting, some of its much-publicised advantages may not be as positive as they seem. An organisation needs to weigh the ins and outs of this approach before fully embracing data discovery. We've compiled the top three disadvantages of pervasive data discovery that organisations need to think about.

The expert vs. casual trend spotter debate

The FT recently published an article asking, "Big Data: are we making a big mistake?" It questions whether we are downplaying the importance of expert knowledge as a result of over-reliance on simplistic data discovery tools. We have ever-greater access to data, and more tools with which to analyse it, but these come hand-in-hand with traps. Tools therefore need to be used alongside a deeper knowledge base and understanding of data analytics.

It's critical that organisations examine the skills needed to achieve precise and correct analysis before they embrace self-service data discovery tools. At the same time, organisations are adopting a more fact-based decision-making culture, i.e. letting the data decide what to do. The real challenge is that organisations usually don't require their analysts to hold certification validating their skills, and yet analysts are making extremely important decisions based on data stories, without that essential extra layer of expertise.

The case of the biased data

The next wave in BI and analytics is data storytelling. However, as storyboards increasingly replace traditional dashboards and reports, organisations are finding it hard to hire talented storytellers. Stories give businesses a simpler way to comprehend and then retain information because they explain scenarios coherently, providing causal explanations of the form 'X happened because of Y'.

However, stories can easily supply causal explanations where none exist. Biased stories arise in two ways. The first is when correlations are mistaken for causes: the fact that two events co-occur does not mean that one causes the other. In consumer insights, false cause-and-effect inferences create confusion about what actually drives particular consumer behaviours, and subsequently lead to the design of ineffective incentives and promotions. In these situations, marketing money is wasted.
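The correlation trap is easy to reproduce. In the hypothetical sketch below (invented product names and numbers, not real consumer data), a hidden common driver — seasonality — makes two sales series move together, so a naive analyst might conclude that one causes the other:

```python
import numpy as np

rng = np.random.default_rng(0)

# A shared hidden driver (seasonality) influences both series.
season = np.sin(np.linspace(0, 4 * np.pi, 200))

# Ice-cream and sunglasses sales both follow the season, plus noise;
# neither one causes the other.
ice_cream = 100 + 40 * season + rng.normal(0, 5, 200)
sunglasses = 50 + 20 * season + rng.normal(0, 3, 200)

r = np.corrcoef(ice_cream, sunglasses)[0, 1]
print(f"correlation: {r:.2f}")  # strong correlation despite no causal link
```

The two series correlate strongly, yet a promotion designed to boost one by stimulating the other would be wasted spend — the correct lever is the common cause.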

The second is when only limited information is available, but it is recognised and presented as the complete information needed for decision-making. This occurs when we become oblivious (knowingly or unknowingly) to the limits of the information we have and ignore other possible answers. There is a false assumption that if the data set appears sufficiently large, everything about the question being asked is answered by what is presented. A data set may be very large and still contain only partial information about the problem. This leads to biased stories that showcase just one of many possible explanations, overshadowing the others. The coherence of the story stops us from spotting its critical gaps.

The blended and pure data face-off

Blending can be beneficial in some respects: it often dilutes the pure product to give a cheaper outcome. But when everyone keeps blending data and blended data sets proliferate within an organisation, the result is a loss of knowledge: it becomes impossible to identify the origins and accuracy of the mixtures. Statisticians who regularly blend data sets for analysis have long understood that the process requires strict documentation of every step, so that results can be verified and reproduced as new data arrives. However, this process comes at a lofty price. Users who want to use or further analyse the sets need to know their entire history to interpret the data correctly, so the cost of maintaining and using these data sets can grow exponentially.
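The documentation discipline the statisticians follow can be automated. Below is a minimal sketch — illustrative names and structures of my own, not any vendor's API — of attaching a lineage record to every blend, so a later user can trace a data set's origins without reconstructing its history by hand:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DataSet:
    """A named collection of rows plus its ordered blending history."""
    name: str
    rows: list
    lineage: list = field(default_factory=list)

def blend(left: DataSet, right: DataSet, key: str, new_name: str) -> DataSet:
    """Inner-join two data sets on `key` and append a lineage record."""
    right_index = {r[key]: r for r in right.rows}
    merged = [{**row, **right_index[row[key]]}
              for row in left.rows if row[key] in right_index]
    record = {
        "operation": "inner_join",
        "key": key,
        "sources": [left.name, right.name],
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Carry forward both parents' histories, then add this step.
    return DataSet(new_name, merged, left.lineage + right.lineage + [record])

crm = DataSet("crm_contacts", [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}])
web = DataSet("web_visits", [{"id": 1, "visits": 7}])
blended = blend(crm, web, key="id", new_name="contacts_with_visits")
print(blended.rows)     # [{'id': 1, 'name': 'Ann', 'visits': 7}]
print(blended.lineage)  # one join record naming both source sets
```

Because each blend carries its parents' records forward, even a data set produced by many successive blends keeps a complete, ordered account of where it came from — which is exactly the verification trail reproduction requires.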

Overall, most of the disadvantages of self-service data discovery stem from human error: an illusion of simplicity, a limited understanding of the range of problems linked with data analysis, a lack of qualifications and processes, and a lack of organised cooperation between specialists.

Simply put, the DIY approach can lead to individual segregation and analytical silos. To meet the need for quicker analysis, organisations should focus on how their platforms and tools can help eliminate these traps through governed, collaborative self-service data discovery.


Dr. Rado Kotorov is Vice President of Product Marketing for Information Builders


IDG Connect

IDG Connect tackles the tech stories that matter to you

  • Mail

Recommended for You

Trump hits partial pause on Huawei ban, but 5G concerns persist

Phil Muncaster reports on China and beyond

FinancialForce profits from PSA investment

Martin Veitch's inside track on today’s tech trends

Future-proofing the Middle East

Keri Allan looks at the latest trends and technologies


Do you think your smartphone is making you a workaholic?