New research, The State of Data Discovery and Cataloging (based on a sample of 400 from IDC and Alteryx), suggests that, as in many areas of business, data professionals are wasting large amounts of time trying to find information. This may be particularly significant, however, because ‘data analyst’ is constantly described as one of the hottest, and best-paid, job roles, so all this wasted effort could be hitting companies particularly hard. Chiara Pensato, director of Alteryx in EMEA, provides a bit more information on the findings in the lightly edited Q&A below.
Why are data science and analytics professionals wasting so much time finding, protecting, or preparing data?
Most analysts need to draw on five to 15 data sources to help them make business decisions. The information that data professionals work with isn’t always easy to use. It is often incomplete, often not timely, and hard for them to discover in the first place. On top of this, data professionals often question this information, as it is often not rated by those using it. Without self-service analytics, working with data is often a multi-stage, labor-intensive process just to get it into a usable state to interrogate, and that’s where analysts want to be: doing the job they are paid for. Because data is often disparate, spread across a number of different systems, and there is simply so much of it, it is difficult to know what is relevant. Data can be in many different forms and structures. Preparing it takes time, doubly so without the right solutions to make the process simple.
Is the biggest problem inherent challenges in handling data or inadequate organizational structures?
Both. Inherent challenges in handling data are hard to avoid. As data volume and variety continue to grow, resources and processes can become less effective, because data and information assets will be harder to find in the massive amounts coming at the business.
In addition, inadequate organizational structures lead to a lack of knowledge about how and where data is used across the enterprise, and about what information assets might already exist that can help professionals stop ‘re-creating the wheel’. On average, they are wasting 20% of their week building information that already exists. A better knowledge-sharing culture and more effective organizational structures will help address some of these challenges. A data platform can remove a lot of the issues that arise as the data landscape becomes more complicated.
Another factor, often overlooked, is that data professionals are often working more than 50 hours per week. Long hours are symptomatic of data professionals’ limited ability to be efficient and effective in data activities in the position, culture, or business they find themselves in. Sometimes they are battling the environment as well as the challenges they are hired to overcome.
Are these problems worse in North America or Europe (or about the same)?
Per year, for every 100 employees, the EU and the US fare roughly the same: the wasted time costs just over €1M in the EU and $1.7M in the US. Per organization, however, the EU fares better, with only €49M wasted, compared to $103M for US organizations. Cultural aspects must have an impact, though they are harder to quantify.
Are there wider implications for GDPR?
Absolutely. GDPR’s provisions require organizations, wherever they are in the world, to be able to (amongst other things) respond to requests for information or erasure within one month. To find the relevant data and act on such requests, and to prove consent for audit purposes, data must be searchable and, ideally, well tagged and findable. Any uncertainty about where data is located, or time wasted finding it, adds to the risk of not meeting the standards set in the GDPR. Factor in the scale if multiple requests come in, and the possibility of not meeting mandatory time limits for breach notification, the right to be forgotten, and the right to be informed might grow exponentially. And even if a satisfactory job is done, it needs to be proved and may be tested by audit at a later date. Data catalogs are only a small piece of the solution. Full compliance involves many more processes and responsibilities.
What does data discovery mean in practice?
Data discovery is the ability to locate, understand, access and trust data, and it’s a key enabler of business in the era of digital transformation. Discovery provides the data intelligence to power greater productivity and innovation. Discovery is the beginning and the end of the analytic process: It not only allows you to find that initial data set or information asset, it also provides a central repository to share all the analytic assets in an organization. And when an answer has been found, post-analysis, a fresh round of discovery is often unleashed.
The trouble is, data is often incomplete, isn’t timely, isn’t easy to use, isn’t easy to discover, and isn’t trusted. And when 50% of a data professional’s time is unproductive, they are falling short because of a relatively well-known and fixable issue. When time equals money, organizations need to know how to cut the time it takes to get to insights.
Why is this such a key topic in Business Intelligence at the moment?
Data discovery and data cataloging have always been on the BI agenda, but their importance has risen as data volumes, varieties, and velocities have risen. As these go up, the fourth ‘V’ of big data, veracity, can fall. And that’s a problem, and it is why there’s an industry focus on data discovery. If you can find, and trust, the data, it becomes usable that much faster. When organizations are tasked with out-innovating the market, speed can be vital, because speed combined with intelligent insights makes for an ultra-competitive, intelligent organization.
Will new roles need to emerge in the field of data science and data analytics?
What Gartner calls the citizen data scientist, or an “advanced” business analyst, is to a great extent the hot emerging role. These people, who work in-department, understand their data and their business context, and can trust their insights, are the only way to get around the shortage of trained data scientists and associated statistical and economic roles within industry.
What is the single most important thing organizations should understand about this?
Self-service analytics platforms exist that can help more people in the enterprise foster big data analytics innovations without having to rely on the hard-to-find skillsets of highly qualified data scientists or programmers. As more businesses endeavor to compete with vast volumes of data, it will become critical that more people within the enterprise can understand the tools available and get insight quickly from them.
This will be achieved by spreading a data culture throughout the organization, making it much easier for the people who make business decisions to dig deep into data quickly. It will take a cultural shift across the company and can be achieved by training the right people, giving them the tools they need to become analysts of their own data, and leading by example. This trend is revolutionizing business through data science and data analytics, even if those terms are not used by the line-of-business people becoming analysts themselves!