A new recipe for enterprise data: 'too many cooks' is over

The adage 'too many cooks' might still apply in the soup kitchen, but in cloud-centric data analytics, there is an argument for more ingredients (data sources), more cooks (data scientists) and more servings all round.


The old adage 'too many cooks' might still apply to soup, sandwiches and sausages, but in the cloud-centric world of enterprise software and the data analytics it now supports, there are compelling arguments for more ingredients (data sources), more cooks (citizen data scientists) and more serving plates (application endpoints and analytics payloads)... but to serve up this new dish, we need a method to ensure we prepare, blend and manage our digital mix in the right way.

At the risk of tabling (pun not intended) too many cooking analogies, the truism above is intended to suggest that, as in food, ingredients are everything. Method will always be fundamental too, but without a good source and knowledge of provenance, we cannot hope to serve up the right kind of dish, be it data or delicious delicacies.

If data analytics were a pasta course, then it would probably be spaghetti alla puttanesca i.e. a dish with all manner of ingredients in various shapes and sizes and from the widest variety of sources possible.

Garbage in, garbage out

Leaving the kitchen for a moment then, the lesson for data analysts, data scientists and (increasingly now) citizen data protagonists is an important one when it comes to the predictive modelling tools now being used to inform business decisions.

Predictive modelling is a hugely useful tool but, without the right data fed into it, the predictions themselves become completely useless at worst, or unreliable at best. As they say: garbage in = garbage out.

David Sweenor is senior director of product marketing at data analytics automation company Alteryx. Explaining that predictive modelling is, ultimately, based on the assumption that what has happened in the past is likely to happen again in the future, he further specifies that we need to know that the data available is sufficient in volume, variation and quantity.

“But while these grounds have traditionally served us comparatively well, in periods of significant uncertainty (such as during the pandemic), these assumptions – and the source data – break down entirely,” said Sweenor.

Post-Covid data scarring

What is being suggested here is a kind of post-Covid scarring effect that is being left on the data landscape i.e. the way we have collected, collated, corralled and consumed data ingredients in the past will not necessarily serve our needs in the immediate future. This is an inconvenient truth further amplified or exacerbated by the now widespread use of Artificial Intelligence (AI) and its need to ingest vast new swathes of information.

Alteryx’s Sweenor says that the issue can be laid directly at the feet of the hunter-gatherers.

“Businesses are exploring the benefits of AI with the promise that something can be completed quickly, efficiently, and cleverly… but many of the methods used today are often still based on linear or logistic regression (which make specific assumptions of the shape of the data) and decision trees or random decision forests – a complicated set of ‘if this, then that’ rules,” explained Sweenor.

Drawing on experience gained both inside and outside its customer base, Alteryx notes that some analytic modelling practitioners recommend a simple strategy for major disruptions (the wake of Covid being a perfect example): discard the data that doesn’t fit the model as an outlier and use other data points or indicators to bridge the dark period.
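To make the 'discard and bridge' idea concrete, here is a minimal sketch in Python. It assumes the simplest possible bridge, a straight line drawn between the last reading before the disruption and the first reading after it. The function name, the window positions and the sales figures are all hypothetical illustrations, not anything Alteryx has published.

```python
def bridge_disruption(series, start, end):
    """Return a copy of `series` in which positions start..end-1 are
    replaced by a straight line between the last value before the
    window and the first value after it (linear interpolation)."""
    if start <= 0 or end >= len(series) or start >= end:
        raise ValueError("window must sit strictly inside the series")
    left, right = series[start - 1], series[end]
    span = end - (start - 1)
    bridged = list(series)
    for i in range(start, end):
        frac = (i - (start - 1)) / span
        bridged[i] = left + frac * (right - left)
    return bridged


# Hypothetical monthly sales; positions 4 to 6 are the 'dark period'.
sales = [100, 102, 104, 106, 20, 15, 30, 112, 114]
bridged = bridge_disruption(sales, start=4, end=7)
print(bridged)
```

The bridged series looks tidy, but it is a guess about what would have happened without the disruption, which is why this approach only works if the world returns to its old pattern afterwards.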

A bigger data ‘blip’

“But time series modelling (aka forecasting), while hugely useful in specific cases, was not designed to adapt to a ‘data blip’ of this scale with these kinds of disruption outliers. Instead, as organisations still require accurate and timely predictions, what we are seeing is a growing need for analytics operations (AnalyticsOps) strategy,” said Sweenor and team.
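A small sketch shows why a 'blip' is so awkward for forecasting. It uses simple exponential smoothing, a textbook technique chosen here purely for illustration (the article does not say which forecasting methods are in use), and two made-up demand series. The smoothing weight of 0.5 is also an assumption.

```python
def ses_forecast(series, alpha=0.5):
    """One-step-ahead forecast from simple exponential smoothing:
    each new observation is blended with the running level."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical demand: steady, and the same demand with a two-period slump.
steady = [100, 100, 100, 100, 100, 100]
with_blip = [100, 100, 100, 20, 20, 100]

print(ses_forecast(steady))     # forecast with no disruption
print(ses_forecast(with_blip))  # forecast still dragged down by the slump
```

Even though demand in the second series has already recovered to 100, the forecast remains well below it because the model is still averaging in the slump.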

If enterprises do now recognise the need for AnalyticsOps, they will likely also dovetail this type of initiative with a DataOps strategy.

Just as AnalyticsOps is designed to look after the throughput, health and results of the analytics operation from a more holistic lifecycle point of view, DataOps involves the data workflows and processes needed to deliver high quality data and results to ‘data consumers’ (machines and people) in a timely and contextual fashion.

For Sweenor and team, the key lies in achieving a sort of updated form of ‘data archaeology’ i.e. digging through existing data repositories and discovering new data sources and new ways to use existing sources – no matter how small – to plug the data holes for forecasting and other predictive modelling techniques.

“The reality is that the world is still running, and people are still buying things – what’s changed is the how, the when, and the where, thus changing the source of the information. To fill in the gaps, some organisations are even buying/renting third party data from exchanges and marketplaces now designed to serve this requirement,” said Sweenor.

Looking forward to a moveable feast

In summary then, Sweenor points again to the deep data scar left by the pandemic. He says that in response, there is a huge requirement to pivot and shift away from traditional (perhaps now archaic) data modelling techniques. Instead, we need to move towards simpler, more transparent and more repeatable models that are constantly monitored and updated – models and dynamic systems that adapt to constantly changing consumer behaviour and a constantly changing environment.
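What 'constantly monitored' might look like in practice can be sketched in a few lines of Python. This is one possible approach under stated assumptions, not a description of any vendor's product: the class name, the three-period window and the 20% error threshold are all hypothetical choices.

```python
from collections import deque


class DriftMonitor:
    """Track the rolling mean absolute percentage error of a model and
    flag when it rises above a threshold (a cue to review or retrain)."""

    def __init__(self, window=3, threshold=0.2):
        self.errors = deque(maxlen=window)  # keeps only the latest errors
        self.threshold = threshold

    def record(self, predicted, actual):
        """Store one prediction error; return True once the window is
        full and its mean error exceeds the threshold."""
        if actual == 0:
            raise ValueError("actual must be non-zero for a percentage error")
        self.errors.append(abs(predicted - actual) / abs(actual))
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and sum(self.errors) / len(self.errors) > self.threshold


# Hypothetical (predicted, actual) pairs: the model holds up, then behaviour shifts.
pairs = [(100, 98), (100, 102), (100, 99), (100, 60), (100, 55), (100, 50)]
monitor = DriftMonitor(window=3, threshold=0.2)
flags = [monitor.record(predicted, actual) for predicted, actual in pairs]
print(flags)
```

The design choice here is deliberately simple and transparent: a short rolling window reacts quickly to a change in behaviour, at the cost of being more easily triggered by a single bad reading.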

In a world where we don’t know what ingredients we will be faced with tomorrow, it makes sense to make data analytics an adaptable moveable feast.