Enhancing data governance and quality with a Data Fabric strategy

To improve outcomes, enterprises must get their data governance and quality right by moving away from the traditional approach.


This is a contributed article by Dr. David Amzallag, Chief Product Officer and Chief Digital Transformation Officer, BlackSwan Technologies.

It’s well known that extracting value from external and internal data requires an emphasis on both data governance and quality. Gartner defines data governance as “the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption and control of data and analytics.” Data quality, meanwhile, is largely defined by how accurate and up-to-date information is. Without accurate data, and without knowing who within the organisation is using that information, it is incredibly difficult to monetise it.

And yet, despite it being well known that data governance and quality are critical for enterprises to get right - and despite the tremendous advances made in technology and data capabilities - organisations are still struggling to get their data quality and governance up to par.

A recent survey conducted by professional services firm EY found that 41% of organisations said the most challenging aspect of their data was data quality. Gartner suggests that poor data quality costs organisations an average of $12.9 million a year. 

In addition, the EY report found that 14% of organisations had challenges with access to suitable technology and data infrastructure. Without the right accessibility, technology and data infrastructure in place, it is incredibly difficult for enterprises to put in place a functioning data governance framework. 

Challenges with centralised data

Many of the barriers holding businesses back from achieving their data quality and governance goals stem from a reliance on a traditional, centralised approach to data. As an organisation grows, an influx of operational sources creates a number of data silos. Businesses try to overcome this by pooling the data from these sources into one place. While there was once little argument against this logic, over time the increasing volume and complexity of data have led to a number of significant challenges.

For example, it takes extensive time and effort to integrate new data sources into a centralised environment. The cost of centralising data is significant once investment in storage, compute and interfaces is taken into account, along with the task of unifying the data formats of every source. Meanwhile, data silos are exacerbated because there is a natural separation between those who create and consume the data and the data engineers with big data tooling expertise: the engineers lack the business and domain knowledge, while the data product owners lack the technical acumen. As a result, organisations lack visibility of data consumption across the business.

The technicalities of centralising data can also feed organisational politics; internal competition can lead to departments being unwilling to share their data assets with one another. The lack of visibility and accessibility in a centralised data environment encourages data assets to remain siloed, and therefore leads the organisation to miss out on a number of data monetisation initiatives.

Data integration challenges in a centralised environment also lead to outdated data being used. For example, when an organisation has grown over time, a third party may have interacted with a number of different business units, each of which has its own operational system. The data becomes unsynchronised, with some records up to date and others no longer accurate. This hinders enforcement and knowledge discovery, and therefore impacts business outcomes.

Finally, enterprises lack enforcement over data usage. When data is centralised, it is difficult to apply access controls at a granular level, which makes governance and regulatory compliance a challenge.

A new, decentralised approach to data 

It’s clear, then, that the traditional, centralised approach to data is leaving organisations with many challenges to overcome. An alternative strategy is to take a decentralised approach. The design concept of a Data Fabric - one of Gartner’s top strategic trends for 2022 - can help with this; it is based on multiple data management technologies working in tandem, streamlining data ingestion and integration across a company’s ecosystem.

One of those technologies is data virtualisation, which enables data assets to be accessed from any operational source with no need to replicate them. In other words, rather than copying data from an operational source to a centralised repository, datasets can be viewed and analysed (even with complex AI techniques) from where they reside. A true Data Fabric approach would also enable the creation of just-in-time virtualised data lakes as needed, meaning data lakes can be created and disposed of at any given moment without impacting existing applications and infrastructure.
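As a rough sketch of what querying data in place can look like, the example below uses Python with DuckDB to join a dataset held in object storage with a file exported by another business unit, without copying either into a central warehouse. The file locations, column names and credentials set-up are hypothetical, not a reference to any particular vendor's implementation.

```python
# A minimal sketch of data virtualisation: analysing datasets where they
# reside rather than replicating them into a central repository.
# File locations and column names are hypothetical placeholders; object-store
# credentials are assumed to be configured separately.
import duckdb

con = duckdb.connect()         # in-memory engine, no persistent warehouse
con.execute("INSTALL httpfs")  # extension for reading remote files in place
con.execute("LOAD httpfs")

# Join customer records held in one operational system (Parquet on object
# storage) with transactions exported by another unit (local CSV), leaving
# both datasets where they are.
rows = con.execute("""
    SELECT c.customer_id, c.segment, SUM(t.amount) AS total_spend
    FROM read_parquet('s3://crm-exports/customers/*.parquet') AS c
    JOIN read_csv_auto('finance/transactions.csv')            AS t
      ON t.customer_id = c.customer_id
    GROUP BY c.customer_id, c.segment
""").fetchall()

print(rows[:5])
```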

This provides a simpler, more cost-effective alternative to consolidating data sources and providers, and enables a single point of visibility of data flows across the enterprise. With this level of visibility, organisations can act on the data in different ways. Firstly, by using advanced attribute-based and role-based controls, they can restrict visibility and access at a granular level, enabling better enforcement of control decisions.
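To illustrate how granular these decisions can be, the sketch below combines a role-based rule with an attribute-based one. The attribute names and rules are illustrative only, not a specific product's policy language.

```python
# A minimal sketch of combined role-based and attribute-based access control.
# Attribute names and rules are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class User:
    role: str          # e.g. "analyst", "data_steward"
    department: str    # e.g. "risk", "marketing"

@dataclass
class DataAsset:
    owning_department: str
    sensitivity: str   # "public", "internal", "restricted"

def is_allowed(user: User, asset: DataAsset, action: str) -> bool:
    # Role-based rule: only data stewards may modify assets.
    if action == "write" and user.role != "data_steward":
        return False
    # Attribute-based rule: restricted assets stay within the owning department.
    if asset.sensitivity == "restricted" and user.department != asset.owning_department:
        return False
    return True

# Usage: a marketing analyst is denied access to a restricted risk dataset,
# while a risk analyst is allowed to read it.
print(is_allowed(User("analyst", "marketing"),
                 DataAsset("risk", "restricted"), "read"))   # False
print(is_allowed(User("analyst", "risk"),
                 DataAsset("risk", "restricted"), "read"))   # True
```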

Secondly, as data assets are more accessible, organisations can harmonise data sharing between teams and reduce siloed data assets. This ability to dynamically improve data usage is part of the Data Fabric’s real value, according to Gartner. The research firm says that the analytics that form part of a Data Fabric help to cut data management effort by up to 70% and accelerate time to value.

Crucially, a superior Data Fabric approach does not mean doing away with existing centralised data lakes or warehouses, but integrating the data within them as part of a dynamic, resilient infrastructure. A Data Fabric can be utilised through an application or platform, and enables data enrichment, processing and visualisation at any point, shifting enterprises away from their data being locked in legacy silos or being replicated across multiple applications.

Organisations that are seeking to improve their business outcomes by modernising their data quality, governance and discoverability need to consider their overall approach to data and ask themselves if the traditional, centralised approach is still able to help them achieve their goals. A strategy incorporating a Data Fabric most definitely can.