5 foundational steps for successful data discovery and classification

The five most common pitfalls in discovering and properly classifying data, and how to avoid them.

This is a contributed article by Jan van Vliet, VP and GM EMEA at Digital Guardian.

Despite its importance, data discovery and classification can present a range of challenges that prevent an organisation from realising the potential that exists in its data. From clear objective setting to data quality and security, just a few foundational steps can help make sure that the time and effort invested in data discovery and classification offer a good return.

1. Start with the objectives before working on the data

Many organisations have gone down the path of capturing data en masse in the hope and expectation that its value will, at some point, become apparent. In reality, this is often a classic example of putting the cart before the horse. In the enthusiastic rush to make data-informed decisions, people often find that the data they've collected lacks the information they need because they didn't define clear objectives at the start. Turning that thinking on its head can help ensure that time and money spent on data discovery and classification deliver real business value. Setting objectives first will help define what data needs to be discovered and classified.

2. Don't get bogged down in data capture and analysis

Similarly, even when goals are set, it's not uncommon for organisations to get seriously bogged down in the process of data collection, organisation and storage. Like the dog that finally caught up with the car it was chasing, they collect the data they wanted but suddenly run into a huge question: 'Now what do I do with it?'

At some point in the process, data needs to be turned into knowledge. Building this step into a discovery and classification strategy makes it more likely that the process will reach a point of measurable impact.

3. Focus on data quality

Poor-quality data makes it much more difficult to deliver value. For instance, planning search and segmentation fields ahead of time helps ensure that search criteria deliver the expected value. Project leaders should also develop a governance plan that clearly sets out who is responsible for entering, validating, and maintaining data. This should include user protocols covering where data gets entered, and when.
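To make the idea of planned fields and clear responsibilities more concrete, here is a minimal Python sketch of a validation step that could run when data is entered. The record layout, the `ALLOWED_REGIONS` values, the email pattern and the `validate` function are all hypothetical, chosen purely for illustration; a real governance plan would define its own fields and rules.

```python
import re
from dataclasses import dataclass

# Hypothetical segmentation values, agreed before data capture begins.
ALLOWED_REGIONS = {"EMEA", "APAC", "AMER"}

# Deliberately simple pattern for illustration; not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class CustomerRecord:
    name: str
    email: str
    region: str
    owner: str  # person responsible for maintaining this record


def validate(record: CustomerRecord) -> list[str]:
    """Return a list of data-quality problems; an empty list means the record passes."""
    problems = []
    if not record.name.strip():
        problems.append("name is empty")
    if not EMAIL_PATTERN.match(record.email):
        problems.append("email is malformed")
    if record.region not in ALLOWED_REGIONS:
        problems.append(f"region {record.region!r} is not an agreed segmentation value")
    if not record.owner.strip():
        problems.append("no owner assigned")
    return problems
```

Rejecting or flagging a record at the point of entry, rather than cleaning it up later, is one way to keep the agreed search and segmentation fields reliable.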

The choice of tools is also important to data quality. When selecting a CRM tool, for example, pick something that is easy to access and use in every situation where users communicate with customers and prospects.

4. Understand the value of data and protect it

For many, the motivation to embark on a data discovery and classification project comes from its potential value, be that straight to the bottom line or as a way to fix existing problems. But it's ironic how many organisations seem to put far more effort into 'digging up' their gold mine of data than they do into protecting it.

If a discovery and classification project sets out to identify which data within the organisation is most at risk, for example, then what? Potentially, that's extremely valuable information, but the value of that insight only arrives when that data is then properly secured - and that's not a given.

As more data is discovered and stored, it follows that there are more targets for malicious individuals or groups. Add to that the ongoing risk of employee-centric data breaches, and it can create a recipe for disaster.

By focusing on processes such as permission analysis, user analytics and change auditing (among others), data owners can strike a better balance between uncovering valuable data and protecting it.
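As a rough illustration of permission analysis, the Python sketch below cross-references classification results with an access list to find sensitive files that are readable by overly broad groups. Every name here is an assumption made for the example: the file paths, the `restricted` and `public` labels, the group names and the `overexposed` function do not come from any particular product.

```python
# Hypothetical classification results: resource -> sensitivity label.
CLASSIFICATION = {
    "finance/payroll.xlsx": "restricted",
    "marketing/brochure.pdf": "public",
    "hr/contracts.docx": "restricted",
}

# Hypothetical access list: resource -> groups that can read it.
PERMISSIONS = {
    "finance/payroll.xlsx": {"finance-team"},
    "marketing/brochure.pdf": {"all-staff"},
    "hr/contracts.docx": {"hr-team", "all-staff"},
}

# Groups assumed to be too broad for restricted data.
BROAD_GROUPS = {"all-staff", "everyone"}


def overexposed(classification, permissions, broad_groups):
    """Return restricted resources that at least one broad group can read."""
    findings = {}
    for resource, label in classification.items():
        if label != "restricted":
            continue
        exposed_to = permissions.get(resource, set()) & broad_groups
        if exposed_to:
            findings[resource] = sorted(exposed_to)
    return findings
```

With the sample data above, only `hr/contracts.docx` is flagged, because it is labelled restricted yet readable by `all-staff`. The point of the sketch is that classification output becomes useful for security only once it is joined with who can actually reach the data.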

5. Consider how AI can help solve data challenges

Many organisations already own masses of data that is growing in volume and potential value every day, but they can't do any useful discovery or classification with it because there's simply too much for people to handle.

Machine learning tools are likely to be the only way organisations can hope to make progress when data volume becomes an issue. AI-powered auto-classification, for example, is trained on a small subset of properly recognised data and improves over time as the machine learns what defines a document specific to the organisation. Organisations that get to grips with the potential of AI and ML now will be better placed to leverage more of their data in the future.
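The idea of a classifier that starts from a few labelled documents and improves as it sees more can be sketched in a few lines of Python. This is a toy example under stated assumptions, not a description of any vendor's auto-classification: it uses a simple multinomial naive Bayes model with add-one smoothing, and the `TinyClassifier` class, the sample texts and the labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyClassifier:
    """Multinomial naive Bayes with add-one smoothing, updated one document at a time."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> documents seen
        self.vocabulary = set()

    def learn(self, text, label):
        """Add one labelled document to the model."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable label, or None if nothing has been learned."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # prior for this label
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage: start from a small labelled subset, then feed in more labelled documents.
clf = TinyClassifier()
clf.learn("invoice payment amount due", "finance")
clf.learn("employee contract salary terms", "hr")
print(clf.classify("payment due for invoice"))
clf.learn("nda confidentiality agreement", "legal")
print(clf.classify("nda agreement"))
```

Each call to `learn` updates the word counts, so documents that a person has reviewed and labelled can be fed back in, which is how this sketch mimics a process that improves over time. Production tools would use far richer features and models.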


Jan van Vliet is Vice President and General Manager, EMEA at Digital Guardian. A seasoned senior executive with a proven track record of success in both emerging and mature markets, he is responsible for expanding Digital Guardian's business and market share throughout EMEA, driving strategy and overseeing operations in both regions.