DevOps without automated data resilience is a flaky recipe

DevOps might be a peacefully paired portmanteau that links developers to their operations counterparts more harmoniously for accelerated workflow, but that acceleration also widens the blast radius - speeding the potential time to data loss, security breaches and compliance violations.


Automation is everywhere. The current drive to apply Artificial Intelligence (AI) engines, with their Machine Learning (ML) fuel systems, to automate and autonomously manage almost every aspect of the modern IT stack is a proliferating - although mostly friendly - beast.

We say mostly friendly for a reason; there is no such thing as a free lunch and the drive to put automation into enterprise technology deployments isn’t a cure-all. Quite apart from areas like AI bias (and let’s not go there just now), we need to examine how automation intelligence is being applied to data resiliency, stability and backup.

Information fear factor

Why the data-centric fear factor, you may ask?

Because data robustness, ruggedisation and resiliency in the hyper-connected stack is also a function of another key post-millennial IT trend: DevOps.

The strategic workplace methodology or ‘approach’ that is DevOps might be a peacefully paired portmanteau that links developers (Dev) to their operations (Ops) counterparts more harmoniously for common strategic delivery goals, but it has an accelerating factor that applies across a wider blast radius than the standard build, test, deploy, high-five, let’s-all-go-to-the-bar script we are normally fed.

DevOps has accelerated every stage of the software development lifecycle, including the potential time to data loss, security breaches and compliance violations. As organisations try to deliver more value in less time, they increase their risk of compromising their customers’ data loads.

Data in manual analogue backwaters

With all the automation and autonomous control we can now apply across the DevOps toolset, to leave data protection in some manual analogue backwater is at best a disservice, or at worst a mission-critical breach waiting to happen.

This is the contention made by Stephen Manley in his capacity as CTO of Druva – a company known for its cloud data protection service.

“The advent of DevOps and cloud computing are shining a bright clarifying light on archaic approaches to protecting data. After decades of trying to keep pace with data growth, backup teams are unprepared for the data sprawl, speed and security risks of DevOps environments. It is time to consider a new approach - automated data resilience,” tabled Manley, in no uncertain terms.

He paints what he says is a picture of the scenario that has been playing out for decades. It is one where IT departments have struggled to effectively back up their datacentre estate’s quotient of information.

What actually happens, says Manley, is that the IT function tries to keep pace with relentless data growth by investing in multiple point solutions: deduplication appliances, backup networks and tools such as virus protection software.

Scrambling on the data growth treadmill

Meanwhile, backup teams scramble to troubleshoot failures, manage infrastructure and configure new backups. IT teams are running full speed just to stay on the data growth treadmill.

“DevOps and cloud are driving data protection to a point where it now needs to scale with the data sprawling outside the company datacentre. Not only is data moving to the cloud at an accelerated pace, but the plethora of cloud options increases the variety of data sources - block storage, object storage, hosted databases and more,” explained Manley.

In some ways a victim of its own success, the Druva tech lead says that because DevOps workflows optimise for development speed, we reach a point where traditional backup simply becomes a bottleneck.

While developers can provision new environments, deploy and test their code in minutes, backup all too often requires filing tickets, installing agents and provisioning back-end capacity. While developers think in terms of minutes, backup teams operate on a timescale of days.

With great speed, comes great risk

“With great speed and data value comes great security risks. In the datacentre, the security tools in place will have created a moat around the live production application environment in use by the organisation. Today, however, cyber attackers infiltrate development accounts, so they can spread to production. Even worse, developers can unknowingly import compromised software components, which threaten customers’ data,” said Manley.

To meet the new requirements, it seems clear that organisations must shift from data protection to data resiliency - or (and let’s remember where we started with autonomous AI-fuelled power) automated data resilience.

We can explain and validate this at a technical level because we know that data protection is a passive process that creates an inert copy of data, which a team hopes to never use. On the other hand, data resiliency actively prepares to recover from an incident with minimal disruption and proactively identifies security threats before they can spread.

“After decades of customers insisting that, ‘It’s not about backup… it’s about recovery’, perhaps it is time for the industry to listen,” said Manley.

So how does he think it should work in practice? There are three core steps.

  • First, DevOps teams need self-service recovery because they cannot wait for a backup administrator.
  • Second, they need to be able to bring back their data near-instantly.
  • Third and finally, they need to be able to recover their entire environment because infrastructure, applications and data are now inextricably linked.
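Taken together, those three steps imply an API surface rather than a ticket queue. The sketch below is purely illustrative - the service, snapshot and method names are hypothetical, not Druva's actual interface - but it shows how a developer-facing, self-service recovery call could restore infrastructure, applications and data as one unit:

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    """A point-in-time capture of a whole environment, not just its data."""
    env_id: str
    infrastructure: dict   # e.g. instance types, network config
    applications: dict     # e.g. deployed services and versions
    data: dict             # e.g. database and object-store contents

class RecoveryService:
    """Hypothetical self-service recovery API: no tickets, no backup admin."""
    def __init__(self):
        self._snapshots = {}

    def capture(self, snap: Snapshot) -> None:
        self._snapshots[snap.env_id] = snap

    def restore_environment(self, env_id: str) -> Snapshot:
        # Bring back infrastructure, applications and data together,
        # because in a DevOps stack they are inextricably linked.
        if env_id not in self._snapshots:
            raise KeyError(f"no snapshot for environment {env_id!r}")
        return self._snapshots[env_id]

svc = RecoveryService()
svc.capture(Snapshot("staging", {"vm": "m5.large"}, {"api": "v1.2"}, {"db": "orders"}))
restored = svc.restore_environment("staging")  # developer-initiated, no ticket filed
```

The point of the sketch is the shape of the call, not the implementation: recovery is a single, developer-initiated operation over the whole environment.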

To expand the validation factor one louder, we can remind ourselves that since security attacks compromise data, detecting anomalous data change can help pinpoint a ransomware attack quickly.

As a whole, the message from Druva is to take precautionary steps: flagging anomalous administrator patterns can limit the damage caused by an insider threat, while centrally monitoring data recovery or cloning patterns can minimise data leakage.
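One minimal way to make the anomalous-change idea concrete - and this is an illustrative sketch of the general technique, not how Druva implements it - is to flag a backup whose changed-data ratio deviates sharply from its own recent baseline, since ransomware encryption typically rewrites far more data than normal daily churn:

```python
from statistics import mean, stdev

def is_anomalous_change(history, today, sigmas=3.0):
    """Flag a backup whose change ratio sits far outside its recent baseline.

    history: fraction of bytes changed in each recent backup (e.g. 0.02 = 2%)
    today:   fraction of bytes changed in the latest backup
    """
    if len(history) < 2:
        return False  # not enough baseline to judge against
    baseline, spread = mean(history), stdev(history)
    return today > baseline + sigmas * spread

# Normal churn hovers around 2% of data per day...
daily_change = [0.020, 0.025, 0.018, 0.022, 0.020]

# ...so a backup in which 60% of the data suddenly changed warrants an alert.
is_anomalous_change(daily_change, 0.60)   # flagged
is_anomalous_change(daily_change, 0.022)  # within baseline, no alert
```

Real products will use richer signals (entropy, file-type shifts, admin behaviour), but the principle is the same: the backup stream itself becomes a security sensor.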

Protection is from Mars, resiliency is from Venus

“A data resiliency solution keeps organisations safe from traditional outages and user error as well as insider threats and external attacks. Any successful shift from passive protection to active resiliency begins with automation. Since every other part of the DevOps pipeline is automated, manual protection is too slow, unreliable and insecure. Protection is manual, but, thankfully, resiliency is automatic,” enthused Manley.

DevOps and cloud environments are constantly evolving, so data resiliency must be dynamic enough to capture changes as they happen. Today, backup teams periodically remediate unprotected servers and VMs. In the cloud, teams continually create new accounts, clusters and applications, so it is impossible for a team to retroactively protect the environment. Instead, each new deployment needs built-in resilience.
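The 'built-in' point can be sketched as a provisioning hook: rather than a backup team retroactively discovering new resources, a protection policy is attached as part of creation itself. Everything below is hypothetical scaffolding, not a real cloud SDK:

```python
class ResilienceRegistry:
    """Tracks which resources have a backup policy attached."""
    def __init__(self):
        self.protected = {}  # resource id -> policy name

    def attach_policy(self, resource_id, policy):
        self.protected[resource_id] = policy

def provision(registry, resource_id, policy="default-daily"):
    """Create a resource with resilience built in: the policy is attached
    during provisioning, so nothing is ever left unprotected."""
    # ... create the actual cloud resource here (omitted) ...
    registry.attach_policy(resource_id, policy)
    return resource_id

registry = ResilienceRegistry()
provision(registry, "cluster-42")                     # inherits the default policy
provision(registry, "db-prod", policy="hourly-pitr")  # critical data, tighter RPO
```

In practice this hook would live in the infrastructure-as-code pipeline or a cloud provider's creation event stream, so no human has to remember to opt in.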

Automation improves security by reducing the likelihood of human error. Insider threats exploit manual operations, so automation also reduces the attack surface. An automated service manages operations too, so that administrators are not responsible for security patches, operating system (OS) upgrades or keeping up with the latest best practices.

While an automated data resiliency service can solve the DevOps and cloud data challenges, it does require a new cloud-native architecture that integrates into the modern cloud stack and orchestrates critical workflows. Only a cloud-native architecture delivers the scalability and flexibility needed to support a DevOps environment.

All together now, orchestrated workflows

As a final steer on this topic from Druva, Manley and his team insist that resiliency solutions should offer orchestrated workflows. IT teams clearly need to address Disaster Recovery (DR), ransomware recovery, governance and compliance; but each of these problems is so complex and distributed that they need a predefined playbook that they can extend and modify to meet their needs.
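A predefined-but-extensible playbook could look something like the sketch below - illustrative names only - where a team starts from a stock disaster recovery sequence and inserts its own steps rather than writing the whole workflow from scratch:

```python
from typing import Callable, Optional

class Playbook:
    """An ordered recovery workflow that teams can extend and modify."""
    def __init__(self, name, steps):
        self.name = name
        self.steps = list(steps)  # list of (label, action) pairs

    def add_step(self, label: str, action: Callable[[], None],
                 position: Optional[int] = None) -> None:
        # Insert at a given position, or append to the end by default.
        if position is None:
            self.steps.append((label, action))
        else:
            self.steps.insert(position, (label, action))

    def run(self):
        executed = []
        for label, action in self.steps:
            action()
            executed.append(label)
        return executed

log = []
dr = Playbook("disaster-recovery", [
    ("provision-standby-infra", lambda: log.append("infra")),
    ("restore-data",            lambda: log.append("data")),
    ("redeploy-applications",   lambda: log.append("apps")),
])
# A team extends the stock playbook to meet its own compliance needs.
dr.add_step("notify-compliance-team", lambda: log.append("notify"), position=1)
order = dr.run()
```

The design choice worth noting is that the stock sequence stays intact; teams modify by insertion, which keeps the vendor-maintained baseline upgradeable.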

“Data protection is not enough. The shift to DevOps and cloud has led to data sprawl, increased security threats and a demand for more responsive IT operations. You will not be able to protect your organisation, your data, or your customers with a legacy protection architecture. With a data resiliency architecture, you can shift from passive data protection to active application recovery readiness, self-service responsiveness and proactive security detection. Most importantly, data resiliency is automatic, so that you retain control over your environment even as it scales and evolves,” he concluded.

Taking stock of it all then, we can say that a cloud-native data resiliency service integrates into a cloud stack and delivers orchestrated data protection, cyber resiliency, governance and compliance for a business. If we accept the premise and core technology proposition on offer here and agree that legacy protection cannot keep pace with modern DevOps and cloud environments, then perhaps it is indeed time for automated data resiliency.