Top Tips: How To Stay Ahead Of Network Outages

Lawrence Garvin is Head Geek and Technical Product Marketing Manager at SolarWinds, with 25+ years in the IT profession in a cross-section of industries, organization types, and professional disciplines. Lawrence is a Microsoft Certified IT Professional (MCITP), and a nine-time consecutive recipient of the Microsoft MVP award in recognition of his contributions to the Microsoft TechNet WSUS forum. He has been working with Microsoft Windows Server Update Services (WSUS) and Software Update Services (SUS) since the release of SUS SP1 in 2003, and update management, generally, since the availability of Windows Update in 1997.

Lawrence shares his top tips on staying ahead of network outages.


Some of the most prominent presences on the internet, including Facebook, Twitter, Amazon, and Google, have suffered network outages in recent months. Most pundits now share the view that outages have become inevitable in enterprise networks.

Like lightning, outages can strike at any time, causing disruption of business services, loss of time and money, and the additional cost of repair and recovery. A recent study by the Ponemon Institute on the cost of data centre outages found that the average cost per minute of unplanned data centre downtime has risen sharply, and is now US$7,900, up 41% from the 2010 mark of US$5,600 per minute. Worse still, for enterprises where the data centre is the core component of the business, such as e-commerce companies, that figure is almost double the 2010 average, at US$11,000 per minute.
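To put those per-minute figures in context, here is a minimal sketch of the arithmetic. The two constants come from the study quoted above; the function names and the 30-minute outage are purely illustrative assumptions.

```python
# Per-minute downtime costs quoted above from the Ponemon Institute study (US$).
COST_PER_MINUTE_2010 = 5_600
COST_PER_MINUTE_NOW = 7_900


def percent_increase(old: float, new: float) -> float:
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100


def outage_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_NOW) -> float:
    """Estimate the cost of an outage lasting the given number of minutes."""
    return minutes * cost_per_minute


if __name__ == "__main__":
    # Confirms the 41% rise quoted in the study.
    print(round(percent_increase(COST_PER_MINUTE_2010, COST_PER_MINUTE_NOW)))  # 41
    # Hypothetical 30-minute outage at the average per-minute cost.
    print(outage_cost(30))  # 237000
```

At the average rate, even a half-hour outage runs well into six figures, which is why shaving minutes off detection and diagnosis matters.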

The responsibility of overseeing the network infrastructure that supports their company’s critical business applications falls to the network administrator. Whilst they devote most of their time to keeping the network up and running and performing optimally, there are still occasions when they experience unexpected network outages. That’s the reality of network management.

Much of the media attention tends to focus on outages caused by external forces, particularly DDoS (Distributed Denial of Service) attacks from criminal gangs, rogue nations and bored hackers. However, a recent Gartner survey projected that through 2015, 80% of outages impacting mission-critical services will be caused by internal people and process issues: in other words, basic human error within your organisation.

So, what does it take to stay ahead of these unforeseen breakdowns, and minimise their impact if and when they do occur? There are a few steps that organisations can take to simplify their administration efforts, helping them to be better prepared for a ‘bad day’:

Maintain a current device inventory list: An updated device inventory list with details of your network components such as ports, interfaces in use, hardware details, servers, virtual machines, network storage, and so on, will help you track all of your IT equipment for device replacements, end-of-life information, device configuration changes, and the status of devices in use and not in use.
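As a rough illustration of the inventory described above, the sketch below models each device as a small record and answers two of the questions the list is meant to support: which devices are past end-of-life, and which are not in use. The field names, device names and dates are hypothetical, not taken from any particular product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Device:
    """One entry in a (hypothetical) device inventory list."""
    name: str
    kind: str                       # e.g. "switch", "server", "vm", "storage"
    hardware_model: str
    in_use: bool
    interfaces_in_use: List[str] = field(default_factory=list)
    end_of_life: Optional[date] = None  # None when no end-of-life date is known


def past_end_of_life(inventory: List[Device], today: date) -> List[str]:
    """Names of devices whose end-of-life date has been reached."""
    return [d.name for d in inventory
            if d.end_of_life is not None and d.end_of_life <= today]


def idle_devices(inventory: List[Device]) -> List[str]:
    """Names of devices recorded as not in use."""
    return [d.name for d in inventory if not d.in_use]


# Illustrative data only.
INVENTORY = [
    Device("core-sw-01", "switch", "model-a", True, ["Gi0/1", "Gi0/2"], date(2014, 6, 30)),
    Device("file-srv-02", "server", "model-b", True, ["eth0"], date(2018, 1, 31)),
    Device("spare-rtr-03", "router", "model-c", False),
]

if __name__ == "__main__":
    print(past_end_of_life(INVENTORY, date(2015, 1, 1)))  # ['core-sw-01']
    print(idle_devices(INVENTORY))                        # ['spare-rtr-03']
```

In practice the records would live in a database or monitoring tool rather than in code, but the same questions apply.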

Configure SNMP and flow technologies: SNMP (Simple Network Management Protocol) is used to fetch performance metrics from your network devices. There are different versions of SNMP available and you can configure an appropriate version based on your data requirements and the significance of the device. Similarly, enabling flow technologies on routers and switches helps furnish data that can be used to analyse traffic and bandwidth usage.
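One common use of SNMP metrics is turning raw interface counters into bandwidth utilisation. The sketch below assumes you have already polled the same interface twice (the polling itself is omitted) and have two readings of a 32-bit octet counter such as ifInOctets from the standard interface MIB. The function names and sample values are illustrative assumptions.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit SNMP counter wraps back to zero at this value


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Difference between two counter readings, allowing for one wrap-around."""
    return (current - previous) % modulus


def utilisation_percent(prev_octets: int, curr_octets: int,
                        interval_seconds: float, link_speed_bps: float) -> float:
    """Average inbound utilisation of a link between two polls, as a percentage."""
    if interval_seconds <= 0 or link_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    bits_transferred = counter_delta(prev_octets, curr_octets) * 8
    return bits_transferred * 100 / (interval_seconds * link_speed_bps)


if __name__ == "__main__":
    # Hypothetical readings taken 300 seconds apart on a 100 Mbit/s link.
    print(utilisation_percent(1_000_000, 38_500_000, 300, 100_000_000))  # 1.0
    # The counter wrapped between polls; the delta is still 500 octets.
    print(counter_delta(2 ** 32 - 100, 400))  # 500
```

On fast links a 32-bit counter can wrap more than once between polls, in which case the 64-bit counters available in later SNMP versions are the safer choice.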

Perform network performance baselining: Performance baselines are a standard set of metrics that define the normal working conditions of the network’s infrastructure. You accomplish this by running network baseline tests and determining the standard threshold values for networking hardware. Baselining helps determine and set alerting thresholds for situations where the network is experiencing performance slowdowns.
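One simple way to derive an alerting threshold from baseline measurements is sketched below. It assumes the threshold is set a fixed number of standard deviations above the baseline mean; that rule, the choice of three standard deviations and the sample latency figures are illustrative assumptions, not a method prescribed here.

```python
from statistics import mean, stdev
from typing import List, Sequence


def baseline_threshold(samples: Sequence[float], sigmas: float = 3.0) -> float:
    """Alert threshold: baseline mean plus a number of sample standard deviations."""
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    return mean(samples) + sigmas * stdev(samples)


def breaches(readings: Sequence[float], threshold: float) -> List[float]:
    """Readings that exceed the alert threshold."""
    return [r for r in readings if r > threshold]


if __name__ == "__main__":
    # Hypothetical latency measurements (ms) collected under normal conditions.
    baseline_latency_ms = [20, 22, 19, 21, 23, 20, 22, 21]
    threshold = baseline_threshold(baseline_latency_ms)
    print(f"alert above {threshold:.1f} ms")

    # New readings checked against the baseline-derived threshold.
    print(breaches([21, 24, 35], threshold))  # [35]
```

Baselines drift as the network changes, so the samples should be refreshed periodically rather than measured once.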

Identify and define alerts and an escalation matrix: Depending on the thresholds you set, your network monitoring system will trigger alerts on various network issues and errors. It is important to clearly identify and define the point of contact or person designated to receive the alert. In the case of escalations, you need to decide how the alert will be routed based on its severity. Failure to attend to an alert on time is equivalent to not having any alerts configured at all. Delivering timely alerts to the right person significantly reduces network downtime and serious damage to business operations.
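An escalation matrix can be as simple as a table mapping each severity to an ordered list of contacts and the time after which each one is brought in. The sketch below is one hypothetical layout; the severities, contact names and timings are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EscalationStep:
    contact: str
    after_minutes: int  # minutes unacknowledged before this contact is notified


# Hypothetical matrix: the first step for each severity is notified immediately.
ESCALATION_MATRIX: Dict[str, List[EscalationStep]] = {
    "critical": [
        EscalationStep("on-call-engineer", 0),
        EscalationStep("network-manager", 15),
        EscalationStep("it-director", 30),
    ],
    "warning": [
        EscalationStep("on-call-engineer", 0),
        EscalationStep("network-manager", 60),
    ],
    "info": [
        EscalationStep("noc-mailbox", 0),
    ],
}


def contacts_to_notify(severity: str, minutes_unacknowledged: int) -> List[str]:
    """Everyone who should have been notified by now for an unacknowledged alert."""
    steps = ESCALATION_MATRIX.get(severity)
    if steps is None:
        raise ValueError(f"unknown severity: {severity!r}")
    return [s.contact for s in steps if minutes_unacknowledged >= s.after_minutes]


if __name__ == "__main__":
    print(contacts_to_notify("critical", 0))   # ['on-call-engineer']
    print(contacts_to_notify("critical", 20))  # ['on-call-engineer', 'network-manager']
```

Writing the matrix down explicitly, in whatever form, makes it obvious when a severity level has no named owner.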

Plan for network expansions and technology advancements, and for the changes to monitoring that will be necessary to accommodate them. Double-check any system changes: having another set of eyes review each modification will greatly reduce the number of errors introduced to your network.

These guidelines won’t protect your organisation from the threat of an outage, but they will help you minimise the damage by identifying the underlying problem more quickly and effectively, and so speed up the return to normal functionality.


Lawrence Garvin is Head Geek at SolarWinds


IDG Connect
