Software systems need stability. It's a no-brainer statement that we can all agree on, whether we happen to be mere consumer-users of enterprise software or CIOs overseeing some massive estate of cloud and on-premises compute and storage resources upon which people's lives might depend.
But there's a problem. Software application and data services stability isn't generally tabled as a defining factor in terms of what should make any particular suite or toolset appealing to its technical users and end consumers.
"Our software is stable, it works, it doesn't go down as much as some other software and when you turn it on (figuratively perhaps, if it isn't always-on in this age of web and cloud), it fires up and works and stays on," said no technology keynote speaker ever.
Stability is oft assumed, perennially overlooked but ultimately does get the alarm bells ringing when it's not there.
Smarter than the average smart bear
New England and Silicon Valley software quality tools company SmartBear has attempted to up its stabilising prowess in late spring 2021 with the acquisition of application stability management specialist Bugsnag.
Bugsnag (we snag bugs so your software doesn't get bogged down and buggy, get it?) develops sophisticated software error-monitoring technology. It is perhaps most famously used by companies with extreme always-on IT backbone demands and counts Airbnb, Slack and Lyft among its roughly 6,000 customers.
In terms of actual use, Bugsnag offers software application development engineers, client observability specialists and release management teams a way to make data-driven decisions on when to build features versus fix bugs. All of which was clearly flavoursome enough functionality for SmartBear to purchase the company, but none of which explains what software application stability really is at ground level.
What is application stability, really?
Co-founder and CEO of Bugsnag James Smith says that - as a measure of worth - application stability essentially represents how many customers are actually able to complete a transaction or interaction in an app. It obviously has a significant impact on customer conversion rates, average purchase values, engagement, loyalty and other important business metrics.
"Broadly speaking, application stability is a measurement of the number of total app sessions that are crash-free or the percentage of daily active users who do not experience an error. Traditionally, software application stability was a KPI that has been used within engineering teams and has quickly been making inroads with product release management, observability and data science teams," said Smith.
Of course, it's true that without measuring application stability, organisations cannot accurately evaluate the general health and quality of their applications and cannot make data-driven decisions on when to build software vs. fix bugs.
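To make Smith's two measurements concrete, here is a minimal Python sketch of both: the share of sessions that are crash-free and the share of active users who never see an error. The `Session` record and its field names are hypothetical and used purely for illustration; this is not Bugsnag's data model or its scoring formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One app session. Field names are hypothetical, not Bugsnag's schema."""
    user_id: str
    crashed: bool


def session_stability(sessions: list[Session]) -> float:
    """Percentage of sessions that ended without a crash."""
    if not sessions:
        raise ValueError("no sessions recorded")
    crash_free = sum(1 for s in sessions if not s.crashed)
    return 100.0 * crash_free / len(sessions)


def user_stability(sessions: list[Session]) -> float:
    """Percentage of active users who never hit a crash in the period."""
    if not sessions:
        raise ValueError("no sessions recorded")
    users = {s.user_id for s in sessions}
    affected = {s.user_id for s in sessions if s.crashed}
    return 100.0 * (len(users) - len(affected)) / len(users)


# Five sessions from three users; one of ana's sessions crashed.
sessions = [
    Session("ana", crashed=False),
    Session("ana", crashed=True),
    Session("ben", crashed=False),
    Session("ben", crashed=False),
    Session("cy", crashed=False),
]
print(f"session stability: {session_stability(sessions):.1f}%")  # 80.0%
print(f"user stability: {user_stability(sessions):.1f}%")        # 66.7%
```

The two figures can diverge, as they do here: a single crash touches one session in five but one user in three, which is why it matters which of the two a team chooses to report.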
It all comes down to User eXperience (UX)
As with so many technologies today, application stability is often regarded not as a factor of what an application's functionality can deliver or how powerful it is - but rather, as some measure of the User eXperience (UX) delivered to the human beings who end up coming into contact with the software in question.
"Stability is tied to the user experience in applications. A stable application provides a good user experience, where users are not impacted by errors or crashes when using the application. Application stability and user experience can be impacted by various session-ending events, which include errors and a few other types of problems that can occur in mobile apps (see some examples below). How often these problems occur and how many users are impacted determines how stable or unstable the application is," said Bugsnag's Smith.
To illustrate the mechanics at work here, the team detail four key examples of predicaments in which application stability may have been eroded:
- Application errors and crashes - A crash happens when a critical operation fails unexpectedly. For instance, a user is adding an item to their shopping cart in a mobile application and the app closes unexpectedly without completing the action they wanted. These are also referred to as unhandled exceptions.
- Application Not Responding (ANR) errors in Android apps - An ANR occurs in an Android app when the User Interface thread of that app takes too much time for an operation. In that case, an ANR message is shown to the user, locking the screen and preventing that user from being able to take any actions in the app. The only way to remediate is to either wait or kill the app, often causing the user to lose all current progress within the app. Even though ANRs are not crashes, the impact to the user experience is just as bad as a crash.
- Out Of Memory (OOM) errors in iOS apps - OOM errors occur when the app is shut down because the user's device is running low on memory. Strictly speaking this is not a crash, but that distinction is usually lost on the user. From the user's perspective, the app suddenly disappears or shuts down without warning, which feels exactly like a crash.
- App Hangs/App Freezes in iOS apps - An app hangs or freezes when it fails to respond to user interactions, causing user frustration. Although both are detrimental to user experience, fatal app hangs are worse than non-fatal app hangs because the app never recovers and unfreezes. An app hang is considered fatal when, after 10 seconds of hanging, it is terminated by the iOS system watchdog.
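The four predicaments above can be sketched as a simple classification rule. The Python below is illustrative only: the `Kind` and `Event` names are hypothetical, the 10-second figure comes from the description of fatal app hangs above, and the choice to count every ANR against stability is an assumption made here rather than a documented Bugsnag rule.

```python
from dataclasses import dataclass
from enum import Enum

# Threshold quoted above for an iOS app hang to be treated as fatal.
FATAL_HANG_SECONDS = 10.0


class Kind(Enum):
    CRASH = "crash"        # unhandled exception
    ANR = "anr"            # Android: UI thread takes too long
    OOM = "oom"            # iOS: app shut down when memory runs low
    APP_HANG = "app_hang"  # iOS: app stops responding to interactions


@dataclass(frozen=True)
class Event:
    """A captured problem. Field names are hypothetical."""
    kind: Kind
    hang_seconds: float = 0.0
    terminated_by_watchdog: bool = False


def counts_against_stability(event: Event) -> bool:
    """Return True if the event should mark its session as unstable."""
    if event.kind is Kind.APP_HANG:
        # Only fatal hangs count: hung for the threshold and then killed.
        return (event.terminated_by_watchdog
                and event.hang_seconds >= FATAL_HANG_SECONDS)
    # Assumption: crashes, ANRs and OOMs all feel like a crash to the user.
    return True


examples = [
    Event(Kind.CRASH),
    Event(Kind.APP_HANG, hang_seconds=3.0),
    Event(Kind.APP_HANG, hang_seconds=12.0, terminated_by_watchdog=True),
    Event(Kind.ANR),
]
for event in examples:
    print(event.kind.value, counts_against_stability(event))
```

Under these assumptions a three-second freeze that recovers is logged but does not dent the stability score, whereas a hang that the watchdog ends is treated just like a crash.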
Stability installed, in one line of code
Bugsnag's application stability management platform gives software engineering teams the visibility, prioritisation mechanisms and diagnostics they need to avoid some of these issues. The Bugsnag Software Development Kits (SDKs) capture real user sessions and errors automatically from customer-deployed mobile, browser and backend applications and, crucially, these SDKs are installed with one line of code in under five minutes.
Captured errors are mapped back to original 'lines of code' on every platform, even when debugging information is stripped at build-time. The technology's platform-specific aggregation algorithms ensure errors are grouped accurately by root-cause, enabling immediate prioritisation of bugs based on impact.
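Bugsnag's platform-specific grouping algorithms are not described here, so the following Python is only a naive stand-in to show the general idea of grouping and then prioritising by impact. It assumes a hypothetical `ErrorEvent` record and treats "same error class at the same top stack frame" as a rough proxy for root cause, ranking groups by the number of distinct users affected.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    """A captured error. Field names are hypothetical."""
    error_class: str
    top_frame: str  # e.g. "file:line" after mapping back to source
    user_id: str


def group_by_impact(events: list[ErrorEvent]) -> list[tuple[tuple[str, str], int]]:
    """Group errors by (class, top frame) and rank by distinct users hit."""
    users_by_group: dict[tuple[str, str], set[str]] = defaultdict(set)
    for e in events:
        users_by_group[(e.error_class, e.top_frame)].add(e.user_id)
    # Most users affected first; ties broken by group key for stable output.
    ranked = sorted(users_by_group.items(),
                    key=lambda item: (-len(item[1]), item[0]))
    return [(key, len(users)) for key, users in ranked]


events = [
    ErrorEvent("TypeError", "cart.js:42", "ana"),
    ErrorEvent("TypeError", "cart.js:42", "ben"),
    ErrorEvent("TypeError", "cart.js:42", "ana"),  # repeat for the same user
    ErrorEvent("NetworkError", "api.js:7", "cy"),
]
for (error_class, frame), users in group_by_impact(events):
    print(f"{error_class} at {frame}: {users} user(s) affected")
```

Ranking by distinct users rather than raw event counts is a deliberate choice in this sketch: one user stuck in a retry loop should not outrank a bug that touches many customers once each.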
Ultimately, a stability score is issued for any enterprise software application monitored by the platform. This allows software vendors to offer Service Level Agreements (SLAs) that are more accurately mapped to the real-world needs of customers in different market segments.
Technology managers traditionally (and perhaps stereotypically) don't enjoy enough space at the boardroom table, but if they can come to AGMs with reports detailing vectors including software stability scores, application release health and system uptime quotients, this could just be the kind of language that appeals to the wider board.
"Our software is stable, it works… so here's our stability report and release health dashboard," said some IT managers, soon, perhaps.