Yesterday, we were suddenly confronted with a major Xero outage, a reminder of how much business continuity depends on reliable POS and accounting software. As a provider of retail technology solutions, we've seen our share of software hiccups, but this incident shows how damaging an outage becomes when it hits a mission-critical business system.
For many people, Xero simply stopped working.
One of our users described his situation: "It was like showing up to open my filing cabinet and finding the doors welded shut. I had no financial data." The outage couldn't have come at a worse time, as it coincided with month-end accounting and the standard payroll processing day for many businesses.
Cloud Accounting Issues and the Need for Multi-Region Cloud Redundancy
The root cause appears to be a lack of multi-region cloud redundancy. Reports suggest that a single point of failure in Amazon Web Services' Northern Virginia region caused the Xero outage. If that is true, it raises serious questions about the reliability and disaster recovery planning behind Xero's cloud service.
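To see why a second region matters, here is a rough sketch of the arithmetic. It assumes, purely for illustration, that each region is independently 99.9% available and that the service stays up as long as any one region is up. Real regional failures can be correlated, so treat the result as a best case. The function names are hypothetical.

```python
# Illustrative only: assumes each region is independently 99.9% available,
# and that the service stays up as long as at least one region is up.
HOURS_PER_YEAR = 24 * 365


def combined_availability(region_availability: float, regions: int) -> float:
    """Availability of a service that survives if any one region is up."""
    # The service is down only when every region is down at the same time.
    return 1 - (1 - region_availability) ** regions


def annual_downtime_hours(availability: float) -> float:
    """Expected hours per year that the service is unavailable."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for regions in (1, 2, 3):
        a = combined_availability(0.999, regions)
        print(f"{regions} region(s): availability {a:.9f}, "
              f"~{annual_downtime_hours(a):.4f} hours down per year")
```

Under those assumptions, a single region averages about 8.76 hours of downtime a year, while two independent regions cut that to roughly half a minute.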
POS System Uptime: A Critical Factor for Brick-and-Mortar Businesses
To illustrate the problem of POS system uptime, consider a shop that relies on the internet for four services, such as cloud POS, EFTPOS, and accounting. If each service is 99.9% reliable and the shop trades 8 hours a day, 6 days a week, the expected downtime is:
Downtime = (1 − 0.999) × (6 / 7) × 365 days × 8 hours × 4 services ≈ 10 hours per year
If the internet connection itself fails, every cloud-based service goes down with it, which could push downtime to around 17 hours per year.
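The estimate can be reproduced in a few lines of Python. This is a hypothetical sketch: the 99.9% reliability, 6-day week, and 8-hour day come from the text, but the model behind the ~17-hour figure is our assumption. It treats the four services as the internet connection plus three cloud services, gives each cloud service its own outages, and lets an internet outage additionally take every cloud service down.

```python
# Hypothetical helpers reproducing the back-of-the-envelope estimate.
# Assumptions: each service is independently 99.9% available, the shop trades
# 6 days a week for 8 hours a day, and the four services are the internet
# connection plus three cloud services (e.g. POS, EFTPOS, accounting).

def service_downtime_hours(reliability=0.999, trading_days_per_week=6, hours_per_day=8):
    """Expected trading hours per year that one service is down."""
    trading_days = trading_days_per_week / 7 * 365
    return (1 - reliability) * trading_days * hours_per_day


def summed_downtime(services=4, **kwargs):
    """Service-hours of downtime per year if each service fails on its own."""
    return services * service_downtime_hours(**kwargs)


def downtime_with_internet_dependency(cloud_services=3, **kwargs):
    """One possible reading of the ~17-hour figure (an assumption): each cloud
    service still has its own outages, and an internet outage also takes every
    cloud service down, so it counts once for the connection and once for
    each dependent service."""
    per_service = service_downtime_hours(**kwargs)
    own_outages = cloud_services * per_service
    internet_outages = (1 + cloud_services) * per_service
    return own_outages + internet_outages


if __name__ == "__main__":
    print(f"Independent failures: ~{summed_downtime():.1f} hours/year")
    print(f"With internet dependency: ~{downtime_with_internet_dependency():.1f} hours/year")
```

Run as a script, this prints about 10.0 and 17.5 hours per year. Changing `hours_per_day` or `reliability` shows how longer trading hours or a less reliable connection push the figure up.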
Admittedly, the reality is more complex. For example, if your shop relies on a cheap internet service or trades for more than eight hours a day, downtime can be even higher.
I often hear from users that something went down and their staff sat around doing nothing because the internet was down somewhere.
The Root of the Xero Problem: Cloud Service Redundancy
Here's where things get genuinely mind-boggling. If reports are correct, a single point of failure in Amazon Web Services' Northern Virginia region caused the Xero outage. As someone who has worked in software for years, I'm stunned. A company as large as Xero lacking robust multi-region redundancy is like discovering that your bank keeps all its records on one computer.
This revelation, together with how Xero handled the incident, raises serious questions about its cloud accounting software:
-Why weren't users notified of the problem, e.g., by email? Most found out only by searching Google.
-How could Xero let its service depend on a single region?
-What does this say about Xero's disaster recovery planning?
-Xero is fairly pricey, and the price should be reflected in what users get.
Our POS cloud solution employs several servers in different locations. If one goes down, others continue functioning, ensuring our users can maintain most of their system processing. This approach is fundamental to our disaster recovery strategy and exemplifies why cloud computing, when done right, can be beneficial for retail.
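The failover idea behind this kind of setup can be sketched as a client that tries each server in turn and moves on when one is unreachable. This is a simplified illustration, not our production code: the server names and the `send` function are hypothetical, and a real client would add timeouts, health checks, and retry limits.

```python
# Minimal failover sketch. Server names and the send function are
# hypothetical stand-ins; real code would make network calls with timeouts.
from typing import Callable, Sequence


class AllServersDown(Exception):
    """Raised when no server in the list could handle the request."""


def send_with_failover(
    servers: Sequence[str],
    request: dict,
    send: Callable[[str, dict], dict],
) -> dict:
    """Try each server in order and return the first successful response."""
    errors = []
    for server in servers:
        try:
            return send(server, request)
        except ConnectionError as exc:
            # This server is unreachable; record why and try the next one.
            errors.append(f"{server}: {exc}")
    raise AllServersDown("; ".join(errors))


if __name__ == "__main__":
    down = {"server-a.example"}  # pretend this location is having an outage

    def fake_send(server: str, request: dict) -> dict:
        if server in down:
            raise ConnectionError("unreachable")
        return {"server": server, "status": "sale recorded", **request}

    servers = ["server-a.example", "server-b.example"]
    print(send_with_failover(servers, {"sale_id": 42}, fake_send))
```

Passing `send` in as a parameter keeps the failover logic independent of any particular network library, which also makes it easy to exercise with a fake server as in the usage example.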
Xero Status Updates
As this article was being written, we were notified that Xero had resolved the issue. Hopefully, they will provide a thorough explanation and a plan to stop it from happening again. For current information, check the Xero status page for the latest updates on the outage.
Conclusion: Ensuring Small Business Continuity
Consider this a wake-up call: in our increasingly digital world, we need to be prepared for the unexpected.