My web server terminated unexpectedly at 1:51:15 a.m. on Friday, December 19th. I was away from my office and unable to investigate and restore service until Saturday evening at 9:29:26 p.m. I admit to fretting during the day on Saturday that my server had been victimized by an attacker, so I was surprised to discover the server powered off. Before I rebooted (from a "forensics" drive) to investigate further, I glanced at my wall clock and discovered that (a) we'd had a 7-minute power outage, and (b) the battery in my uninterruptible power supply hadn't maintained power for this brief period.
According to the manual - yes, I kept the manual and actually knew where to find it - the UPS should have maintained power for at least 30 minutes with the nominal load of a single server. Ironically, my DSL modem and a hub connected to surge protection outlets on the same UPS resumed their service roles when power returned, and predictably, this was the only one of my four UPSs that failed. I then checked the bill of sale - yes, I kept this as well - and realized the battery was overdue for replacement.
My oversight isn't one large organizations tend to make, because service availability is one of the security metrics organizations take seriously enough to quantify: "no service" is commonly quantifiable in terms of lost revenue and productivity. It's also one of the easiest security measures to get funded, largely because administrators smartly budget availability under networking and performance rather than security.
Why do I insist service availability is a security metric? Why not ask, "Why do we worry about denial of service attacks?" Because all the security measures we might muster don't matter if data are not available.
Telephone companies established industry benchmarks for service availability, or more accurately, network equipment reliability. Telephone operations systems define this metric as:
availability = MTBF / (MTTR + MTBF)
where MTBF is the mean time between failures and MTTR is the mean time to restore.
Network (switching) equipment and operations systems must meet a stringent five-nines (99.999%) reliability criterion, or no more than about 5 minutes of down time per year. Since this measure includes the power supply, I decided to use this event to benchmark my service.
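To put the "nines" in perspective, here is a minimal Python sketch that converts a count of nines into the down time allowed per year. The function name `downtime_minutes_per_year` and the 365.25-day year are my own assumptions for illustration, not part of any telephone industry specification.

```python
# Assumption: an average year of 365.25 days (leap years included).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(nines: int) -> float:
    """Return the yearly down time budget, in minutes, for N nines.

    Five nines means 99.999% availability, i.e. an unavailable
    fraction of 10**-5 of the year.
    """
    unavailable_fraction = 10.0 ** (-nines)
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for n in range(2, 6):
        minutes = downtime_minutes_per_year(n)
        print(f"{n} nines: {minutes:.1f} minutes of down time per year")
```

Under these assumptions, five nines works out to a little over 5 minutes per year, four nines to a little under an hour, and three nines to roughly 8 to 9 hours.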
I began hosting web service from this server on March 24, 2003, at 11:10:10 a.m. If my math is correct, and rounding to hours, my MTBF was 6471 hours (269 days, 15 hours from service start to December 19th). My time to restore was a lame 43 hours and change, so service availability was ~99.339%. These calculations are overly simplistic, and don't take into consideration access circuit and firewall service outages, for example, but they are sufficient to illustrate the lessons I learned:
- For small businesses, the "un" in uninterruptible is as much a function of battery life as capacity. Keep track of UPS battery age.
- Avoid incidents of this nature by using a UPS that offers monitoring and alarm generation software, keeping a spare UPS in the office, and establishing a human backup when the administrator is away. These simple, inexpensive measures might have improved my availability to three nines, or even four nines (the latter allows approximately 1 hour of down time per year).
- As Gary Audin observes in his BCR article, Reality Check on Five-Nines, five-nines isn't necessarily the right availability benchmark for every organization. I serve about 500 requests per day. I'm not an e-merchant. Lives are not at risk if folks can't reach my blog or the handful of not-for-profit vanity sites I host for five minutes a year.
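For readers who want to check my arithmetic, the availability figure can be reproduced with a short Python sketch. The timestamps come from this post; the variable and function names are mine, and I'm assuming naive local timestamps are adequate (both the service start and the outage fall outside daylight saving time). With only one failure in the sample, "MTBF" here is simply the uptime before that failure.

```python
from datetime import datetime

# Timestamps as reported in the post (local time, no time zone attached).
service_start = datetime(2003, 3, 24, 11, 10, 10)
failure = datetime(2003, 12, 19, 1, 51, 15)
restored = datetime(2003, 12, 20, 21, 29, 26)


def hours_between(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTTR + MTBF)."""
    return mtbf_hours / (mttr_hours + mtbf_hours)


mtbf = hours_between(service_start, failure)  # uptime before the outage
mttr = hours_between(failure, restored)       # time taken to restore service

if __name__ == "__main__":
    print(f"MTBF: {mtbf:.1f} hours, MTTR: {mttr:.1f} hours")
    # Whole hours, as in the post: MTBF rounded, MTTR truncated to 43.
    print(f"whole hours:     {availability(round(mtbf), int(mttr)):.3%}")
    # Exact durations give a slightly lower figure.
    print(f"exact durations: {availability(mtbf, mttr):.3%}")
```

Using whole hours (6471 and 43) gives about 99.34%, in line with the figure quoted above; keeping the "and change" in the restore time lowers it to about 99.33%.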
Perhaps the most important conclusion I can offer from my experience is this: know what the business needs before you invest in redundancy, mirroring, and more. Learn how to assess your availability needs. You will find Matthew Liotine's Mission Critical Planning a valuable resource for determining the dollar value at risk when critical systems fail, and equally valuable in identifying methods for reducing or mitigating network performance and security risks. [I wrote the Foreword to Matthew's book, acting as consulting editor during the "work in progress" stage, and found it very insightful.]
