Virtual Instruments Blog
Who’s Your Go Daddy?
Yesterday GoDaddy issued this release, which explains the intermittent service outage from earlier this week. I’m not picking on GoDaddy – we all have an IT services provider, but who is yours? If you’re a user of private or public cloud services, do you ever wonder what you should ask your provider about how they’re preventing this sort of issue? Or, if you are the provider, what you can do to avoid being in such a position?
To no one’s surprise, network and storage infrastructures are often the culprit in application slowdowns or outages, and it was a network router that reportedly caused GoDaddy’s problem. It’s difficult to say what the actual cause was, but there are some specific things any provider can do to help prevent infrastructure problems.
First of all, what tools are you using to measure your infrastructure performance? How “real time” are they? Chances are, even with those tools, your first indication of trouble is when a user reports it. At best, the infrastructure-monitoring product you are most likely currently using reports IOPS or MB/s, which are only high-level measures of utilization, not performance.
If you care about application performance and availability (and every provider should, given the damage recent outages have demonstrated), you should measure infrastructure performance by how it affects application response time for every transaction. You must also measure infrastructure latency. Attempting to infer the performance of an infrastructure from resource utilization statistics simply does not work. It yields a blizzard of false alarms (spikes in resource utilization that are not impacting application performance), and it consistently misses infrastructure latency problems that never show up in utilization metrics.
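To make that concrete, here is a minimal sketch with synthetic numbers (not real monitoring data or any vendor’s code) showing why alerting on utilization alone produces both failure modes: alarms when nothing is wrong, and silence when latency is actually hurting applications.

```python
# Hypothetical per-interval samples: (utilization %, I/O latency ms).
# First three intervals: a utilization spike with healthy latency.
# Last three: flat utilization while latency degrades badly.
samples = [
    (30, 2.0), (95, 2.1), (92, 2.0),
    (35, 2.2), (40, 25.0), (38, 30.0),
]

UTIL_ALARM = 90       # a typical utilization-based alert threshold
LATENCY_SLO_MS = 10   # what the application actually experiences

# Intervals where utilization alerting fires but latency is fine.
false_alarms = sum(1 for u, lat in samples
                   if u >= UTIL_ALARM and lat < LATENCY_SLO_MS)
# Intervals where latency breaches the SLO but utilization looks normal.
missed = sum(1 for u, lat in samples
             if u < UTIL_ALARM and lat >= LATENCY_SLO_MS)

print(f"false alarms from utilization alerting: {false_alarms}")
print(f"latency problems utilization missed:    {missed}")
```

In this contrived trace, utilization-based alerting fires twice for nothing and stays silent through two genuine latency breaches, which is exactly the pattern described above.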
Second, how frequently are you capturing performance data? Your current tools probably poll infrastructure components every few minutes. The problem is that infrequent polling (every 5 minutes, or even every 30 minutes) combined with averaging produces commodity data that is worthless for assessing the actual performance of an infrastructure in support of its applications and workloads. In other words, you miss important events. Unfortunately, many vendors insist on referring to this polling-and-averaging technique as “real time.” It’s not. Real-time monitoring must deliver data at one-second intervals, measured to the fraction of a millisecond, and captured from the physical layer, at the protocol level, at line rate.
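The averaging problem is easy to demonstrate with synthetic numbers. In this sketch (illustrative only, all values invented), a 10-second latency burst severe enough to stall transactions all but disappears inside a 5-minute average, while per-second data captures it immediately:

```python
# 300 one-second latency samples (ms): a steady 2 ms baseline with a
# 10-second burst of 200 ms latency starting at second 100.
per_second = [2.0] * 300
per_second[100:110] = [200.0] * 10

# What a 5-minute polling/averaging tool would report for this window.
five_minute_avg = sum(per_second) / len(per_second)
# What per-second capture sees.
worst_second = max(per_second)

print(f"5-minute average: {five_minute_avg:.1f} ms")  # 8.6 ms - looks healthy
print(f"worst second:     {worst_second:.1f} ms")     # 200.0 ms - users stalled
```

An 8.6 ms average would sail past most alert thresholds, yet for ten full seconds every transaction in flight was two orders of magnitude slower than normal.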
And third, how are you correlating and analyzing the data you collect? Your current tools probably poll each infrastructure component separately. Today’s IT infrastructures are complicated, interrelated beasts. If you haven’t participated in a vendor “blame fest,” you’re in rare company. Only by examining the entire data path and all related components as a whole can you be assured of finding the true root causes of problems or potential problems.
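The idea of analyzing the whole data path, rather than each component in isolation, can be sketched as attributing each transaction’s latency to the stages it traversed. The stage names and numbers below are hypothetical, chosen only to illustrate the technique:

```python
# Per-transaction latency contributions (ms) by data-path stage.
# Polled in isolation, every component here reports "normal" averages;
# viewed per transaction, txn-2's slowdown is clearly the switch's.
stage_latency = {
    "txn-1": {"host HBA": 0.3, "SAN switch": 0.2, "storage array": 1.5},
    "txn-2": {"host HBA": 0.3, "SAN switch": 12.0, "storage array": 1.4},
}

for txn, stages in stage_latency.items():
    total = sum(stages.values())
    # The stage contributing the most latency to this transaction.
    culprit = max(stages, key=stages.get)
    print(f"{txn}: {total:.1f} ms total, dominated by {culprit}")
```

When every transaction carries its own per-stage breakdown, a “blame fest” turns into a lookup: the component responsible for the slowdown is named by the data itself.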
With VirtualWisdom, companies can measure the infrastructure’s response latency to workload requests – for every single request, in real time (as the transaction occurs, not some time thereafter) – based on actual latency measurements, not averages. VirtualWisdom directly measures the I/O performance effect (latency, bandwidth, retries, errors, etc.) of capacity additions, configuration changes, component or device changes, and application changes on the infrastructure. With this Infrastructure Performance Management technique, our customers are able to proactively prevent problems, improving their overall application response times and, most importantly, end-user satisfaction.
For more detail on what I’ve just touched on, I encourage you to read the paper from Bernd Harzog, CEO of APM Experts, on the subject: “Constructing a Reference Architecture for Infrastructure Performance Management.”