By Ravi Prakash, Product Manager
As a relatively new product manager at VI, I was intrigued to find that many customers from my days with NetApp or IBM are also using VirtualWisdom today, despite having purchased SRM tools like NetApp OnCommand Insight or IBM Spectrum Control. You might well ask why this is so.
SRM tools complement the element manager that is included with your storage array. They provide a single pane of glass for viewing storage capacity and trends across multi-vendor storage systems, for monitoring storage performance (at a device, LUN, or file system level) and for monitoring the SAN fabric (Brocade or Cisco). SRM tools may also help with storage provisioning and may integrate tightly with their vendor's own storage virtualization products, such as IBM SVC or EMC VPLEX.
Why is this not enough when it comes to solving application latency problems in mission-critical systems? SRM tools poll storage arrays at regular intervals (usually every few minutes) and can miss critical events that occur between polls. They typically use native API access to their own vendor's storage arrays but rely on SMI-S to access other vendors' arrays, and SMI-S interfaces typically provide less granular information than native APIs.
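To make the polling gap concrete, here is a minimal Python sketch. It is not how any particular SRM tool or VirtualWisdom is implemented; the latency trace, the 20-second spike, and the five-minute polling interval are all made-up values chosen purely for illustration. It compares what a poller sees, whether it takes point-in-time samples or per-interval averages, with what was actually on the wire.

```python
# Illustrative only: synthetic data, hypothetical function names.

def build_latency_trace(duration_s=600, spike_start=130, spike_len=20,
                        base_ms=2.0, spike_ms=45.0):
    """Return one synthetic latency value (ms) per second, with one short spike."""
    return [
        spike_ms if spike_start <= t < spike_start + spike_len else base_ms
        for t in range(duration_s)
    ]

def poll_samples(trace, interval_s):
    """Point-in-time samples: read the current value once per polling interval."""
    return trace[::interval_s]

def poll_averages(trace, interval_s):
    """Averaged counters: report the mean latency of each polling interval."""
    return [
        sum(trace[i:i + interval_s]) / len(trace[i:i + interval_s])
        for i in range(0, len(trace), interval_s)
    ]

trace = build_latency_trace()
sampled = poll_samples(trace, interval_s=300)    # one reading every five minutes
averaged = poll_averages(trace, interval_s=300)  # one mean per five minutes

print("worst latency on the wire (ms):", max(trace))
print("worst latency seen by sampling (ms):", max(sampled))
print("worst latency seen by averaging (ms):", round(max(averaged), 2))
```

In this toy trace the 45 ms spike falls entirely between two polls, so point-in-time sampling never sees it, and averaging dilutes it to under 5 ms. Either way, the event that hurt the application is hard to spot in the polled data.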
In today’s enterprise datacenter you might have a few hundred applications: some are mission-critical tier 0 applications (like an airline reservation system, ERP or CRM) and the rest may be tier 2 or 3 (like backup or file-level replication). To complicate matters, each application may have half a dozen components (web tier, database tier, app tier, etc.), and these components may reside on different VMs, which may be moved across hypervisors in your datacenter. When a tier 0 app’s latency is impacted by the shared infrastructure (switched network and networked storage) that is also used by tier 2 and 3 applications, you, as the storage admin, are faced with the daunting task of identifying which lower-tier app’s use of the shared infrastructure caused latency in the tier 0 app. An SRM tool that measures the round-trip time for an I/O from a host (monitored via an agent) to a LUN will be hard-pressed to identify peripheral issues (like tier 2 or 3 apps using the same underlying infrastructure) that affect this round-trip time.
In contrast, application-aware VirtualWisdom looks at every single conversation on the wire (whether it uses the Fibre Channel, NFS or SMB protocol) and quickly identifies the root cause of latency in tier 0 apps, especially when it is due to the underlying shared infrastructure. For instance, when a file-level replication job running on a server scans the NAS file system too frequently, it can adversely impact a tier 0 app using the same NAS. This is not something you can quickly identify using the polling method employed by SRM tools.
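The idea of attributing a tier 0 slowdown to a noisy neighbor can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not VirtualWisdom's actual method: the record layout, the host names (`erp-db`, `file-replicator`), the latency threshold, and the function `busiest_neighbors` are all hypothetical. Given a record for every I/O, it finds the seconds in which the victim app was slow and counts which other initiators were active on the shared storage in those same seconds.

```python
from collections import defaultdict

# Hypothetical per-I/O records: (timestamp_s, initiator, latency_ms).
records = [
    (100, "erp-db", 2.1),
    (101, "file-replicator", 1.8),
    (130, "file-replicator", 3.0),
    (131, "file-replicator", 3.1),
    (131, "erp-db", 41.0),
    (132, "file-replicator", 2.9),
    (132, "erp-db", 44.5),
    (200, "erp-db", 2.0),
]

def busiest_neighbors(records, victim, threshold_ms):
    """Rank other initiators by I/O count in the seconds where the victim was slow."""
    # Seconds in which the victim saw latency above the threshold.
    slow_seconds = {
        t for t, who, latency in records
        if who == victim and latency > threshold_ms
    }
    # Count I/Os from every other initiator during those seconds.
    counts = defaultdict(int)
    for t, who, _ in records:
        if who != victim and t in slow_seconds:
            counts[who] += 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

suspects = busiest_neighbors(records, victim="erp-db", threshold_ms=20.0)
print(suspects)
```

Correlation like this only points at suspects; it does not prove cause. The point is that it needs a record of every conversation, which periodic polling does not provide.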
An analogy from air travel: a weary passenger from the economy section of a flight (dare I use the term “cattle-car”?) decides to use the overhead bins in first class, thereby violating the SLA guaranteed to the first-class passenger, who pays 5x the ticket price to be assured of free drinks, spacious overhead space and plenty of legroom. Like the friendly air steward who quickly and discreetly identifies the offending passenger, VirtualWisdom can identify the root cause of latency in a tier 0 app so the infrastructure manager can take remedial action before tempers flare. Just as the focus in air travel is on getting the flight to take off on time, your focus in the datacenter should be on ensuring that internal application SLAs are met, so the infrastructure group doesn’t have to deal with an unwelcome spotlight. Now you know why hundreds of enterprise customers who already run other SRM tools deploy VirtualWisdom right along with them.