Virtual Instruments Blogs

VMware makes the case for SAN-aware performance monitoring

Server virtualization technologies for open systems have been around for more than a decade, with VMware as the clear market leader. In the beginning, virtualization was widely adopted for development and testing. As the technology evolved and additional capabilities were introduced, such as VMotion for live migration of virtual machines from one host system to another, VMware came to be regarded as production-grade. Enterprises have started to deploy their business-critical applications on VMware, and by the same token the availability and performance requirements for their virtual infrastructures have risen dramatically.

VMware publishes many availability- and performance-related technical papers in its Technical Resource Center (http://www.vmware.com/resources/techresources/). These papers provide excellent help and guidance on architecting reliable virtual infrastructures. Reviewing them makes it clear that, of the whole system stack, the storage tier is the key component for performance. Good evidence for this appears in the database-related whitepaper “Oracle on VMware vSphere Essential Database Deployment Tips” (http://www.vmware.com/resources/techresources/10101). The introduction on page 1 states: “This paper also takes a proactive approach to addressing performance issues. At VMware, greater than 90 percent of the performance issues encountered by our customers were due to configuration errors at the storage tier. For this reason, a significant portion of the paper will deal with the storage tier.”

Server virtualization solutions like VMware add another layer to the infrastructure, and although the benefits of server consolidation are obvious, that layer also adds complexity. The complexity becomes apparent in storage configuration: a VMware Datastore is an abstraction of the storage tier, so it is a logical representation rather than a physical one. Many virtual machines might share one Datastore, and in the same way many Datastores might share the same physical disks in one storage array. Without sound end-to-end planning and design of the whole infrastructure, it is only a matter of time before severe congestion appears and degrades performance dramatically.
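
To make the sharing problem concrete, here is a minimal sketch that rolls per-VM I/O demand up through the Datastore layer onto the physical disk groups underneath. The inventory, the names, the IOPS figures and the capacity limits are all invented for illustration; in practice they would come from your own VMware and array configuration data.

```python
from collections import defaultdict

# Hypothetical inventory: which Datastore each VM lives on, and which
# physical disk group backs each Datastore. All names are illustrative.
VM_TO_DATASTORE = {
    "vm-db01": "ds-A",
    "vm-web01": "ds-A",
    "vm-web02": "ds-B",
    "vm-batch01": "ds-C",
}
DATASTORE_TO_DISK_GROUP = {
    "ds-A": "array1-dg1",
    "ds-B": "array1-dg1",  # shares physical disks with ds-A
    "ds-C": "array1-dg2",
}


def load_per_disk_group(vm_iops):
    """Sum per-VM IOPS demand onto the disk group that ultimately serves it."""
    totals = defaultdict(int)
    for vm, iops in vm_iops.items():
        datastore = VM_TO_DATASTORE[vm]
        totals[DATASTORE_TO_DISK_GROUP[datastore]] += iops
    return dict(totals)


def oversubscribed(vm_iops, capacity_iops):
    """Return disk groups whose summed demand exceeds the assumed capacity."""
    load = load_per_disk_group(vm_iops)
    return sorted(dg for dg, iops in load.items() if iops > capacity_iops[dg])


# Assumed demand and capacity figures, purely for the example.
demand = {"vm-db01": 1800, "vm-web01": 400, "vm-web02": 900, "vm-batch01": 700}
capacity = {"array1-dg1": 2500, "array1-dg2": 2500}

print(load_per_disk_group(demand))
print(oversubscribed(demand, capacity))
```

Neither ds-A nor ds-B looks overloaded on its own in this example; the congestion only shows up once both are mapped to the disk group they share.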

Unfortunately, VMware doesn’t provide the necessary visibility into the I/O path all the way down to the storage array and LUN. The “Scalable Storage Performance” whitepaper (http://www.vmware.com/resources/techresources/1059) concludes (page 10): “A virtualized environment makes effective use of available resources, but at the same time it can impose more load on the storage infrastructure because of increased consolidation levels. An I/O command generated in a virtualized environment must pass through extra layers of processing that enable all the useful features of virtualization. It is important to understand the potential bottlenecks at various layers and make the necessary configuration changes to get optimal storage performance.” The same paper names the factors that affect the scalability of storage in ESX environments (page 2): “Our tests explored three key factors that affect the scalability of storage in an ESX environment—the number of active commands, SCSI reservations, and total available link bandwidth.”
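
Two of those three factors, the number of active commands and the link bandwidth, lend themselves to a quick back-of-the-envelope check. The sketch below is not taken from the whitepaper. It applies Little's Law (throughput = concurrency / time per command) to estimate an IOPS ceiling, then compares the resulting data rate with a link capacity. The function names and all of the numbers, including the 400 MB/s link figure, are assumptions for illustration.

```python
def iops_ceiling(active_commands, latency_ms):
    """Little's Law: the most I/Os per second that a given number of
    outstanding commands can sustain at a given per-command latency."""
    return active_commands * 1000.0 / latency_ms


def throughput_mb_s(iops, io_size_kb):
    """Data rate in MB/s for a given IOPS figure and I/O size."""
    return iops * io_size_kb / 1024.0


def link_utilization(iops, io_size_kb, link_mb_s):
    """Fraction of the link consumed; values above 1.0 mean the link,
    not the queue depth, is the limiting factor."""
    return throughput_mb_s(iops, io_size_kb) / link_mb_s


# Assumed workload: 32 active commands, 4 ms per command, 64 KB I/Os,
# over a link taken to deliver 400 MB/s.
ceiling = iops_ceiling(active_commands=32, latency_ms=4.0)
print(ceiling)
print(throughput_mb_s(ceiling, io_size_kb=64))
print(link_utilization(ceiling, io_size_kb=64, link_mb_s=400.0))
```

With these assumed figures the queue could in principle drive more data than the link can carry, so adding more active commands would not help. The real bottleneck can only be identified from measured values at each layer.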

One of the most interesting statements I’ve found is from the Oracle deployment tips paper mentioned above: “Tip 17: Optimized Architectures are Not Designed in Silos. At a minimum, designing the optimized architecture should involve the database administrator, storage administrator, network administrator, VMware administrator, and application owner.” This makes clear that a holistic view of the entire infrastructure is needed, and that tools meant to support it have to collect and correlate metrics from all tiers in the system stack. And because the vast majority of business-critical VMware deployments run on Fibre Channel storage, it matters that VirtualWisdom is the only solution that collects data from the physical layers in real time. By collecting performance metrics from the Fibre Channel SAN using cable splitters, VirtualWisdom doesn’t burden production with additional load or latency, and by correlating this data with information from the VMware layer, it gives administrators an enormous benefit and a solid foundation for their cross-domain work.
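
As a rough illustration of what correlating metrics across tiers means, the sketch below joins latency samples seen at the VMware layer with latency samples measured on the SAN for the same LUN and time slot. This is not how VirtualWisdom is implemented. The record layouts, the sample values and the simple 50 percent rule used to attribute latency are all assumptions made for the example.

```python
def correlate(host_samples, san_samples):
    """Join host-side and SAN-side latency samples on (timestamp, LUN).

    host_samples: iterable of (timestamp_s, vm, lun, latency_ms)
    san_samples:  iterable of (timestamp_s, lun, latency_ms)
    """
    san_by_key = {(ts, lun): lat for ts, lun, lat in san_samples}
    rows = []
    for ts, vm, lun, host_lat in host_samples:
        san_lat = san_by_key.get((ts, lun))
        if san_lat is None:
            continue  # no SAN-side measurement for this slot
        # Assumed rule of thumb: if the SAN accounts for at least half of
        # the latency seen by the host, look at the fabric or array first.
        suspect = "san" if san_lat >= 0.5 * host_lat else "host"
        rows.append({
            "timestamp_s": ts,
            "vm": vm,
            "lun": lun,
            "host_latency_ms": host_lat,
            "san_latency_ms": san_lat,
            "overhead_ms": host_lat - san_lat,
            "suspect": suspect,
        })
    return rows


# Invented sample data for two VMs on two LUNs.
host = [(100, "vm-db01", "lun-7", 12.0), (100, "vm-web02", "lun-9", 25.0)]
san = [(100, "lun-7", 3.0), (100, "lun-9", 22.0)]

for row in correlate(host, san):
    print(row["vm"], row["lun"], row["overhead_ms"], row["suspect"])
```

Even this toy join shows why a single-tier view falls short: both VMs report high latency, but only the combined data suggests that one case points at the storage side and the other at the host or virtualization layer.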

One Response to “VMware makes the case for SAN-aware performance monitoring”

  1. Greg Phillips says:

    This is good; however, I wish VMware would make more mention of the help that NPIV provides in this regard. They implemented that feature in ESX 3.5 if you’re using RDM, and they have some very good whitepapers (you have to dig for them, though) written with vendors on leveraging NPIV.