January 12th, 2012
Archie Hendryx
2011 was a year in which, despite the economic constraints, everything Big was seemingly good: Big Data, Big Clouds, Big VMs etc. With the industry caught up in its lust for this excess, 2011 was also the year I lost count of how many ‘Big’ Production VMs I saw with overprovisioned resources. More often than not this was a typical reaction from System Admins trying to alleviate their fears of potential performance problems on important VMs. It was the year where I began to hear justifications such as “yes we are overprovisioning our production VMs... but apart from the cost savings, overallocating our available underlying resources to a VM isn’t a bad thing, in fact it allows it to be scalable”. Despite this, 2011 was also the year where I lost count of the number of times I had to point out that sometimes overprovisioning a VM does lead to performance problems – specifically when dealing with Virtual CPUs.
VMware refers to CPU as pCPU and vCPU. pCPU or ‘physical’ CPU in its simplest terms refers to a physical CPU core, i.e. a physical hardware execution context (HEC), if hyper-threading is unavailable or disabled. If hyper-threading has been enabled then a pCPU would constitute a logical CPU. This is because hyper-threading enables a single processor core to act like two processors, i.e. logical processors. So for example, if an 8-core ESX server has hyper-threading enabled it would have 16 threads that appear as 16 logical processors, and that would constitute 16 pCPUs.
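As a quick illustration of that counting rule, here is a minimal Python sketch. The function name `pcpu_count` is my own and is not a VMware API, and it assumes exactly two hardware threads per core when hyper-threading is enabled.

```python
def pcpu_count(physical_cores: int, hyperthreading: bool) -> int:
    """Return the number of pCPUs (logical processors) a host would expose.

    Assumption for illustration: hyper-threading gives two threads per core.
    """
    if physical_cores < 1:
        raise ValueError("a host needs at least one physical core")
    threads_per_core = 2 if hyperthreading else 1
    return physical_cores * threads_per_core


# The 8-core ESX server from the example above.
print(pcpu_count(8, hyperthreading=True))
print(pcpu_count(8, hyperthreading=False))
```

With hyper-threading on, the 8-core host counts as 16 pCPUs; with it off, as 8.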
As for a virtual CPU (vCPU), this refers to a virtual machine’s virtual processor and can be thought of in the same vein as the CPU in a traditional physical server. vCPUs run on pCPUs and by default, virtual machines are allocated one vCPU each. However, VMware have an add-on software module named Virtual SMP (symmetric multi-processing) that allows virtual machines to have access to more than one CPU and hence be allocated more than one vCPU. The great advantage of this is that virtualized multi-threaded applications can now be deployed on multi-vCPU VMs to support their numerous processes. So instead of being constrained to a single vCPU, SMP enables an application to use multiple processors to execute multiple tasks concurrently, consequently increasing throughput. So with such a feature and all the excitement of being ‘Big’ it was easily assumed by many that provisioning additional vCPUs could only ever be beneficial – but if only it were that simple.

The typical examples I faced entailed performance problems that were being blamed on either the Storage or the SAN and not on CPU constraints, especially as overall CPU utilization for the ESX server that hosted the VMs would be reported as low. Using Virtual Instruments’ VirtualWisdom I was able to quickly conclude that the problem was not at all related to the SAN or Storage but to the hosts themselves. By being able to historically trend and correlate the vCenter, SAN and Storage metrics of the problematic VMs on a single dashboard it was apparent that the high number of vCPUs allocated to each VM was the cause. This was indicated by a high reading of what is termed the ‘CPU Ready’ metric.
To elaborate, CPU Ready is a metric that measures the amount of time a VM is ready to run against the pCPU i.e. how long a vCPU has to wait for an available core when it has work to perform. So while it’s possible that CPU utilization may not be reported as high, if the CPU Ready metric is high then your performance problem is most likely related to CPU. In the instances that I saw, this was caused by customers assigning four vCPUs and in some cases eight to each Virtual Machine. So why was this happening?
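To put a number on CPU Ready, here is a rough Python sketch. It assumes the value is read from a vCenter performance chart, where CPU Ready is reported as a summation in milliseconds per sampling interval (20 seconds for the real-time charts). The function name and the sample figures are mine and purely illustrative.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms) into an average percentage per vCPU.

    Assumptions: ready_ms is the summation value for one sampling interval,
    interval_s is that interval's length in seconds, and a VM-level value
    adds up the ready time of all of the VM's vCPUs.
    """
    if interval_s <= 0 or vcpus < 1:
        raise ValueError("interval_s must be positive and vcpus at least 1")
    interval_ms = interval_s * 1000.0
    return ready_ms * 100.0 / (interval_ms * vcpus)


# Hypothetical reading: 2000 ms of ready time in a 20 s sample on a 1 vCPU VM.
print(cpu_ready_percent(2000))
# Hypothetical reading: 4000 ms summed across a 4 vCPU VM in the same interval.
print(cpu_ready_percent(4000, vcpus=4))
```

In the first case the vCPU spent 10% of the interval waiting for a core; in the second each vCPU waited 5% on average. What counts as "high" depends on the workload, so treat any threshold as a rule of thumb rather than a hard limit.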

Well firstly the hardware and its physical CPU resource is still shared. Coupled with this, the ESX Server itself also requires CPU to process storage requests, network traffic etc. Then add the situation that sadly most organizations still suffer from the ‘silo syndrome’ and hence there still isn’t a clear dialogue between the System Admin and the Application owner. The consequence is that multiple vCPUs get allocated regardless of the application: while they are great for workloads that support parallelization, this is not the case for applications that don’t have built-in multi-threaded structures. So a VM with 4 vCPUs will require the ESX server to wait for 4 pCPUs to become available, and on a particularly busy ESX server with other VMs this could take significantly longer than if the VM in question only had a single vCPU.

To explain this further let’s take an example of a four pCPU host that has four VMs, three with 1 vCPU and one with 4 vCPUs. At best only the three single vCPU VMs can be scheduled concurrently. In such an instance the 4 vCPU VM would have to wait for all four pCPUs to be idle. In this example the excess vCPUs actually impose scheduling constraints and consequently degrade the VM’s overall performance, typically indicated by low CPU utilization but a high CPU Ready figure. With the ESX server scheduling and prioritising workloads according to what it deems most efficient to run, the consequence is that smaller VMs will tend to run on the pCPUs more frequently than the larger overprovisioned ones. So in this instance overprovisioning was in fact proving to be detrimental to performance as opposed to beneficial. Now in more recent versions of vSphere the scheduling of different vCPUs and de-scheduling of idle vCPUs is not as contentious as it used to be. Despite this, the VMkernel still has to manage every vCPU, a complete waste if the VM’s application doesn’t use them!
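The four pCPU example can be sketched as a toy model in Python. This is a deliberately simplified picture of strict co-scheduling and not how the ESX scheduler is actually implemented: it assumes the small VMs always get their pCPU first and that the large VM runs only in time slices where enough pCPUs are free for all of its vCPUs at once. The function name and the busy pattern are hypothetical.

```python
def ready_slices(pcpus, small_vm_busy, big_vm_vcpus):
    """Count time slices where the big VM has work but cannot be co-scheduled.

    small_vm_busy holds one tuple per time slice, with one boolean per
    single-vCPU VM (True means that VM occupies a pCPU in that slice).
    """
    waiting = 0
    for busy_flags in small_vm_busy:
        free_pcpus = pcpus - sum(busy_flags)
        if free_pcpus < big_vm_vcpus:
            waiting += 1
    return waiting


# Hypothetical activity of the three single-vCPU VMs over six time slices.
pattern = [
    (True, False, False),
    (False, True, False),
    (False, False, True),
    (False, False, False),
    (True, True, True),
    (False, False, False),
]

for vcpus in (4, 2, 1):
    waited = ready_slices(4, pattern, vcpus)
    print(f"{vcpus} vCPU VM waited in {waited} of {len(pattern)} slices")
```

In this toy run the 4 vCPU VM is held back in four of the six slices even though the host is mostly idle, while the same workload on a single vCPU would never have waited. That is the low utilization, high CPU Ready pattern described above.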

To ensure your vCPU to pCPU ratio is at its optimal level and that you reap the benefits of this great feature there are some straightforward considerations to make. Firstly there needs to be dialogue between the silos to fully understand the application’s workload prior to VM resource allocation. In the case of applications where the workload may not be known, it’s key to not overprovision virtual CPUs but rather start with a single vCPU and scale up as and when necessary. Having a monitoring platform that can historically trend the performance and workloads of such VMs is also highly beneficial in determining such factors. As mentioned earlier, CPU Ready is a key metric to consider as well as CPU utilization. Correlating this with Memory and Network statistics, as well as SAN I/O and Disk I/O metrics, enables you to proactively avoid any bottlenecks and correctly size your VMs and hence avoid overprovisioning. This can also be extended to considering how many VMs you allocate to an ESX Server and to ensuring that its physical CPU resources are sufficient to meet the needs of your VMs. As businesses’ key applications become virtualized it’s imperative that, whether they are old legacy single-threaded workloads or new multi-threaded workloads, the correct vCPU to pCPU ratio is allocated. In this instance size isn’t always everything; it’s what you do with your CPU that counts.
December 4th, 2011
Archie Hendryx
Roll back several years and certain vendors would have had you believe that Fibre Channel was dead and that the future would be iSCSI. A few years later, certain vendors were declaring that Fibre Channel was dead again and that the future was FCoE. So while this blog is not an iSCSI vs FC or FC vs FCoE comparison list (there are plenty of good ones out there, and iSCSI and FCoE each have immense merit), the point being made here is that Fibre Channel, unlike Elvis, really is alive and well. Moreover Fibre Channel still remains the protocol of choice for most Mission Critical Applications despite the FUD that surrounds its cost, manageability and future existence. Most Storage folk who run Enterprise class infrastructures are advocates of Fibre Channel not only because of its high performance connectivity infrastructure but also due to its reliability, security and scalability. Incredibly this is all with the majority of Fibre Channel implementations being vastly underutilized, poorly managed (due to lack of visibility) and running at a far from optimized state due to the constant day to day operational demands on most SAN Storage administrators. Indeed if Storage folk were empowered with a metric that could enable them to gain a better insight and understanding of their SAN Storage’s performance and utilization, the so-called impending death of Fibre Channel may have to take an even further rain check. Well that metric does exist; cue what is termed the “Exchange Completion Time.”
It’s now common for me to visit customer environments that run Fibre Channel SANs yet have various factions, whether that’s server, VM, Network or Storage teams, that complain they are suffering performance issues due to lack of bandwidth or throughput. In every single instance FC utilization has actually been incredibly low, with peaks of 10% at the most, and that’s in 4Gb/s environments, not 8Gb/s! At worst there may be an extremely busy backup server that singlehandedly causes bottlenecks and creates the impression that the whole infrastructure is saturated, but even such occasions are rare. What seems to be the cause of this misconception is the lack of clarity between what is deemed throughput and what is an actual cause of bottlenecks and performance slowdowns, i.e. I/O latency.
Sadly (and I am the first to admit that I was also once duped), Storage folk have been hoodwinked into accepting metrics that just aren’t sufficient to meet their requirements. Much like the folklore and fables of Santa Claus that are told to children during Christmas, storage administrators, architects and engineers have also been spun a yarn that MB/s and IOPS are somehow an accurate determination of performance and design considerations. In a world where application owners, server and VM admins are busily speaking the language of response times, Storage folk are engrossed in a foreign vocabulary that revolves around RAID levels, IOPS and MB/s and then numerous calculations to try and correlate the two languages together. But what if an application owner requested Storage with a 10ms response time that the Storage Administrator could then allocate with a guarantee of that performance? That would entail the Storage engineer not just looking at a one dimensional view from the back end of the Storage Array but one that incorporated the comprehensive transaction time i.e. from the Server to the Switch port to the LUN. That would mean considering the Exchange Completion Time.
To elaborate, using MB/s as a measurement of performance is almost akin to how people used to count cars as a measurement of road traffic. Harking back to my days as a student and before all of the high tech cameras and satellites that now monitor road traffic, I was ‘lucky’ enough to have a job of counting the number of cars that went through Trafalgar Square at lunchtime. It was an easy job: I’d see five cars and I’d click five times, but this was hardly accurate as when there was a traffic jam and all of the lanes were occupied I was still clicking five cars. Here also lies the problem with relying on MB/s as a measurement of performance. As with the car counting situation, a more accurate way would have been to instead watch each single car and measure its time from its origin to its destination. In the same vein, to truly measure performance in a SAN Storage infrastructure you need to measure how long a transaction takes from being initiated by the host, received by the storage and acknowledged back to the host, in real time as opposed to averages. This is what is termed the Exchange Completion Time.
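To show why averages can mislead, here is a small Python sketch with made-up numbers. It is not how any particular tool computes Exchange Completion Time. The function name, the 10 ms target and the sample values are all hypothetical.

```python
from statistics import mean


def summarize_exchanges(completion_ms, target_ms):
    """Return (average, worst, number of exchanges slower than target_ms)."""
    if not completion_ms:
        raise ValueError("need at least one exchange")
    slow = sum(1 for t in completion_ms if t > target_ms)
    return mean(completion_ms), max(completion_ms), slow


# Hypothetical sample: 98 exchanges completing in 2 ms and 2 stalled at 400 ms.
sample = [2.0] * 98 + [400.0] * 2
average, worst, slow = summarize_exchanges(sample, target_ms=10.0)
print(f"average {average:.2f} ms, worst {worst:.0f} ms, {slow} over target")
```

The average comes out just under the 10 ms target, yet two exchanges took 400 ms each. Only by timing each exchange individually do the two stalled transactions become visible.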
While many storage arrays have tools that provide information on IOPS and MB/s, to get a better picture of a SAN Storage environment and its underlying latency it’s also key to consider the number of Frames per second. In Fibre Channel a Frame is comparable to a word, a Sequence a sentence and an Exchange the conversation. A Standard FC Frame has a Data Payload of 2112 bytes, i.e. roughly a 2K payload. So for example an application that has an 8K I/O will require 4 FC Frames to carry that data portion. In this instance this would equate to 1 I/O being 4 Frames and subsequently 100 IOPS of the same size equating to 400 Frames per second. Hence to get a true picture of utilization looking at IOPS alone is not sufficient, because there can be orders of magnitude of difference between particular applications and their I/O size, with some ranging from 2K to even 256K. With backup applications the I/O sizes can be even larger. Hence it’s a mistake to not take into consideration the number of Frames/sec when trying to measure SAN performance or when trying to identify whether data is being passed efficiently. For example even if you are witnessing a high throughput in MB/s you may be missing the fact that there is only a minimal payload of data and the Exchange (conversation) is failing to complete. This is often the case when there’s a slow draining device, flapping SFP etc. in the FC SAN network, where instead of data frames causing the traffic you have a number of management frames dealing with issues such as logins and logouts, loss of sync or some other optic degradation or physical layer issue. Imagine the scenario: a Storage Administrator is measuring the performance of his infrastructure or troubleshooting a performance issue and is seeing lots of traffic via MB/s – unaware that many of the environment’s transactions are actually being cancelled across the Fabric!
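The frame arithmetic can be sketched in a few lines of Python. This is a simplification that counts only the frames needed to carry the data portion of each I/O, ignoring the command, transfer-ready and response frames of an exchange. The function names are mine.

```python
FC_MAX_PAYLOAD_BYTES = 2112  # data payload of a standard FC frame


def data_frames_per_io(io_size_bytes: int, payload_bytes: int = FC_MAX_PAYLOAD_BYTES) -> int:
    """Number of frames needed to carry the data portion of one I/O."""
    if io_size_bytes < 1 or payload_bytes < 1:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: a partly filled frame still counts as a frame.
    return (io_size_bytes + payload_bytes - 1) // payload_bytes


def data_frames_per_second(iops: int, io_size_bytes: int) -> int:
    """Data frames per second for a steady stream of equally sized I/Os."""
    return iops * data_frames_per_io(io_size_bytes)


KIB = 1024
for size_kib in (2, 8, 256):
    frames = data_frames_per_second(100, size_kib * KIB)
    print(f"100 IOPS at {size_kib}K: {frames} data frames/sec")
```

At the same 100 IOPS, 8K I/Os need 400 data frames per second while 256K I/Os need 12,500, which is why IOPS alone says little about how busy the fabric really is.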
This lack of visibility into transactions has also led to many storage architects being reluctant to aggressively use lower tiers of storage as poor I/O performance is often attributed to the storage arrays when often bottlenecks in the storage infrastructure are actually the root cause. Measuring performance via Exchange Completion Times enables measurement and monitoring of storage I/O performance, hence ensuring that applications can be correlated and assigned to their most cost-effective storage tier without sacrificing SLAs. With many Storage vendors adopting automated tiering within their arrays some would feel this challenge has now been met. The reality of automated tiering though is that LUNs or sub-LUNs are only dynamically relocated to different tiers based on the frequency of data access, i.e. frequently accessed data is more valuable so should reside on a higher tier and infrequently accessed data should be moved to lower tiers. So while using historical array performance and capacity data may seem a sufficient way to tier, it’s still too simplistic and lacks the insight for more optimized tiering decisions. Such an approach may have been sufficient to determine optimum data placement in the days of DAS when the I/O performance bottleneck was disk transfer rate, but in the world of SANs and shared storage to look just at external transfer rates between SSD, Fibre Channel or SATA drives is a detached and inaccurate way to measure the effect of SAN performance on an application’s response time. For example congestion/problems in the SAN can result in severely degraded response times or cancelled transactions that fail to be acknowledged by the back end of the array. Furthermore incorrect HBA queue depths, the difference between sequential and random requests, link and physical layer errors all have an impact on response times and in turn application latency. By incorporating the Exchange Completion Time metric, i.e. measuring I/O conversations across the SAN infrastructure, into your tiering considerations, tiering can now accurately be based on comprehensive real time performance as opposed to device specific views.
Monitoring your FC SAN Storage environment in a comprehensive manner that incorporates the SAN fabric and provides metrics such as the Exchange Completion Time rapidly changes FC SAN troubleshooting from a reactive to proactive exercise. It also enables Server, Storage and Application administrators to have a common language of ‘response times’ thus eliminating any potential silos. With the knowledge of application I/O latency down to the millisecond, FC SAN Storage administrators can quickly be transformed from the initial point of blame to the initial point of resolution, while also ensuring optimum performance and availability of your mission critical data.
August 12th, 2011
Archie Hendryx
Many VM admins are getting pretty good at estimating the number of VMs per physical server; but there’s still a lot of guesswork. Candidly, there’s plenty of room for better quantitative insight and analysis to really get the most payback on the investment in virtual services. VirtualWisdom can provide the insight to accurately determine the correct ESX to VM ratios prior to physical to virtual migrations.
For example, VirtualWisdom offers the ability to run ‘what if’ simulations which use real historical metrics and data to allow the end user to model and see the effects of potential configuration changes. As seen in the screenshot below, a modeling dashboard can be set up as an alternative to setting up a new lab or test environment to see how an application would perform on a virtual platform. Here we have chosen particular metrics of three separate physical servers that are running three separate SAP applications. Prior to migrating them, using the actual metrics of the applications, we have decided to run a modeling configuration which shows us how those applications would perform once virtualized onto a single ESX server. This can then be tracked back historically to see how the proposed ESX server would have performed at different peak periods for each application, reducing risk and making the most of the server’s resources. By being able to draw on historical metrics such as Disk I/O, MB/s, CPU utilization etc., such a modeling example can significantly reduce the risk of deploying Tier 1 applications onto vSphere.
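The idea behind such a ‘what if’ model can be sketched in Python. This is not VirtualWisdom’s implementation or API, just a minimal illustration that adds up the historical CPU demand of three servers interval by interval and compares the combined peak with the capacity of a proposed host. All names and figures are hypothetical.

```python
def combined_demand(*series):
    """Add up per-interval demand from several servers sampled at the same times."""
    if len({len(s) for s in series}) != 1:
        raise ValueError("all series must cover the same intervals")
    return [sum(values) for values in zip(*series)]


# Hypothetical CPU demand in MHz for three SAP servers over four intervals.
sap_a = [2000, 2500, 6000, 3000]
sap_b = [1500, 5500, 2000, 1000]
sap_c = [1000, 1200, 5000, 900]

# Hypothetical proposed ESX host: 8 cores at 2400 MHz each.
host_capacity_mhz = 8 * 2400

total = combined_demand(sap_a, sap_b, sap_c)
peak = max(total)
sum_of_peaks = max(sap_a) + max(sap_b) + max(sap_c)

print(f"combined demand per interval: {total}")
print(f"combined peak {peak} MHz, sum of individual peaks {sum_of_peaks} MHz")
print(f"headroom on proposed host: {host_capacity_mhz - peak} MHz")
```

Because the three applications do not all peak in the same interval, the combined peak is lower than the sum of the individual peaks. That is exactly the kind of insight historical data gives you and a back-of-the-envelope estimate does not.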
For more information on this and other ways to increase vSphere usage with Tier 1 applications, watch for our new whitepaper titled “Eliminating VMware / Storage Related Performance Challenges with VirtualWisdom.”

May 31st, 2011
Archie Hendryx
In the past week I’ve had two customers mention how they are lacking / need an “End to End Awareness” for their environment. They both mentioned how their Host and SRM tools are device specific and while great in some respects they failed to provide a comprehensive view of their environment’s performance.
This drew me back to my own days as an end user when all the SMI-S compliant tools that were at my disposal gave me wonderful topologies, capacity planning features and end to end views but failed to provide the ‘awareness’ on performance I craved. Worse still I was often guilty of still zoning and provisioning with the legacy SAN switch management tool or the Storage Array Management Console, despite all the APIs that were running in my heterogeneous environment to give me that ‘single management pane’. The simple reason was that, despite all the management capabilities, I was concerned that I still needed the legacy tools to get some detailed picture of what impact my changes would have on the environment’s performance. In hindsight even this wasn’t good enough as I was depending on metrics averaged out over polling intervals, which were unable to go down to the millisecond granularity I needed.
Hence another one of my personal conundrums as a Solutions Consultant for Virtual Instruments: Our solution offers the Awareness of performance that allows you to see every single I/O from HBA to Switch port to LUN that complements the SRM and device specific tools that already exist. We are able to measure every single FC transaction down to the millisecond. So while it’s great that I am now able to explain to customers this unique solution that provides the granular End to End Awareness of performance that I also personally craved, I’m now no longer an end user and hence can’t take advantage of the platform myself!
April 16th, 2011
Archie Hendryx
In this last week of customer visits I was astounded to have the above sentence said to me on two separate occasions by two separate companies. What really took me aback was that this is exactly how I felt prior to joining Virtual Instruments as a Solutions Consultant six months ago. It’s a bold claim but one I will certainly stand by and am happy to prove to anyone (feel free to send me a PM on LinkedIn!).
For example at a recent POV engagement with a VP of Operations, we demonstrated how Virtual Instruments’ solution had pinpointed problems with ports that were connected to their critical Datawarehouse environment as well as their tape back up library. We were able to conclude that the issue related to slightly damaged cables and that replacing them would solve a lot of the performance problems they had faced with their Datawarehouse and Backups. It was at this point that the VP of Operations concluded with the words, “I get it! We’ve been investing in hardware worth more than a million pounds to bring the performance latency of our Datawarehouse down by 10 milliseconds when all the time I just needed to replace a cable!”
Another compelling example I faced this week further showcased how our solution reaches from the ‘weeds’ of the cable to the criticality of a company’s business continuance. Speaking to the customer’s Storage Engineer, he described how VI’s VirtualWisdom had reported occasional Loss of Sync on several ports that are part of the corporation’s critical DR and replication process. He explained how he went to the datacenter to see if the ports were okay, whereupon, on his arrival, one of the SFPs failed at that very moment in front of his own eyes. He was able to quickly replace the SFP, hence avoiding any outages or performance degradation.
- On one level the Storage Engineer had proactively remediated an issue that would have left him with an immense number of sleepless nights, problem fixing on the SAN and replication process.
- On the application level the performance degradation was avoided and its availability was maintained.
- On the business level, where the company has invested millions of dollars in ensuring business continuity in the event of any disaster or outage, this simple remediation allowed them to proceed without disruption.
It would be interesting to hear others’ examples and experiences of such incidents. One of the best kept secrets in the IT World? Not for long I guess!