January 4th, 2012

An experiment in showing the effect of Queue Depth Settings

Ron Lee

Over the past few weeks I’ve been settling into my new role as technical marketing manager for Virtual Instruments. One of the things I’ve studied, with the help of our professional services group, is the Queue Depth Settings parameter.

For those unfamiliar with it, this parameter can be set on Fibre Channel interfaces. It defines how many commands can be outstanding in parallel on an interface, and choosing a value means balancing throughput against latency. We conducted a simple test in our lab to see how throughput on an interface and, more importantly, the Exchange Completion Time (ECT) for read and write operations behave as the setting changes. ECT is the end-to-end time it takes for a read or write operation to complete. It measures the effect of the SAN on application latency, and is an excellent measure of performance on an interface.

Our test setup was fairly simple. We used a load generator on a server that was connected to several LUNs, and ran read and write loads of various sizes simultaneously on those LUNs. We then set the queue depth on the interface to 4, then 8, and then 32. During the test, we collected data using the switch data collection capabilities of our VirtualWisdom solution.

Here is the VirtualWisdom Dashboard showing data collected by the VirtualWisdom SAN Performance Probe. The bottom plot shows the queue depth setting, which goes from 4 to 8 and then to 32. The middle plot shows our total throughput. The initial step is where the load generator starts up. When we change the queue depth setting from 4 to 8, we see there is a 25% jump in throughput. If we advance the time scale a bit, we see that at a queue depth setting of 32, there is only a 5% gain in throughput.

What is more interesting is the top graph, which plots the exchange completion time for read and write transactions. On the far left, where the queue depth is 4, the ECT is around 1 ms for both reads and writes. When we raised the queue depth to 8, the ECT rose to around 4-5 ms. The most interesting part is when we raised the queue depth to 32. In this test, the ECT increased to a maximum of 200 ms, about 40 times (4,000% of) the roughly 5 ms seen at a queue depth of 8, and became unstable, with wide variations. This can cause problems because random read and write transactions will be very slow. If transactions time out or fail, the cause will be very hard to find: some transactions will be slow and others will be fine, seemingly at random.
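To put the trade-off in numbers, here is a minimal Python sketch that works through the arithmetic using only the figures quoted above. Two things are assumptions rather than measurements: the 5% throughput gain at a queue depth of 32 is taken to be relative to the depth-8 level, and the 200 ms value is the observed maximum, not an average. The variable names are purely illustrative.

```python
# ECT figures quoted in the post, in milliseconds.
ECT_DEPTH_4_MS = 1.0
ECT_DEPTH_8_MS = (4.0, 5.0)   # quoted as a 4-5 ms range
ECT_DEPTH_32_PEAK_MS = 200.0  # observed maximum, not an average

# How many times larger the peak ECT is than each earlier level.
low8, high8 = ECT_DEPTH_8_MS
peak_vs_depth8 = (ECT_DEPTH_32_PEAK_MS / high8, ECT_DEPTH_32_PEAK_MS / low8)
peak_vs_depth4 = ECT_DEPTH_32_PEAK_MS / ECT_DEPTH_4_MS

# Throughput gains: +25% going from 4 to 8, then +5% going from 8 to 32.
# Assumption: the 5% is measured against the depth-8 level, so gains compound.
total_throughput_gain = round(1.25 * 1.05 - 1.0, 4)

print(f"Peak ECT vs depth 8: {peak_vs_depth8[0]:.0f}x to {peak_vs_depth8[1]:.0f}x")
print(f"Peak ECT vs depth 4: {peak_vs_depth4:.0f}x")
print(f"Total throughput gain, depth 4 to 32: {total_throughput_gain:.2%}")
```

Under those assumptions, going from a queue depth of 4 to 32 buys roughly a third more throughput, while the worst-case ECT grows by two orders of magnitude.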

So what have we learned here? A port with a misconfigured queue depth setting can be a hidden problem that will not surface until you have a significant load on a port. We see how VirtualWisdom can detect this kind of problem. A best practice would be to look at ECT on ports and see if the ECT is stable and consistent. If not, some proactive tuning would be a good idea so you can avoid trouble in the future.
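One simple way to judge whether ECT on a port is "stable and consistent" is to look at how much the samples vary relative to their mean. The Python sketch below uses the coefficient of variation for that. It is an illustration only, not how VirtualWisdom evaluates ECT: the function name, the 0.25 threshold, and both sample lists are hypothetical values made up for the example, not data from our test.

```python
from statistics import mean, pstdev


def ect_is_stable(samples_ms, max_cv=0.25):
    """Return True if ECT samples vary little relative to their mean.

    max_cv is a hypothetical threshold on the coefficient of variation
    (standard deviation divided by mean); tune it for your environment.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two ECT samples")
    avg = mean(samples_ms)
    if avg <= 0:
        raise ValueError("ECT samples must have a positive mean")
    return pstdev(samples_ms) / avg <= max_cv


# Made-up sample sets for illustration only (milliseconds).
steady_port = [1.0, 1.1, 0.9, 1.0, 1.05]
erratic_port = [5.0, 180.0, 12.0, 200.0, 40.0]

print(ect_is_stable(steady_port))   # tightly clustered samples
print(ect_is_stable(erratic_port))  # wide swings between samples
```

A port that fails a check like this under load is a reasonable candidate for a closer look at its queue depth setting.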