Cisco STS & VirtualWisdom enable monitoring of 50x application conversations per Cisco switch vs. comparable solutions

By Ravi Prakash, Product Manager


Brocade’s latest press release about the new FC32-64 port blade for the Brocade X6 may leave you wondering how it compares with the Cisco MDS 9700 48-port 32 Gbps Fibre Channel Switching Module, especially when you set Cisco SAN Telemetry Streaming (STS), which is supported by VirtualWisdom, against Brocade® IO Insight, which is part of Brocade Fabric Vision. Before comparing the two, let’s get some terminology out of the way.

You’ll agree that applications have conversations. We call a conversation an Initiator Target LUN (ITL) because we monitor it end-to-end: from the server, through the SAN, down to the SAN-attached networked storage.

Brocade IO Insight

Brocade Gen 6 Fibre Channel switches have an ASIC that provides metrics on the ports that are connected to hosts or to storage targets. These metrics include First Response Time and Command Completion Time. This workload-monitoring capability is called IO Insight and is part of Brocade Fabric Vision.

Brocade FOS 8.2.0 release notes state: “FOS v8.2 supports IO Insight metrics with system pre-defined flow sys_mon_all_fports. With sys_mon_all_fports flow, 2047 sub-flow are supported on X6 directors, and 511 sub-flow are supported for G620/G630 switches.”

Per the Brocade “SAN Fabric Resiliency & Administration Best Practices”, a flow is a collection of Fibre Channel frames that share certain traits (like source device, destination device, ingress port or egress port). This implies that on a Brocade X6 director switch you can monitor 2047 flows at a port level (where every Fibre Channel port is a flow). You would get IOPS, response time and time-to-first-data metrics aggregated to the port level. However, with this level of granularity you couldn’t answer questions like:

“To which storage array are these metrics relevant?” (as they would apply to all arrays)

“To what LUN is this traffic relevant?” (as it would apply to all LUNs)

The other caveat is that you’d see only aggregates (and not a persistent view) of all the traffic.

If you monitor by application “conversations” you may say:

“I want to monitor all flows from my Oracle Host with 2 HBAs, each HBA talking to 4 ports on my VMAX each having 10 LUNs” 

You run into a few issues:

  • A limit of 2047 flows being monitored simultaneously puts you in the hot seat to decide in advance which of the flows you want to monitor.
  • In the Brocade UI, you must select the initiator and destination by FCID, and the LUN by its internal LUN number. Most human beings don’t think in terms of FCIDs or LUN numbers. That is why Virtual Instruments designed VirtualWisdom to do its own discovery and associate FCIDs with hosts. This dramatically simplifies SAN monitoring deployments.

Why is persistent monitoring at the ITL level necessary?

Consider the scenario depicted above:

A host running Solaris has 4 Fibre Channel Host Bus Adapters (HBAs). Each HBA is connected to an edge switch, each edge switch connects to core switches, and each core switch connects to a storage edge switch and eventually to ports on an IBM SAN Volume Controller cluster with 6 nodes. As you see above, each of the HBAs is zoned to 12 storage ports. The host itself is mapped to 15 storage LUNs. This implies that your ITL count will be:

4 HBA * 12 storage ports * 15 LUNs = 720 ITLs

With the Brocade solution, monitoring this one host consumes 720 of the 2047 available ITLs. This means you can realistically expect to monitor only 2-3 servers within the 2047 ITL limit. How many enterprise data centers have fewer than 4 servers?

You may say this example portrays an extreme case and that in most cases you may have 2 HBAs per server (not four) and each HBA may be mapped to 4 storage ports (not 12).  However, if your host has 100 LUNs mapped to it then your ITL count will be:

2 HBAs * 4 storage ports * 100 LUNs = 800 ITLs.

Whichever way you look at it, a 2047 ITL limit is unrealistic, and you have the burden of deciding which 2 or 3 servers out of the thousands in your datacenter need to be monitored.
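The arithmetic in the two examples above can be checked with a short Python sketch. The function and constant names here are illustrative only (they are not part of any Brocade or VirtualWisdom tool), and the sketch assumes every monitored ITL consumes one of the 2047 sub-flows quoted from the FOS 8.2.0 release notes.

```python
# Illustrative names only; assumes one ITL consumes one sub-flow.
X6_FLOW_LIMIT = 2047  # X6 director limit quoted from the FOS 8.2.0 release notes


def itl_count(hbas: int, storage_ports_per_hba: int, luns: int) -> int:
    """ITLs for one host: initiators x targets x LUNs."""
    return hbas * storage_ports_per_hba * luns


def hosts_fully_monitored(itls_per_host: int, limit: int = X6_FLOW_LIMIT) -> int:
    """Number of identical hosts whose ITLs all fit under the limit."""
    return limit // itls_per_host


solaris_host = itl_count(hbas=4, storage_ports_per_hba=12, luns=15)
typical_host = itl_count(hbas=2, storage_ports_per_hba=4, luns=100)

print(solaris_host, hosts_fully_monitored(solaris_host))  # 720 2
print(typical_host, hosts_fully_monitored(typical_host))  # 800 2
```

In both cases only two hosts fit completely. A third host could be covered only partially, which is where the "2 or 3 servers" figure comes from.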

To compound the problem, when you start monitoring these 3 servers you have no baseline data showing how they behaved a week ago. Hence persistent monitoring of a large-scale environment is the real need.

Cisco SAN Telemetry Streaming & VI VirtualWisdom – a vastly superior solution

The MDS 9700 48-port 32 Gbps Fibre Channel Switching Module offloads SAN telemetry to its 32G FC port ASICs. These perform deep packet inspection of frames, collect FC and SCSI headers, and send them to a Network Processing Unit (NPU), a high-performance packet processor that handles correlation and produces storage-centric metrics. The NPU streams these metrics to VirtualWisdom, the external analytics and visualization engine for this joint solution. Cisco is planning to support 100,000 ITLs per MDS switch. The Cisco STS solution provides ITL-level monitoring without any physical TAPs and at a lower TCO than hardware performance probes. The Cisco STS & VirtualWisdom solution gives you the ability to monitor roughly 50x the number of ITLs per switch (100,000 versus 2047) compared with a Brocade IO Insight solution.
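As a rough check on the 50x figure, the sketch below divides the planned Cisco limit by the Brocade X6 limit. The constant names are illustrative. The 100,000 figure is the planned capacity stated above, not a measured result, and the per-host ITL count reuses the 720-ITL Solaris example.

```python
# Illustrative names; both limits are the figures quoted in this post.
BROCADE_X6_FLOW_LIMIT = 2047   # per FOS 8.2.0 release notes
CISCO_PLANNED_ITLS = 100_000   # planned per-MDS-switch capacity
ITLS_PER_HOST = 4 * 12 * 15    # the 720-ITL Solaris host example

ratio = CISCO_PLANNED_ITLS / BROCADE_X6_FLOW_LIMIT
hosts_at_cisco_limit = CISCO_PLANNED_ITLS // ITLS_PER_HOST

print(f"{ratio:.1f}x")        # 48.9x, i.e. roughly 50x
print(hosts_at_cisco_limit)   # 138
```

Under these assumptions, the same 720-ITL host profile fits 138 times under the planned Cisco limit, versus twice under the 2047 limit.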

Why use VirtualWisdom with Cisco STS? 

VirtualWisdom combines wire data with data from the management interface on MDS, health and utilization data for FC and FCoE, data from hosts running vSphere, Hyper-V or PowerVM, and data from vSAN or ScaleIO. All of it is collected, correlated and analyzed by VirtualWisdom into an application-centric view that presents the information relevant to an application owner or business unit.

For instance, the following topology view shows an alarm on a production UCS host.  Drilling down on this alarm shows you which applications are impacted.

Alarms are tied to cases, which are tied to investigations, which invoke the right analytics, such as “Event Advisor” and “Trend Matcher”, to help you get to root cause. Event Advisor helps you avoid impacting your application SLAs by identifying unusual behavior in your workloads.

Trend Matcher understands causality between trends and helps you find root cause for alarms.  For instance, you could use Trend Matcher to find the root cause of delays in writes to your networked storage.

In conclusion, VirtualWisdom, when used with Cisco STS, gives you wire-level monitoring of conversations without the need for physical TAPs or hardware performance probes, and at 50x the number of conversations you can expect to monitor with a comparable Brocade solution. Like to learn more? Give us a call!