By Ravi Prakash, Product Manager
Crude oil prices reached $75 a barrel in July 2018, and the number of active US rigs drilling for oil, an indicator of future output, rose to 863.
If you manage datacenter infrastructure for a company focused on oil & gas exploration, you may be supporting geoscientists who use a range of applications. Let us assume your geoscientists select Schlumberger as the vendor of choice for oil & gas applications. You might have Petrel and GeoFrame for geological interpretation and reservoir modeling, and Eclipse and Petrel for reservoir simulations. You pick NAS from DellEMC Isilon or NetApp as your networked storage platform and expose client access via protocols like NFS and CIFS.
With NAS now in place, what happens when hundreds of clients are accessing the shared NAS and a rogue client takes up more than its share of resources? Does this scenario sound familiar?
There must be another way! Our application-centric IPM platform, VirtualWisdom, excels at identifying the root cause of intermittent performance issues, which are difficult to diagnose using conventional monitoring tools!
VirtualWisdom collects metrics from your infrastructure, contextualizes them using entity models, then predicts incidents and recommends ways to diagnose and resolve problems. Data from compute (vCenter, Hyper-V, AIX), networking and shared storage is collected and correlated by VirtualWisdom.
Our NAS Performance Probe, aka ProbeNAS, monitors at 4 levels:
ProbeNAS monitors NFSv3 and SMBv2 traffic out-of-band on 16 Ethernet links at 10GbE line rate. It does no sampling; instead, it captures every read/write operation at line rate and provides an unaltered I/O profile of actual traffic. VirtualWisdom starts by ingesting data for a week, identifies a normal baseline for your unique environment and raises an alarm when there is a deviation from this baseline.
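To make the baseline idea concrete, here is a minimal Python sketch. It is not VirtualWisdom's actual algorithm, which is not described here. It assumes the baseline is simply the mean and standard deviation of latencies seen during a learning window, and that an alarm fires beyond three standard deviations. The function names, the threshold `k` and the sample latencies are all hypothetical.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize a learning window (for example, one week of samples) as (mean, stdev)."""
    return mean(samples), stdev(samples)

def deviates(value, baseline, k=3.0):
    """Return True when value lies more than k standard deviations from the baseline mean."""
    mu, sigma = baseline
    return abs(value - mu) > k * sigma

# Hypothetical read latencies in milliseconds observed during the learning window.
learning_window_ms = [2.0, 2.2, 1.9, 2.1, 2.0, 2.3, 1.8, 2.1]
baseline = build_baseline(learning_window_ms)

# Check two new observations against the learned baseline.
for latency_ms in (2.2, 9.5):
    status = "ALARM" if deviates(latency_ms, baseline) else "ok"
    print(f"{latency_ms} ms -> {status}")
```

With these made-up numbers, 2.2 ms stays within the normal band while 9.5 ms is flagged. A production system would likely also account for time-of-day and day-of-week patterns, which this sketch ignores.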
With VirtualWisdom we can match, say, 100% CPU utilization with slow response times observed by the user. If our performance probe notices PAUSE frames, it could be a symptom of flow control problems in the Ethernet network. Unlike monitoring products that stop at generating alarms, we include runbook-style automation in which we tie alarms to cases, cases to investigations, and investigations to purpose-built analytics. That gets you to the stage where all you need to do is generate an internal change control ticket based on our recommendation of the root cause!
You may wonder why you can’t do this level of monitoring using native tools like InsightIQ, which comes with Isilon NAS. InsightIQ polls performance metrics from monitored Isilon clusters every 15 seconds over the OneFS API. As a result, it lacks the visibility into millisecond-level, per-conversation problems behind intermittent performance issues that a wire-level monitoring solution like VirtualWisdom provides. In addition, Isilon recommends that InsightIQ not be used to monitor a cluster with over 80 nodes, a size that is not uncommon in the exploration industry. With seismic data growing exponentially, can you guarantee that your storage needs will always stay under 80 nodes? VirtualWisdom is not subject to such limitations and provides the same level of visibility regardless of whether your target NAS is from DellEMC, NetApp, HDS, Qumulo, Microsoft/Avere or Nasuni.
You may ask why you can’t do this with SRM tools like NetApp OnCommand Insight (OCI). One reason is that OCI collects performance data at a 5-minute frequency. This is a far cry from the per-second monitoring of every application conversation done by VirtualWisdom, which is essential if you want to identify intermittent application performance issues.
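A little arithmetic shows why collection frequency matters. The Python sketch below uses a made-up trace and assumes, purely for illustration, that a polling tool reports the average latency over each polling interval. It does not model how InsightIQ or OCI actually aggregate their metrics.

```python
def interval_averages(latencies_ms, interval_s):
    """Average per-second latency samples over fixed polling intervals."""
    averages = []
    for start in range(0, len(latencies_ms), interval_s):
        window = latencies_ms[start:start + interval_s]
        averages.append(sum(window) / len(window))
    return averages

# Hypothetical trace: 300 seconds at 2 ms, with a single 1-second stall of 500 ms.
per_second_ms = [2.0] * 300
per_second_ms[100] = 500.0

worst_per_second = max(per_second_ms)
worst_15s = max(interval_averages(per_second_ms, 15))
worst_5min = max(interval_averages(per_second_ms, 300))

print(f"per-second view : {worst_per_second:.2f} ms")
print(f"15-second view  : {worst_15s:.2f} ms")
print(f"5-minute view   : {worst_5min:.2f} ms")
```

In this trace the 500 ms stall shows up as roughly 35 ms in the 15-second view and under 4 ms in the 5-minute view, which is easy to mistake for normal behavior.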
While nearly all SRM and IPM tools are blind to applications, VirtualWisdom is application-aware. VirtualWisdom detects what constitutes an application by monitoring inter-VM traffic using NetFlow generated by the vSphere Distributed Switch. If your application components run on bare metal, we can detect your applications using AppDynamics or ServiceNow, or even by using SSH/WMI to query process tables. This unique ability to tie application awareness to infrastructure monitoring elevates the discussion to the question you really care about: which application’s use of shared infrastructure is impacting the performance of my most critical applications?
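One simple way to picture application discovery from flow data is to treat hosts that talk to each other as one group. The Python sketch below does this with connected components over (source, destination) pairs. It is an illustrative assumption, not VirtualWisdom's discovery logic; real NetFlow records also carry ports, protocols and byte counts, and the host names here are hypothetical.

```python
from collections import defaultdict

def group_by_traffic(flows):
    """Group hosts into connected components based on who talks to whom."""
    neighbors = defaultdict(set)
    for src, dst in flows:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    groups, seen = [], set()
    for host in sorted(neighbors):
        if host in seen:
            continue
        # Depth-first walk collecting every host reachable from this one.
        stack, component = [host], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups

# Hypothetical flow records reduced to (source VM, destination VM) pairs.
flows = [("web-01", "app-01"), ("app-01", "db-01"), ("sim-01", "sim-02")]
groups = group_by_traffic(flows)
print(groups)
```

Here the web, app and database VMs land in one candidate application and the two simulation VMs in another. Shared services such as DNS or backup would merge unrelated groups in practice, so a real implementation would need to filter or weight flows.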
Like what you hear and want to learn more? Give us a call!