This Virtual Instruments customer is one of the world’s great consumer goods companies, with over 150,000 employees in 100 countries supporting over 300 brands spanning over 10 categories of home, personal care and foods products. This company’s brand portfolio has made it a leader in every field in which it works.
This company has a 12 petabyte SAN, growing at 50% annually, connected to over 3,000 servers and 12,000 Brocade SAN ports, and supported by 8 full-time-equivalent (FTE) engineers. The SAN uses 32 IBM SVC clusters on the UNIX servers, which allows the team to standardize the storage path and present a unified storage stream. VMware is the standard in the MS-Windows server environment, though there is a movement to Microsoft’s Hyper-V. The most mission-critical applications use over 40 IBM DS8000s for tier 1 storage, with SVCs for virtualization, driven by IBM pSeries servers running AIX and VIO. Other storage is supplied by HP, with a small number of XP and 50 EVA systems. Other SAN management tools include HP Storage Essentials for spindle-level analysis and IBM’s TotalStorage Productivity Center (TPC) to help manage assets through provisioning and capacity utilization. BMC and other host-based tools are used for MS-Windows server monitoring. This highly complex environment, a mix of state-of-the-art and legacy systems, supports one of the world’s largest SAP implementations.
The IT function is a key enabler for the transformation towards a globally aligned business through strategic alliances and partnerships with global suppliers, improving IT infrastructure and service levels while reducing costs, building consistent IT capabilities, processes and databases, and strategic outsourcing in selected key areas. To help ensure success, IT performance is constantly measured against industry best practices, with assistance from Gartner and other analysts.
One of their key challenges was in the legacy SAN fabric. With a large number of switches, lack of standardization, inability to see down to the fibre layer for problem identification, and insufficient utilization data, it was very difficult to manage the SAN.
In a highly virtualized environment, the number one issue was performance, as virtualization can often negatively impact performance and debugging virtualized environments is extremely difficult. Related to performance was the effort to avoid not only application slowdowns, but outages. In addition, behind the virtualized environment, it was nearly impossible to accurately account for SAN fabric utilization, or more specifically, how fabric utilization affected their business application performance, primarily SAP.
Behind these issues remained the desire to measure IT against industry best practices, but existing measurement frameworks were very subjective, at best.
The company identified a requirement as part of a continuous service improvement process to rework a significant portion of their SAN. Included in the rationalization project was the desire to instrument the SAN to understand and proactively manage exactly what is happening in the SAN and in the virtualized data center. The company is a heavy user of storage virtualization.
The volume of work was a significant challenge for the very lean and highly cost-effective SAN team, but the benefits of making the changes were significant. With five petabytes of data to manage and predicted annual storage growth approaching 50%, the company’s IT team began by standardizing on new switch and server builds, and swapping out old ports. At the same time, they knew they needed to better manage multi-pathing, from both performance and availability perspectives.
To provide a measurement against industry best practices, the company embraces the Enterprise Strategy Group’s (ESG) Virtual Instruments Storage Maturity Model, a best-practices methodology for measuring progress towards an optimized state of storage efficiency.
Like many IT shops, they were challenged with tight budgets and with getting more performance from existing resources. To do this, IT knew it must be proactive; it had to avoid problems and anticipate issues before the application owners felt the pain. The only way to do this without dramatically increasing the staff size was to find some way to proactively automate the monitoring and analysis of the effect of the SAN on application availability.
HP introduced them to the VirtualWisdom product. The company then worked with Virtual Instruments on a preliminary consulting engagement, which showed exactly what was happening on the fabric. They looked at other proprietary tools and the SAN management tools from their system and storage vendors, but settled on a list of priorities that led them to begin deployment of VirtualWisdom in June 2009. The company wanted:
The company consolidated the fabric and standardized the build environment to reduce the number of variables, but still lacked granular measurement tools that could look behind the virtualized storage cloud. In order to assert new levels of control, they turned to Virtual Instruments’ VirtualWisdom in June 2009 to provide their IT organization with an unparalleled view of the current system status as well as the ability to accurately pinpoint problem areas before they resulted in frustrated users or costly downtime.
Improved performance, reduced risk of outages
To begin with, Virtual Instruments’ VirtualWisdom software probes used SNMP-gathered metrics to point out opportunities to improve multi-path coverage and path balancing, offering improved performance and reducing the risk of outages. The following summary report is representative of initial findings. Among other things, it shows that in 8% of the environment, the data path was not redundant, and in 13%, the data flow in the redundant paths was not balanced.
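As a rough illustration of how percentages like these can be derived, the sketch below classifies each host by its observed paths and then summarizes the result. It is not VirtualWisdom's actual method. The `PathSample` record, the `classify_hosts` and `summarize` helpers, the sample data and the 2:1 imbalance threshold are all hypothetical, and treating a path that carries no traffic as absent is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class PathSample:
    """One observed host-to-storage path (hypothetical record layout)."""
    host: str
    path_id: str
    mb_per_sec: float


def classify_hosts(samples, imbalance_ratio=2.0):
    """Label each host 'not redundant', 'unbalanced' or 'ok'.

    Assumptions: a path with zero throughput counts as absent, and a host
    is 'unbalanced' when its busiest path carries more than
    imbalance_ratio times the traffic of its least busy active path.
    """
    rates_by_host = {}
    for sample in samples:
        rates_by_host.setdefault(sample.host, []).append(sample.mb_per_sec)

    labels = {}
    for host, rates in rates_by_host.items():
        active = [rate for rate in rates if rate > 0]
        if len(active) < 2:
            labels[host] = "not redundant"
        elif max(active) > imbalance_ratio * min(active):
            labels[host] = "unbalanced"
        else:
            labels[host] = "ok"
    return labels


def summarize(labels):
    """Return the percentage of hosts carrying each label."""
    total = len(labels)
    return {
        label: 100.0 * sum(1 for value in labels.values() if value == label) / total
        for label in ("not redundant", "unbalanced", "ok")
    }


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    demo = [
        PathSample("hostA", "p1", 80.0),
        PathSample("hostB", "p1", 100.0),
        PathSample("hostB", "p2", 10.0),
        PathSample("hostC", "p1", 50.0),
        PathSample("hostC", "p2", 60.0),
    ]
    print(summarize(classify_hosts(demo)))
```

In practice the threshold and the definition of an "active" path would need to reflect the multi-pathing policy in use (for example, active/passive configurations legitimately show an idle path).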
Reduced trouble tickets, huge OPEX saving
Following the software deployment, hardware SAN Traffic Access Points (TAPs) were installed wherever IBM SVCs were deployed. Within a quarter of first deployment, they saw a 75% drop in trouble tickets, primarily due to the ability to proactively detect small problems before they became big, business-impacting problems.
At the same time, problem analysis times were dramatically reduced. For instance, it might have taken as long as two weeks to find the actual cause of a backup slowdown, but with VirtualWisdom, the root-cause analysis became almost instantaneous.
With a comprehensive view of actual utilization and performance metrics enabled by VirtualWisdom, the company can continue to manage its high data volume growth with an almost-flat headcount in the SAN team. Furthermore, the team is already confident of overcoming other traditional problem areas such as reducing the number of devices within the SAN, accurately forecasting capacity requirements, consolidating and decommissioning legacy servers, and qualifying known areas of opportunity in relation to storage ports.
When compared to other large customers (examples listed below), this IT organization manages its storage with on average 1/4 of the staff.
If you assume a fully burdened cost of an admin to be $150K/year, then this organization is recognizing an OPEX saving of $2.7M per year.
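The arithmetic behind this estimate can be back-calculated from the figures quoted in the text: the $150K fully burdened cost, the $2.7M saving, and the 8 FTE SAN team. The short sketch below does only that; the variable and function names are illustrative, and the headcounts of the comparison customers are not used.

```python
# Figures taken from the text above.
FULLY_BURDENED_COST_PER_ADMIN = 150_000   # USD per year
STATED_OPEX_SAVING = 2_700_000            # USD per year
CURRENT_SAN_TEAM_FTE = 8


def implied_avoided_headcount(saving, cost_per_admin):
    """Number of admin positions whose cost adds up to the stated saving."""
    return saving / cost_per_admin


avoided = implied_avoided_headcount(STATED_OPEX_SAVING, FULLY_BURDENED_COST_PER_ADMIN)
comparable_team = CURRENT_SAN_TEAM_FTE + avoided

print(f"Implied avoided headcount: {avoided:.0f} FTE")
print(f"Implied team size without the saving: {comparable_team:.0f} FTE")
```

In other words, the stated saving corresponds to 18 administrator positions that a comparable organization would need in addition to the current team of 8.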
Having an infrastructure that is up and running around the clock is essential. VirtualWisdom helps the IT staff identify issues before they escalate into a potential application outage. The company estimates that a 24-hour outage of its critical applications could cost in the region of $70M, or roughly $48K per minute.
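The per-minute figure follows directly from the 24-hour estimate. A minimal check, assuming the cost accrues evenly over the outage (the constant and function names are illustrative):

```python
ESTIMATED_24H_OUTAGE_COST = 70_000_000  # USD, from the company's estimate
MINUTES_PER_DAY = 24 * 60


def cost_per_minute(total_cost, minutes):
    """Average outage cost per minute, assuming cost accrues evenly."""
    return total_cost / minutes


per_minute = cost_per_minute(ESTIMATED_24H_OUTAGE_COST, MINUTES_PER_DAY)
print(f"Approximate cost per minute: ${per_minute:,.0f}")
```

The result is a little over $48,600 per minute, in line with the rounded figure quoted above.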
CAPEX savings
Additionally, by leveraging VirtualWisdom’s unique ability to identify the I/O workload from Initiator to Target LUN, they are able to drive greater ratios of LPARs across the IBM pSeries cluster that they use for their critical SAP production environments.
Within six weeks of VirtualWisdom installation, the IT team was already identifying repetitive and potentially costly problems such as specific switch bottlenecks, replicated fabric balancing issues and virtual library testing inconsistencies. As a side benefit, VirtualWisdom also provides them with the transparency to drive resolutions with its primary storage vendors, a task that was previously complicated due to the complex architectures involved and the lack of visibility into the overall SAN.
Specific Virtual Instruments benefits to the company include:
Virtual Instruments is aligned with the corporate IT strategy of standardization and simplification.
The company is progressing along the Storage Maturity Model and expects to reap even greater benefits as it gains more experience with VirtualWisdom. The goal is to deepen the proactive monitoring capability, with tighter thresholds and increased dynamic alerting. Additionally, the team plans to instrument the Test Lab to create a more dynamic environment for testing and deploying solutions, and to prove out vendor performance claims.
In retrospect, the company found that because VirtualWisdom is a vendor-agnostic solution, it helps keep vendors honest. But their vendors actually recommended VirtualWisdom, so perhaps it’s good to gauge your vendors’ acceptance of a “referee” like Virtual Instruments. They found significantly over-provisioned SANs in their infrastructure where utilization rates were surprisingly low. Thanks to Virtual Instruments, they were able to identify consolidation opportunities and expect to save millions over the next few years by avoiding or delaying purchases of unnecessary SAN equipment and by reducing environmental costs (power, cooling, floor space). At the same time, the company was able to identify opportunities to spread the load of busier data paths over more links. Neither SNMP-based tools nor “rules of thumb” were able to provide effective help here.
Since the company has a redundant SAN infrastructure, it was easy to add TAPs (signal splitters) during maintenance windows with no effect on users. Having said that, it would be better to provide Virtual Instruments-approved TAPs when the initial SAN is deployed, and make it a corporate standard. The incremental cost is small, and the benefits are huge.