Global Consumer Products Company

Using VirtualWisdom to guarantee IT infrastructure that supports over 300 major brands

Customer Story and Case Study

Overview

This Virtual Instruments customer is one of the world’s great consumer goods companies, with over 150,000 employees in 100 countries supporting over 300 brands spanning over 10 categories of home, personal care and foods products. This company’s brand portfolio has made it a leader in every field in which it works.

IT Environment and Role

This company has a 12 petabyte SAN, growing at 50% annually, connected to over 3,000 servers and 12,000 Brocade SAN ports, and supported by 8 full-time-equivalent (FTE) engineers. The SAN uses 32 IBM SVC clusters on the UNIX servers, which allows the team to standardize the storage path and present a unified storage stream. VMware is the standard in the MS-Windows server environment, though there is a movement toward Microsoft’s Hyper-V. The most mission-critical applications use over 40 IBM DS8000s for tier 1 storage, with SVCs for virtualization, driven by IBM pSeries servers running AIX and VIO. Other storage is supplied by HP, with a small number of XP systems and 50 EVA systems. Other SAN management tools include HP Storage Essentials for spindle-level analysis and IBM’s TotalStorage Productivity Center (TPC) to help manage assets through provisioning and capacity utilization. BMC and other host-based tools are used for MS-Windows server monitoring. This highly complex environment, a mix of state-of-the-art and legacy systems, supports one of the world’s largest SAP implementations.

The IT function is a key enabler for the transformation towards a globally aligned business through strategic alliances and partnerships with global suppliers, improving IT infrastructure and service levels while reducing costs, building consistent IT capabilities, processes and databases, and strategic outsourcing in selected key areas. To help ensure success, IT performance is constantly measured against industry best practices, with assistance from Gartner and other analysts.

Challenges and Concerns

One of their key challenges was in the legacy SAN fabric. With a large number of switches, lack of standardization, inability to see down to the fibre layer for problem identification, and insufficient utilization data, it was very difficult to manage the SAN.

In a highly virtualized environment, the number one issue was performance, as virtualization can often negatively impact performance and debugging virtualized environments is extremely difficult. Related to performance was the effort to avoid not only application slowdowns, but outages. In addition, behind the virtualized environment, it was nearly impossible to accurately account for SAN fabric utilization, or more specifically, how fabric utilization affected their business application performance, primarily SAP.

Behind these issues remained the desire to measure IT against industry best practices, but existing measurement frameworks were very subjective, at best.

Technology and Infrastructure Initiatives

The company identified a requirement as part of a continuous service improvement process to rework a significant portion of their SAN. Included in the rationalization project was the desire to instrument the SAN to understand and proactively manage exactly what is happening in the SAN and in the virtualized data center. The company is a heavy user of storage virtualization.

The volume of work was a significant challenge for the very lean and highly cost-effective SAN team, but the benefits of making the changes were significant. With five petabytes of data to manage and predicted annual storage growth approaching 50%, the company’s IT team began by standardizing on new switches and server builds, and swapping out old ports. At the same time, they knew they needed to better manage multi-pathing, from both performance and availability perspectives.

To provide a measurement against industry best practices, the company embraces the Enterprise Strategy Group’s (ESG) Virtual Instruments Storage Maturity Model, a best-practices methodology for measuring progress toward an optimized state of storage efficiency.

Solution Evolution

Like many IT shops, they were challenged with tight budgets and with getting more performance from existing resources. To do this, IT knew it must be proactive; it had to avoid problems and anticipate issues before the application owners felt the pain. The only way to do this without dramatically increasing the staff size was to find some way to proactively automate the monitoring and analysis of the effect of the SAN on application availability.

HP introduced them to the VirtualWisdom product. The company then worked with Virtual Instruments on a preliminary consulting engagement, which showed exactly what was happening on the fabric. They looked at other proprietary tools and the SAN management tools from their system and storage vendors, but settled on a list of priorities that led them to begin deployment of VirtualWisdom in June 2009. The company wanted:

  • An agnostic, vendor-neutral solution that would enable them to avoid finger pointing and keep all the vendors honest
  • A solution that could track down slow-draining devices. Legacy vendor tools start at the spindle level. For instance, they found badly formed packets, but the fabric reacted differently to the bad packets each time.
  • A solution that could easily validate pathing in their multi-pathing environment
  • A solution that would give them both real-time monitoring and the ability to collect historic/trending data for problem troubleshooting and for resource planning
  • A solution that could be easily implemented in stages, allowing them to meter the degree of sophistication in their management architecture, as resources allowed

Benefits of the VirtualWisdom Solution

The company consolidated the fabric and standardized the build environment to reduce the number of variables, but still lacked granular measurement tools that could look behind the virtualized storage cloud. To assert new levels of control, they turned to Virtual Instruments’ VirtualWisdom in June 2009 to provide their IT organization with an unparalleled view of the current system status, as well as the ability to accurately pinpoint problem areas before they resulted in frustrated users or costly downtime.

Improved performance, reduced risk of outages

To begin with, Virtual Instruments’ VirtualWisdom software probes used SNMP-gathered metrics to point out opportunities to improve multi-path coverage and path balancing, offering improved performance and reducing the risk of outages. A summary report representative of the initial findings showed, among other things, that in 8% of the environment the data path was not redundant, and in 13% the data flow across the redundant paths was not balanced.
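As an illustration of the two checks behind those percentages, the sketch below classifies each host as redundant (paths on at least two fabrics) and balanced (traffic shares within a tolerance of each other). It is not VirtualWisdom's implementation: the `PathSample` record, the `audit_paths` function, the 20% tolerance, and the host names are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class PathSample:
    host: str        # hypothetical host name
    fabric: str      # fabric the path runs through, e.g. "A" or "B"
    mb_per_s: float  # observed throughput on this path


def audit_paths(samples, tolerance=0.2):
    """Return {host: {"redundant": bool, "balanced": bool or None}}.

    redundant: the host has paths on at least two distinct fabrics.
    balanced:  the largest and smallest traffic shares differ by no
               more than `tolerance` (an assumed threshold); None when
               the host is not redundant or carries no traffic.
    """
    by_host = {}
    for s in samples:
        by_host.setdefault(s.host, []).append(s)

    report = {}
    for host, paths in by_host.items():
        redundant = len({p.fabric for p in paths}) >= 2
        total = sum(p.mb_per_s for p in paths)
        balanced = None
        if redundant and total > 0:
            shares = [p.mb_per_s / total for p in paths]
            balanced = (max(shares) - min(shares)) <= tolerance
        report[host] = {"redundant": redundant, "balanced": balanced}
    return report


def percent_of_hosts(report, key, value):
    """Percentage of audited hosts whose `key` equals `value`."""
    hits = sum(1 for r in report.values() if r[key] is value)
    return 100.0 * hits / len(report)


samples = [
    PathSample("sap01", "A", 100.0), PathSample("sap01", "B", 95.0),
    PathSample("sap02", "A", 80.0),                       # single path
    PathSample("bkp01", "A", 180.0), PathSample("bkp01", "B", 20.0),
]
report = audit_paths(samples)
```

Running `percent_of_hosts(report, "redundant", False)` over a full inventory would give a figure comparable to the "not redundant" share in the report, and `percent_of_hosts(report, "balanced", False)` the "not balanced" share.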

Reduced trouble tickets, huge OPEX saving

Following the software deployment, hardware SAN Traffic Access Points (TAPs) were installed wherever IBM SVCs were deployed. Within a quarter of the first deployment, they saw a 75% drop in trouble tickets, primarily due to the ability to proactively detect small problems before they became big, business-impacting problems.

At the same time, problem analysis times were dramatically reduced. For instance, it might have taken as long as two weeks to find the actual cause of a backup slowdown, but with VirtualWisdom, the root-cause analysis became almost instantaneous.

With a comprehensive view of actual utilization and performance metrics enabled by VirtualWisdom, the company can continue to manage its high data volume growth with an almost-flat headcount in the SAN team. Furthermore, the team is already confident of tackling other traditional problem areas, such as reducing the number of devices within the SAN, accurately forecasting capacity requirements, consolidating and decommissioning legacy servers, and qualifying known areas of opportunity in relation to storage ports.

When compared to other large customers (examples listed below), this IT organization manages its storage with, on average, one quarter of the staff.

  • Large global bank based in the UK: almost 30 FTE managing 4 PB
  • Large US-based grocer: 8 FTE managing 2 PB
  • Regional healthcare facility: 2 FTE managing 200 TB
  • US hospital: 1.5 FTE managing 300 TB

If you assume the fully burdened cost of an admin to be $150K/year, then this organization is realizing an OPEX saving of $2.7M per year.
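The arithmetic behind these staffing figures can be roughly reproduced as follows. This is a sketch only: the source does not say how the $2.7M figure was derived, so the code simply divides it by the assumed $150K burdened cost to show how many FTEs it implies, and normalizes the peer examples to FTE per petabyte. Treating "almost 30 FTE" as 30 and 1 PB as 1,000 TB are assumptions made here for illustration.

```python
# Peer examples from the list above: (FTE, capacity in TB).
# "almost 30 FTE" is rounded to 30; 1 PB is taken as 1,000 TB.
PEERS = {
    "Large global bank (UK)": (30.0, 4000),
    "Large US-based grocer": (8.0, 2000),
    "Regional healthcare facility": (2.0, 200),
    "US hospital": (1.5, 300),
}

BURDENED_COST_PER_FTE = 150_000   # USD per year, as assumed in the text
STATED_OPEX_SAVING = 2_700_000    # USD per year, as stated in the text


def fte_per_pb(fte, capacity_tb):
    """Staffing density: full-time equivalents per petabyte managed."""
    return fte * 1000 / capacity_tb


peer_density = {name: fte_per_pb(fte, tb) for name, (fte, tb) in PEERS.items()}

# Number of FTEs the stated saving corresponds to at the assumed cost.
implied_fte_avoided = STATED_OPEX_SAVING / BURDENED_COST_PER_FTE

for name, density in peer_density.items():
    print(f"{name}: {density:.1f} FTE per PB")
print(f"Implied FTEs avoided: {implied_fte_avoided:.0f}")
```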

Having an infrastructure that is up and running 24/7/365 is essential. VirtualWisdom is able to help the IT staff identify issues before they escalate into a potential application outage. This company estimates that a 24-hour outage on its critical applications could cost in the region of $70M, or roughly $48K per minute.
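The per-minute figure follows directly from the 24-hour estimate. The short sketch below shows the division and adds a hypothetical `outage_cost` helper that scales the estimate linearly to shorter outages; the linear scaling is an assumption for illustration, since real outage costs rarely accrue evenly.

```python
OUTAGE_COST_24H = 70_000_000   # USD, the company's estimate for a 24-hour outage
MINUTES_PER_DAY = 24 * 60

# Average cost per minute implied by the 24-hour estimate.
cost_per_minute = OUTAGE_COST_24H / MINUTES_PER_DAY


def outage_cost(minutes):
    """Linear estimate of outage cost in USD (assumes cost accrues evenly)."""
    return minutes * cost_per_minute


print(f"Cost per minute: ${cost_per_minute:,.0f}")
print(f"Estimated cost of a 30-minute outage: ${outage_cost(30):,.0f}")
```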

CAPEX savings

Additionally, by leveraging VirtualWisdom’s unique ability to identify the I/O workload from Initiator to Target LUN, they are able to drive greater ratios of LPARs across the IBM pSeries cluster that they use for their critical SAP production environments.

Challenges and Concerns

Within six weeks of VirtualWisdom installation, the IT team was already identifying repetitive and potentially costly problems such as specific switch bottlenecks, replicated fabric balancing issues and virtual library testing inconsistencies. As a side benefit, VirtualWisdom also provides the team with the transparency to drive resolutions with its primary storage vendors, a task that was previously complicated by the complex architectures involved and the lack of visibility into the overall SAN.

Specific Virtual Instruments benefits to the company include:

  • Virtual Instruments is aligned with the corporate IT strategy of standardization and simplification. The IT function is a key enabler for the transformation towards a globally aligned business through strategic alliances and partnerships with global suppliers.
  • VirtualWisdom can find a “needle in a haystack” in the SAN, with access to metrics that simply could not be obtained any other way
  • Multi-vendor support and elimination of vendor finger pointing
  • Virtual Instruments enabled a fast installation and quickly accessible results. Storage and server virtualization were adding complexity to the SAN. Though the benefits of virtualization are clear in greater CPU, memory, and I/O utilization, server virtualization makes dynamic decisions that affect I/O, and those impacts are felt most in the SAN. They needed something that could work in their environment to help them see through the “clouds” of virtualization
  • The company uses both HP Storage Essentials and IBM TPC to manage their storage assets (provisioning, capacity utilization). VirtualWisdom is very complementary to these existing vendor-oriented monitoring solutions
  • Troubleshooting: the incident reduction process was dramatically improved
  • VirtualWisdom can help balance over-utilized and under-utilized ports. “Channel Utilization” of the fibre channel fabric is optimized for potential improvements in CAPEX
  • No impact on application users; a non-intrusive solution
  • The company can now use VirtualWisdom to help evaluate and validate current and future storage technologies such as IBM’s XIV and SVC. Reduces reliance on guesswork and rules-of-thumb planning.
  • Helps to ensure that infrastructure SLAs are met by confirming that the SAN is correctly configured, for instance. Enables the SAN team to add real value to service-level discussions.
  • Helps validate tiering strategies; a neutral third party is needed in order to validate tiering
  • Useful metrics for trend analysis, to help find problems and help determine future capacity needs; provides critical input for future purchasing decisions
  • Helps to improve IT infrastructure and comply with service level agreements while reducing costs
  • Helps justify network requirements, such as a DWDM upgrade
  • VirtualWisdom deployment is consistent with the Storage Maturity Model. By implementing best practices improvements in stages, the company can measure its own progress against an established benchmark of excellence.

Next Steps

The company is progressing along the Storage Maturity Model and expects to reap even greater benefits as it gains more experience with VirtualWisdom. The goal is to deepen the proactive monitoring capability, with tighter thresholds and increased dynamic alerting. Additionally, the team plans to instrument the Test Lab to create a more dynamic environment for testing and deploying solutions, and to prove out vendor performance claims.
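One common way to implement "dynamic alerting" of this kind is to compare each new sample against a threshold derived from recent history rather than a fixed value. The sketch below is a generic illustration of that idea, not a description of how VirtualWisdom computes alerts: the `DynamicThreshold` class, the window size, the multiplier `k`, and the sample latency values are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class DynamicThreshold:
    """Flag a sample that exceeds rolling mean + k * standard deviation."""

    def __init__(self, window=20, k=3.0, min_samples=5):
        self.history = deque(maxlen=window)  # most recent samples only
        self.k = k
        self.min_samples = min_samples

    def observe(self, value):
        """Return True if `value` breaches the current dynamic threshold."""
        alert = False
        if len(self.history) >= self.min_samples:
            threshold = mean(self.history) + self.k * pstdev(self.history)
            alert = value > threshold
        # The sample joins the history either way, so the baseline adapts.
        self.history.append(value)
        return alert


# Hypothetical response-time samples in milliseconds.
monitor = DynamicThreshold(window=20, k=3.0, min_samples=5)
samples_ms = [5.0, 5.2, 4.9, 5.1, 5.0, 5.1, 9.0]
alerts = [monitor.observe(v) for v in samples_ms]
```

Lowering `k` or shortening the window tightens the threshold, at the price of more false alarms; a production design would typically also exclude flagged samples from the baseline.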

Lessons Learned

In retrospect, the company found that VirtualWisdom’s vendor-agnostic nature is what helps keep vendors honest. Their vendors actually recommended VirtualWisdom, however, so it is worth gauging your own vendors’ acceptance of a “referee” like Virtual Instruments. They found significantly over-provisioned SANs in their infrastructure, where utilization rates were surprisingly low. Thanks to Virtual Instruments, they were able to identify consolidation opportunities and expect to save millions over the next few years by avoiding or delaying purchases of unnecessary SAN equipment and by reducing environmental costs (power, cooling, floor space). At the same time, the company was able to identify opportunities to spread the load of busier data paths over more links. Neither SNMP-based tools nor “rules of thumb” were able to provide effective help here.

Since the company has a redundant SAN infrastructure, it was easy to add TAPs (signal splitters) during maintenance windows with no effect on users. Having said that, it would be better to install Virtual Instruments-approved TAPs when the SAN is initially deployed, and to make this a corporate standard. The incremental cost is small, and the benefits are huge.