How does compression and deduplication affect flash storage?

As a seasoned storage expert who’s spent plenty of time in the field, I wanted to share some insight, gleaned from our customers, on how flash storage performance is affected by compression and deduplication.


Here are two emails that hit my inbox today:

Email from customer 1:
“We need to be able to control the amount of duplicate data generated by the test system for it to make sense.  And because it’s an all Flash / SSD array, you’d have to be able to generate literally millions of IOPS.”

Email from customer 2:
“We would like to test certain ‘smart’ storage arrays that dedupe/compress (i.e. don’t store zeroes or repeated patterns).  And we can’t test those with the usual freeware tools like IOmeter.”

First off, we live for these emails!  They make us giddy at Load DynamiX. We’ve worked our butts off developing technology that addresses these exact questions! Every day, I have conversations with storage architects and engineers who want to use the latest Flash / SSD technologies from Pure Storage, Skyera, XtremIO, Tegile, Tintri, Nimbus Data, Nimble, Violin, Virident, SolidFire, Kaminario, Fusion-IO, etc., but who are hesitant because Flash/SSD presents special testing challenges.

For starters, Flash vendors claim their arrays are extremely fast. Just look at their marketing claims: “1 MILLION IOPS!!!” Then add the built-in intelligence for dedupe and compression, and these arrays become downright impossible to test with generic simulators. Like our customers said – it requires a high level of control over the data content and a whole lot of performance driving that data – neither of which you can get from freeware.
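To see why naive test data is useless against a dedupe/compress array, consider what those engines do to it. A quick, hypothetical illustration (using Python’s `zlib` as a stand-in compressor – real arrays use their own inline algorithms): a zero-filled buffer, the kind many freeware tools write, collapses to almost nothing, while random data barely compresses at all.

```python
import os
import zlib

BLOCK = 1024 * 1024  # 1 MiB test buffer

zeros = bytes(BLOCK)             # what naive test tools often write
random_data = os.urandom(BLOCK)  # incompressible, unique content

zero_ratio = len(zlib.compress(zeros)) / BLOCK
rand_ratio = len(zlib.compress(random_data)) / BLOCK

print(f"zero-filled block compresses to {zero_ratio:.4%} of its size")
print(f"random block compresses to {rand_ratio:.2%} of its size")
```

An array that “doesn’t store zeroes or repeated patterns” will post spectacular numbers against the first buffer and very different numbers against the second – which is exactly why the content of the test data matters as much as the I/O rate.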

I realize some folks may read the previous paragraph and think, “Why bother testing? With that performance and the capacity savings – let’s just plug it in”. Actually, that was an email from customer 3, but we like to focus on the positives here. I can’t blame them though because, on paper, every single one of these intelligent arrays sounds extremely promising – there is definitely an allure to the idea that they can magically make IOPS skyrocket while cutting capacity requirements in half.

If only it were that easy.

There is plenty of information available on the subject, but the reality is, it all depends on the workloads. As we like to say around here, “Your mileage may vary”. Of course, it also depends on how much you are willing to spend.

As always, the only way to truly find out the benefit (or risk) of a proposed infrastructure change, especially with brand new Flash array solutions, is to test them with workloads representative of your environment and scale. And to do that, you have to control the pattern repetition, duplicability and compressibility of the data content AND do this with “literally millions of IOPS”.

To sum it up, you need the ability to:

  • Emulate your workload
  • Control the duplicability of the data content
  • Control the compressibility of the data content
  • Generate millions of IOPS
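To make the second and third bullets concrete, here is a minimal sketch of what “controlling duplicability and compressibility” means when building test blocks. This is a hypothetical helper, not Load DynamiX’s implementation (a real load generator drives this at millions of IOPS, which pure software like this cannot): the zero-filled fraction of each block sets its compressibility, and reuse from a shared pool sets the duplicate ratio.

```python
import os
import random

def make_block(block_size, compressible_fraction, dedup_pool=None):
    """Build one I/O block with tunable content.

    compressible_fraction: portion of the block filled with zeros
        (easily compressed); the rest is random (incompressible).
    dedup_pool: optional list of previously generated blocks; when
        given, an existing block is sometimes reused so the stream
        contains controllable duplicate data.
    """
    if dedup_pool and random.random() < 0.5:   # ~50% duplicate blocks
        return random.choice(dedup_pool)
    n_zero = int(block_size * compressible_fraction)
    block = bytes(n_zero) + os.urandom(block_size - n_zero)
    if dedup_pool is not None:
        dedup_pool.append(block)
    return block

# Example: 4 KiB blocks, ~50% compressible, with duplicates drawn
# from a shared pool to exercise the array's dedup engine.
pool = []
blocks = [make_block(4096, 0.5, pool) for _ in range(1000)]
unique = len(set(blocks))
print(f"{len(blocks)} blocks, {unique} unique")
```

Dial `compressible_fraction` and the duplicate probability up or down and a “smart” array will report wildly different effective capacity and IOPS – which is the whole point of testing with data that looks like yours.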

Check. Check. Check. And Check!

Contact me for a demo today!

Kalen Kimm
Business Development