Pareto Principle Applied to Storage Subsystems

In recent months I have had the opportunity to analyze the performance of several customers' storage subsystems.
I found (and, at least for me, it was a novelty) that of the 100% of disk space a storage subsystem provides, only a small fraction is accessed by the large majority of I/Os, typically no less than 80% of the total.

The following image shows an example of this behavior.
The blue line, which joins all the points of a scatter plot, represents the total number of I/Os of a storage subsystem analyzed over a given period of time.

[Image: screenshot from 2013-08-17 09:58:52]

As you can see, just 7.5% of the disk space is accessed by 95% of the I/Os. This may be a special case; typically the relationship is about 20% of the disk space accessed by 80% of the I/Os.

The pink line, which likewise joins all the points of a scatter plot, instead shows the behavior of the I/Os considered hot, relative to the total disk space provided.

As you can see, the hot 80% of I/Os touches only 22.5% of the total disk space.
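To make the measurement concrete, here is a minimal sketch of how such a figure could be derived. It assumes the disk space is split into equal-sized extents and that an I/O count is available for each one; the function name and the sample numbers are hypothetical and are not taken from the subsystems I analyzed.

```python
def capacity_share_for_io_share(io_counts, io_share):
    """Fraction of equal-sized extents needed to cover `io_share` of all I/Os.

    io_counts: I/O count per extent (hypothetical input format).
    io_share:  target fraction of the total I/Os, e.g. 0.80.
    """
    total = sum(io_counts)
    if total == 0:
        raise ValueError("no I/Os recorded")
    running = 0
    # Walk the extents from hottest to coldest, accumulating their I/Os.
    for n, count in enumerate(sorted(io_counts, reverse=True), start=1):
        running += count
        if running >= io_share * total:
            return n / len(io_counts)
    return 1.0


# Made-up I/O counts for ten extents (1000 I/Os in total).
sample = [700, 150, 80, 30, 20, 10, 5, 3, 1, 1]
print(capacity_share_for_io_share(sample, 0.80))  # 0.2
print(capacity_share_for_io_share(sample, 0.95))  # 0.4
```

With this made-up profile, 20% of the extents cover 80% of the I/Os, which is the typical relationship described above.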

This behavior reminded me of the Pareto Principle:

The Pareto principle (also known as the 80–20 rule, the law of the vital few, and the principle of factor sparsity) states that, for many events, roughly 80% of the effects come from 20% of the causes.

Applied to computer science, it reads:

In computer science and engineering control theory such as for electromechanical energy converters, the Pareto principle can be applied to optimization efforts.[10] For example, Microsoft noted that by fixing the top 20% most reported bugs, 80% of the errors and crashes would be eliminated.

So, the question is:

Can we apply the Pareto Principle when configuring a storage subsystem?

My personal answer is YES!

Given a known I/O profile, applying the Pareto Principle we could configure a disk subsystem with a small proportion of high-performance disks and a large proportion of low-performance disks.
This would optimize the cost of the subsystem from every point of view:

$ per TB
$ for footprint
$ for cooling and power
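As a rough illustration of the cost argument, the sketch below compares an all-fast configuration with a two-tier one. The function name and the prices per TB are hypothetical placeholders, not vendor figures, and only the cost per TB is modeled (footprint, cooling and power are left out).

```python
def subsystem_cost(total_tb, hot_fraction, fast_cost_per_tb, slow_cost_per_tb):
    """Purchase cost of a two-tier subsystem (all inputs are assumptions).

    hot_fraction: share of capacity placed on high-performance disks (0..1).
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    hot_tb = total_tb * hot_fraction
    cold_tb = total_tb - hot_tb
    return hot_tb * fast_cost_per_tb + cold_tb * slow_cost_per_tb


# Hypothetical prices: 1000 $/TB for fast disks, 200 $/TB for slow disks.
all_fast = subsystem_cost(100, 1.0, 1000, 200)
tiered = subsystem_cost(100, 0.2, 1000, 200)
print(f"all fast: {all_fast:.0f} $, 20/80 tiered: {tiered:.0f} $")
```

With these placeholder prices the 20/80 configuration costs about a third of the all-fast one; the real ratio depends entirely on actual disk prices and on how closely the I/O profile follows the 80/20 pattern.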

In addition, if allocating disk space according to performance demand were automated, or left to automatic tiering, everything would also be much easier to manage.
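To give an idea of what automatic tiering decides, here is a deliberately naive sketch: it places the hottest extents on the fast tier until that tier is full. The extent identifiers, tier names and the `fast_slots` parameter are hypothetical, and real tiering engines use their own, more sophisticated policies.

```python
def assign_tiers(io_counts, fast_slots):
    """Map each extent to 'fast' or 'slow', hottest extents first.

    io_counts:  dict of extent id -> I/O count (hypothetical input format).
    fast_slots: number of extents that fit on the high-performance tier.
    """
    if fast_slots < 0:
        raise ValueError("fast_slots must not be negative")
    # Rank extent ids by I/O count, hottest first.
    ranked = sorted(io_counts, key=io_counts.get, reverse=True)
    hot = set(ranked[:fast_slots])
    return {extent: ("fast" if extent in hot else "slow") for extent in io_counts}


# Made-up I/O counts for four extents, with room for two on the fast tier.
placement = assign_tiers({"e0": 5, "e1": 900, "e2": 40, "e3": 300}, fast_slots=2)
print(placement)
```

In this example the two busiest extents, `e1` and `e3`, land on the fast tier and the rest stay on slow disks.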

For now I think I’ve given you enough to think about, and I’d like to know whether you have had similar experiences or see things differently.

