IBM Smart Storage Cloud

Enjoy meeting Tony in this video. You can find more information about him and how he works with storage clouds on his blog – Inside System Storage @ http://ibm.co/brAeZ0

Or download the related IBM Redpaper here:

http://www.redbooks.ibm.com/redpieces/abstracts/redp4873.html?Open


Are hard drives getting too big?

Listen to this funny and interesting podcast about hard drive size.

In this podcast, Rick Vanover welcomes Greg Schulz. Greg is a blogger, analyst and virtualization/storage expert based in Minnesota. He also blogs at Storageioblog.com, which contains a wealth of virtualization and storage resources. Rick and Greg talk about the storage technologies of the day in this episode.

http://veeam.podbean.com/2011/07/11/episode-27-are-hard-drives-getting-too-big/

Using Brocade SAN Health in logical multiswitch environment

During the delivery of an IBM SVC Split Cluster with ISL solution, unfortunately for me, I came across a limitation of SAN Health, which has “difficulty” discovering the topology of a SAN configured using logical switches within physical switches. The design that I wanted to obtain automatically with the Brocade SAN Health tool is the following:

I expected to use SAN Health as I had in the past, simply configuring the FC switches inside the tool with their management IP addresses. The problem is that multiple logical switches refer to the same management IP address; in addition, when these are interconnected via ISL to other logical switches, this confuses the SAN Health discovery.

Asking my Brocade friend, I got the following workaround.

On the physical switches with IP addresses 10.200.2.122 and 10.200.3.122, with FIDs 22 and 26 (the two at the top of the picture above), I configured the following new user IDs, where -r sets the role, -l the list of logical fabric IDs (FIDs) the account may access, -h the account’s home FID and -c the chassis-level role:

userconfig --add baseadmin22 -r admin -l 22 -h 22 -c admin
userconfig --add baseadmin26 -r admin -l 26 -h 26 -c admin
userconfig --add baseadmin23 -r admin -l 23 -h 23 -c admin
userconfig --add baseadmin27 -r admin -l 27 -h 27 -c admin

On the physical switches with IP addresses 10.200.2.124 and 10.200.3.123, with FIDs 28 and 24 (the two at the bottom of the picture above), I configured the following new user IDs:

userconfig --add baseadmin28 -r admin -l 28 -h 28 -c admin
userconfig --add baseadmin24 -r admin -l 24 -h 24 -c admin
userconfig --add baseadmin29 -r admin -l 29 -h 29 -c admin
userconfig --add baseadmin25 -r admin -l 25 -h 25 -c admin

The idea was to be able to access each logical switch in the fabric with a different user ID so that SAN Health could discover its counterpart. This way I had to run SAN Health twice to get two different reports, one for my SAN named Priv and one for my SAN named Pub.

In the meantime I have asked Brocade whether this is a real SAN Health limitation, and when it will be fixed… or whether it is just my wrong usage!! I will let you know!

The risk of a fake SFP

Thanks Sebastian and thanks Roger

Storage CH Blog

Very good post from Sebastian Thaele (thank you)

It’s the nightmare of every motorist. Your car was repaired just a few days ago and now it has stopped running in the middle of nowhere. Or you even crashed, because the brakes just didn’t work in the rain. Fake parts are a big problem in the automotive industry. Original-looking parts from dubious sources may even work as expected in normal operation, but when the going gets tough, the weak won’t get going. So before a fake cambelt wrecks your engine or a fake brake pad costs your life, it might be a good idea not to save on the wrong things.


IBM Smarter Storage with Nick Clayton

Thanks Roger and thanks Nick…..

Storage CH Blog

Storage Architect Nick Clayton talks about Smarter Storage, which is part of IBM’s Smarter Computing strategy. There are a number of threads that make up the Smarter Storage concept. We say that storage should be Efficient by Design, providing optimal performance and use of resources; Self-Optimising, reducing management overhead; and Cloud Agile, accelerating deployment and scaling to meet changing requirements. Here Nick concentrates on the subject of self-optimising storage.


The zero buffer-to-buffer credits nightmare

Recently I came across a performance problem most likely caused by congestion of the SAN and, in particular, by a lack of buffer-to-buffer credits. So I dusted off the old theory that dates back to the origins of the SAN but that, in certain critical situations, should not be completely neglected.

Basics of Fibre Channel Flow Control and Buffer Credit Theory
The basic information carrier in Fibre Channel is the frame, and all information is contained within frames, much as a letter is encapsulated within an envelope. When you want to send data through a SAN, your data is encapsulated within a frame.

Buffer credits, also called buffer-to-buffer credits (BB_Credits), are used as a flow control method by Fibre Channel technology and represent the number of frames a port can store.
Each time a port transmits a frame, that port’s BB_Credit is decremented by one; for each R_RDY received, that port’s BB_Credit is incremented by one. If the BB_Credit is zero, the corresponding node cannot transmit until an R_RDY is received back.
Each of these credits represents the ability of the device to accept additional frames. If a recipient issues no credits to the sender, no frames can be sent. Pacing the transport of subsequent frames on the basis of this credit system helps prevent the loss of frames and reduces how often entire Fibre Channel sequences need to be retransmitted across the link.
This mechanism, which prevents a target device (either host or storage) from being overwhelmed with frames, is also known as BB_Credit flow control.
If an FC frame arrives while the receiver is still processing the first frame, a second receive buffer is needed to hold the new frame. Unless the receiver can process frames as fast as the transmitter can send them, the next frame will be lost whenever no receive buffer is available. To prevent this condition, the Fibre Channel architecture provides a two-level flow control mechanism that allows the receiver to control when the transmitter may send frames.
The receiving port controls frame transmission by granting the sending port permission to send one or more frames to it. This permission is called a credit. The actual credits are granted during the login process between two ports. The credit value is decremented when a frame is sent and replenished when a response is received. If the available credits for a given port reach zero, the supply of credits is said to be exhausted, and further transmission of frames from that port is suspended until the credits are replenished to a non-zero value.
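
To make the bookkeeping concrete, here is a minimal sketch in Python (my own illustration, not vendor code) of the BB_Credit counter described above: the sender decrements it on every frame sent, increments it on every R_RDY received, and stalls when it reaches zero.

class BBCreditPort:
    """Toy model of a sending port's BB_Credit counter (illustrative only)."""

    def __init__(self, negotiated_credits):
        # Credit value granted by the receiving port during login
        self.credits = negotiated_credits

    def can_transmit(self):
        # With zero credits the port must suspend transmission
        return self.credits > 0

    def send_frame(self):
        if not self.can_transmit():
            raise RuntimeError("BB_Credit exhausted: wait for an R_RDY")
        self.credits -= 1  # one credit consumed per frame transmitted

    def receive_r_rdy(self):
        self.credits += 1  # receiver freed a buffer and returned a credit

port = BBCreditPort(negotiated_credits=2)
port.send_frame()           # credits: 2 -> 1
port.send_frame()           # credits: 1 -> 0
print(port.can_transmit())  # False: the supply of credits is exhausted
port.receive_r_rdy()        # credits: 0 -> 1
print(port.can_transmit())  # True: transmission can resume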

There are two types of flow control mechanisms in FC:
•    End-to-End Flow Control.
•    Buffer-to-Buffer Flow Control.

End-to-End Flow Control
Transmission credit is initially established when two communicating nodes log in and exchange their respective communication parameters. The nodes monitor end-to-end flow control themselves; switches and directors do not participate in EE_Credit, and End-to-End Flow Control is always managed between a specific pair of node ports. Therefore, an individual node port may have many different end-to-end credit values, each corresponding to a different destination node port.

Buffer-to-Buffer Flow Control
Buffer-to-Buffer Flow Control is flow control between adjacent ports in the I/O path.
In this case a separate, independent pool of credits is used to manage Buffer-to-Buffer Flow Control: a sending port draws on its available credit supply and waits for the credits to be replenished by the port at the opposite end of the link.
An end node attached to an FC director or switch establishes its BB_Credit during login to the fabric. A communicating partner attached elsewhere on the same director or switch establishes its own, most likely different, BB_Credit value with the director during its own login process.

This system can affect overall performance and efficiency. Considering that light takes approximately 5 µs to travel 1 km and that a typical FC frame at 2 Gbps stretches about 1 km along the fibre, this behavior becomes even less efficient, and more of a performance drag, over longer distances or when traveling through complex topologies that add significant delivery latency.
So, to achieve higher performance we need to use BB_Credit values greater than 1: if a sending port is allowed to send more than one frame without having to wait for a response to each, performance can be improved until link utilization reaches 100%. Once the link is fully utilized, frames are already being sent as rapidly as allowed, and additional credits will not help matters.
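
As a back-of-the-envelope check of the numbers above, the following Python sketch estimates how many BB_Credits are needed to keep a link fully utilized over a given distance. The constants are my assumptions for illustration: full-size ~2148-byte frames, 8b/10b encoding (10 bits per byte on the wire) and roughly 5 µs of propagation per km of fibre.

# Rough estimate of the BB_Credits needed to keep a long-distance FC link busy.
FRAME_BYTES = 2148                 # maximum Fibre Channel frame size (assumption)
BITS_ON_WIRE = FRAME_BYTES * 10    # 8b/10b encoding: 10 bits per byte
PROPAGATION_US_PER_KM = 5.0        # light in fibre: ~5 us per km

def credits_needed(distance_km, line_rate_gbaud):
    frame_time_us = BITS_ON_WIRE / (line_rate_gbaud * 1000.0)  # us to serialize one frame
    round_trip_us = 2 * distance_km * PROPAGATION_US_PER_KM    # frame out, R_RDY back
    # Enough credits must be granted to cover every frame in flight during a round trip
    return max(1, round(round_trip_us / frame_time_us))

for km in (1, 10, 50):
    print(km, "km at 2 Gbps:", credits_needed(km, 2.125), "credits")

For a 2 Gbps link (2.125 Gbaud line rate) this gives roughly one credit per kilometre, which is the usual rule of thumb; the requirement doubles every time the link speed doubles.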

Nowadays Fibre Channel fabrics have evolved: fabrics containing thousands of FC ports, hundreds of hosts and several storage ports are now common. I/O levels, and consequently traffic volumes, particularly across the fabric, are much higher than in the past. Workloads have also become much more complex, which makes it much more difficult to isolate application problems when application performance degrades.
Storage virtualization has created its own special I/O requirements, adding a further degree of complexity to the I/O complex.
Rogue or badly behaving devices have much more impact on production environments than they had previously.
When application performance problems become obvious, the storage complex is frequently blamed. The impact of such outages ranges from an inconvenience to a massive outage where mission-critical application availability is compromised and the enterprise is seriously affected.

Diagnosing an exhausted buffer-to-buffer credit problem is a long, complex activity and requires appropriate tools. These tools must be capable of measuring the lack of buffer credits from the point of view of the device or host connected to the SAN, and also from the point of view of the SAN director or switch.
Great help comes from storage performance analysis tools or from vendor-specific tools such as Brocade Bottleneck Detection or Cisco’s proprietary equivalents.
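
On Brocade switches, for example, the portstatsshow output includes the tim_txcrd_z counter, the time a port has spent with zero transmit credits, which is the classic symptom of credit starvation. As a hedged illustration (the exact output format varies between Fabric OS releases, so the parsing below is an assumption, and portstatsshow.txt is just a hypothetical capture file), a few lines of Python can scan saved output for ports stuck at zero credits:

import re

# Assumption: portstatsshow.txt holds captured `portstatsshow` output in which
# lines look roughly like "tim_txcrd_z    123456". Adjust the regex to your
# Fabric OS release if the layout differs.
PATTERN = re.compile(r"tim_txcrd_z\s+(\d+)")

with open("portstatsshow.txt") as f:
    for line in f:
        match = PATTERN.search(line)
        if match and int(match.group(1)) > 0:
            # A steadily growing tim_txcrd_z value means the port keeps
            # waiting for R_RDYs, i.e. buffer-to-buffer credit starvation.
            print("possible credit starvation:", line.strip())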
In my case, because the SAN was based on Brocade products, I found the following guides very helpful:

Bottleneck Detection Best Practices Guide GA-BP-383-00 http://tinyurl.com/8aolmo8

and

Fabric Resiliency Best Practices IBM Redpaper http://tinyurl.com/8sphtvb

and

Thanks to Sebastian for his blog: http://tinyurl.com/8uwc7df