Storwize Family GM CV Bandwidth calculation

Bandwidth for Global Mirror with change volumes

Thanks to Torsten Rothenwaldt for this very interesting article.
SVC/Storwize Global Mirror in continuous mode requires bandwidth capable of sustaining the entire host write workload plus the background copy process, because each host write must be transferred quickly. In contrast, Global Mirror with change volumes (cycling mode) is expected to need little bandwidth. The copy-on-first-write FlashCopy operation coalesces writes: when the same data is written several times during a cycle, only one write must be transferred.

However, we observed that the bandwidth consumed by Global Mirror with change volumes ranges from a fraction of the host write throughput to a multiple of it. Two examples:

  • A Jordanian customer's hosts write 235 GB in 3 hours, but the FlashCopy targets need only 26 GB of space for real data. Only 11% of the data written by the hosts must be transferred to the DR site, so the change volumes save more than 80% of the bandwidth compared to continuous transfer. (Thanks to Mamoun Lamber for data collection.)
  • A customer in Israel runs a V3700 with VMware hosts that write around 5 GB in 5 minutes. However, the router statistics show around 12 GB sent to the DR site: Global Mirror with change volumes consumes more than twice the bandwidth that Global Mirror in continuous mode would need. (Thanks to Ran Rubin for the details.)
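The percentages in these two examples follow from simple ratios. The sketch below reproduces them, assuming that continuous mode would send roughly the host write volume (it ignores protocol overhead and background copy, so it is only an approximation):

```python
# Figures quoted in the two customer examples (GB).
examples = {
    "Jordan": {"host_written_gb": 235, "transferred_gb": 26},
    "Israel": {"host_written_gb": 5, "transferred_gb": 12},
}

def transfer_ratio(host_written_gb: float, transferred_gb: float) -> float:
    """Data sent to the DR site relative to the data written by hosts."""
    return transferred_gb / host_written_gb

for site, figures in examples.items():
    ratio = transfer_ratio(figures["host_written_gb"], figures["transferred_gb"])
    # Assumption: continuous mode sends about the host write volume, so the
    # saving is 1 - ratio. A negative value means GM with CV sends MORE data.
    saving = 1.0 - ratio
    print(f"{site}: transfer ratio {ratio:.2f}, saving vs. continuous mode {saving:+.0%}")

# Jordan: transfer ratio 0.11, saving vs. continuous mode +89%
# Israel: transfer ratio 2.40, saving vs. continuous mode -140%
```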

How can we explain these differences?

The Global Mirror background process for change volumes works with Global Mirror grains of 256 KB. The process always transfers the whole grain to the remote site, even if only a small portion of the grain changed during the last cycle.

Matching this behavior, the FlashCopy mapping to the change volume also uses 256 KB grains, and a FlashCopy copy-on-first-write operation likewise always copies the whole grain to the change volume. These copy-on-first-write operations can turn a small host write into a large write to the change volume, causing a much larger Global Mirror transfer. Additionally, a single host write can even touch two grains when the I/O is not aligned to grain boundaries. Furthermore, fragmentation (at the file system layer) and data placement (at the database layer) can make real-life results differ from lab results.
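To see how much a single write can be inflated, the following sketch counts the 256 KB grains that one write overlaps and compares whole-grain bytes with host bytes. It assumes none of the touched grains was already copied earlier in the cycle; the offsets and sizes are illustrative values only:

```python
GRAIN = 256 * 1024  # 256 KB grain, as used by the change volume mapping

def grains_touched(offset: int, length: int, grain: int = GRAIN) -> range:
    """Indices of the grains that a write of `length` bytes at byte `offset` overlaps."""
    first = offset // grain
    last = (offset + length - 1) // grain
    return range(first, last + 1)

def amplification(offset: int, length: int, grain: int = GRAIN) -> float:
    """Whole-grain bytes copied to the change volume per byte written by the host,
    assuming none of the touched grains was copied earlier in the cycle."""
    return len(grains_touched(offset, length, grain)) * grain / length

# A 4 KB write aligned to a grain boundary still copies a full 256 KB grain.
print(amplification(0, 4 * 1024))           # 64.0
# A 64 KB write starting at 224 KB crosses the 256 KB boundary: two grains.
print(amplification(224 * 1024, 64 * 1024))  # 8.0
```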

Such effects probably cause the large amount of data transferred for the customer in Israel.

On the other hand, when multiple host writes touch the same grain, the amount of data to be transferred shrinks. Only the first write to the grain triggers a copy-on-first-write to the change volume; subsequent writes to that grain coalesce. For the customer in Jordan, this saves a lot of bandwidth.
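Coalescing can be modeled by collecting the distinct grains touched during one cycle, since only the first write to each grain causes a copy. The sketch below compares two hypothetical workloads of 1,000 writes of 8 KB each: one that keeps rewriting a single grain and one that never reuses a grain. It is a simplified model, not a description of the product's internals:

```python
GRAIN = 256 * 1024  # 256 KB grain

def grains_touched(offset, length, grain=GRAIN):
    """Grain indices overlapped by one write of `length` bytes at byte `offset`."""
    return range(offset // grain, (offset + length - 1) // grain + 1)

def changed_grains(writes, grain=GRAIN):
    """Distinct grains touched by (offset, length) writes during one cycle.

    Only the first write to a grain triggers a copy-on-first-write; later writes
    to the same grain coalesce, so a set of grain indices is enough."""
    touched = set()
    for offset, length in writes:
        touched.update(grains_touched(offset, length, grain))
    return touched

def transfer_ratio(writes, grain=GRAIN):
    """Bytes to transfer (whole changed grains) per byte written by hosts."""
    written = sum(length for _, length in writes)
    return len(changed_grains(writes, grain)) * grain / written

# Hypothetical workloads: 1,000 writes of 8 KB each.
hot = [((i % 32) * 8192, 8192) for i in range(1000)]        # all inside grain 0
scattered = [(i * 1024 * 1024, 8192) for i in range(1000)]  # one write per 1 MB, no reuse

print(f"hot spot:  {transfer_ratio(hot):.3f}")        # 0.032
print(f"scattered: {transfer_ratio(scattered):.3f}")  # 32.000
```

The same amount of host data can therefore cost anywhere from a small fraction to a large multiple of its size on the link, which matches the spread between the two customer examples.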

Except for a few special workloads, it is impossible to predict how these factors influence bandwidth consumption. Random I/O and a short cycle time obviously reduce the savings from coalescing writes, but the usual performance statistics do not give enough detail for sizing. The critical unknown variable is the number of grains copied to the change volumes during a cycle. Knowing the exact number of changed grains per cycle, we could calculate the bandwidth needed to transfer this data during the next cycle.
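Once the number of changed grains per cycle is known, the calculation itself is simple: the grains copied during one cycle must cross the link within the next cycle. A minimal sketch; the `overhead` factor is a hypothetical allowance for protocol and link inefficiency, not a documented value:

```python
GRAIN_BYTES = 256 * 1024  # 256 KB Global Mirror grain

def required_bandwidth_mbit_s(changed_grains: int, cycle_seconds: float,
                              overhead: float = 1.0) -> float:
    """Megabits per second needed to ship one cycle's changed grains within the
    next cycle. `overhead` (>= 1.0) is a hypothetical protocol-overhead factor."""
    bits = changed_grains * GRAIN_BYTES * 8 * overhead
    return bits / cycle_seconds / 1e6

# Hypothetical example: 10,000 changed grains (2,500 MiB) per 300-second cycle.
print(f"{required_bandwidth_mbit_s(10_000, 300):.1f} Mbit/s")  # 69.9 Mbit/s
```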

Together with colleagues from CTS Middle East, we developed a method to measure this critical variable in the customer's production environment. It relies on the fact that change volumes are space-efficient FlashCopy targets, so normal FlashCopy behavior applies.

  1. For each production volume to be replicated, we create a space-efficient FlashCopy target volume and mapping.
  2. We start all FlashCopy mappings and wait for the duration of one Global Mirror cycle.
  3. Then we stop the mappings and record the physical space consumed by all FlashCopy targets. This space is the amount of data to be transferred as Global Mirror background copy during the next cycle, and it includes all the effects discussed here. (The space also includes thin-provisioning metadata, which can be ignored.) In addition, we see any performance impact of the FlashCopy operations (cache flushing, extra I/Os) on normal production.
  4. Restarting the mappings resets the physical space of the space-efficient targets, and the next cycle begins.
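The repeated part of this procedure (steps 2 to 4) can be sketched as a loop. The three callables `start_mappings`, `stop_mappings` and `used_bytes` are hypothetical hooks that would wrap the system's own CLI or management interface; no specific commands are implied here, and step 1 is assumed to be done beforehand:

```python
import time
from typing import Callable, Dict, List, Sequence

def measure_cycles(
    targets: Sequence[str],
    start_mappings: Callable[[Sequence[str]], None],  # hypothetical hook
    stop_mappings: Callable[[Sequence[str]], None],   # hypothetical hook
    used_bytes: Callable[[str], int],                 # hypothetical hook
    cycle_seconds: float,
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, int]]:
    """Repeat steps 2-4; step 1 (creating the space-efficient targets and
    mappings) is assumed to be done already.

    Returns one {target: physical bytes used} dict per measured cycle."""
    samples = []
    for _ in range(cycles):
        start_mappings(targets)   # step 2 (and step 4: a restart resets the targets)
        sleep(cycle_seconds)      # wait for one Global Mirror cycle period
        stop_mappings(targets)    # step 3: stop the mappings ...
        samples.append({t: used_bytes(t) for t in targets})  # ... and record space used
    return samples
```

Injecting the hooks and the `sleep` function keeps the loop testable with fake functions before it is pointed at a real system.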

The whole process must be scripted and run many times to cover workload peaks and changes in the I/O pattern.
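With many measured cycles, the per-cycle results can be reduced to a bandwidth profile. The sketch below assumes one `{target: bytes used}` dict per cycle; whether to size the link for the peak, a percentile, or the mean is a design decision that the measurement itself does not make:

```python
import math
from typing import Dict, List

def bandwidth_profile(samples: List[Dict[str, int]], cycle_seconds: float) -> Dict[str, float]:
    """Summarize required link bandwidth (Mbit/s) over many measured cycles.

    Each sample maps a FlashCopy target to the physical bytes it used in one
    cycle; thin-provisioning metadata is ignored, as in step 3."""
    rates = sorted(sum(s.values()) * 8 / cycle_seconds / 1e6 for s in samples)
    rank = math.ceil(95 * len(rates) / 100)  # nearest-rank 95th percentile
    return {
        "mean": sum(rates) / len(rates),
        "p95": rates[rank - 1],
        "peak": rates[-1],
    }
```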

It is important to understand that such a data collection process gives useful results only for one specific cycle time. A different cycle time changes how strongly each effect applies. Therefore, to compare bandwidth and space consumption for a different RPO, the process must be repeated with the corresponding cycle time.
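A simplified model illustrates why results do not carry over between cycle times. The sketch replays one hypothetical write trace (a 4 KB write to the same grain every 10 seconds for an hour) against different cycle lengths and counts the distinct grains per cycle. It ignores real-world details such as cycles stretching when a transfer takes longer than the cycle period:

```python
GRAIN = 256 * 1024  # 256 KB grain

def grains_per_cycle(trace, cycle_seconds, grain=GRAIN):
    """trace: (timestamp_s, offset, length) writes. Returns distinct grains per cycle."""
    cycles = {}
    for t, offset, length in trace:
        idx = int(t // cycle_seconds)
        first, last = offset // grain, (offset + length - 1) // grain
        cycles.setdefault(idx, set()).update(range(first, last + 1))
    return {idx: len(g) for idx, g in sorted(cycles.items())}

def bytes_to_transfer(trace, cycle_seconds, grain=GRAIN):
    """Total whole-grain bytes shipped over the whole trace."""
    return sum(grains_per_cycle(trace, cycle_seconds, grain).values()) * grain

# Hypothetical trace: one 4 KB write to grain 0 every 10 s for one hour.
trace = [(t, 0, 4096) for t in range(0, 3600, 10)]

for cycle in (60, 600, 3600):
    print(cycle, bytes_to_transfer(trace, cycle) // GRAIN, "grains")
# 60 60 grains
# 600 6 grains
# 3600 1 grains
```

The same hour of writes costs 60 grains with a 60-second cycle but a single grain with a one-hour cycle, which is why each candidate RPO needs its own measurement.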