Can you give me some focus on H/W vs. S/W replication techniques? How can SANs
be used for replication purposes and how cost effective would it be? I worked
with an EMC SRDF/time finder but it is a costly solution. What are some major
issues with synchronous vs. asynchronous replication?
This question was posed on 20 September 2001
There are quite a few replication products on the market today. The techniques
are divided into Host based replication and Hardware based replication. There
are also two or more ways to replicate data. Synchronous and asynchronous are
the main techniques used. (Semi-sync and multi-hop are others). I will focus
this reply on hardware and software sync and async.
The reasons for data replication are obvious, especially in light of this past
week's tragedy. The idea is to separate data and computing resources in case of
hardware failure, and by distance in case of a disaster. Clustering will afford
hardware fault tolerance for the computing resources, but requires centralized
storage, and does not in all cases provide for disaster recovery. Therefore,
data redundancy is imperative.
One of the better recovery techniques is to allow for "wide-area" clustering,
such as Digital provided under the VMS operating system. Up to 32 nodes of a
single cluster could be separated over an Ethernet or "CI" (cluster
interconnect) over distance, and share all data resources among all nodes. If
one node was impacted, the rest of the nodes would automatically absorb the load
of the failed node. Data residing on disks attached to each node could be either
direct-attached and shared among cluster members, or centrally shared among all
nodes.
You may be hearing of this today as "stretched clusters". On open
(non-proprietary) systems environments, Microsoft cluster server and Veritas
cluster server application resources can be stretched between two sites, and the
data replicated between those sites by either hardware or software based
solutions. On Solaris, you can use SNDR from Sun or VVR from Veritas to do
"host" based replication to your disaster site. Under SNDR, each write I/O is
routed to both the local disk and the remote disk in sync. Veritas VVR can also
do this, but they also offer async replication with time stamping and sequence
IDs for transactional data integrity at the remote location. These solutions
provide for seamless failover of resources to a remote site, or "hot-standby"
sites that can be brought up within minutes of a disaster.
Let me explain the different techniques:
Hardware-based synchronous: All replication processes are offloaded from the
host, and accomplished via the storage array itself. This means there is no host
CPU cycles consumed by replication. A data write lands in the cache of the local
storage array, which then retransmits it across either a Fibre Channel or ESCON
link, over distance, to the cache of the storage array at the remote site. The
remote array acknowledges the write back
to the local array, and the local array then sends the I/O complete back to the
host application. This is the best means of data replication for those
application environments where data MUST be guaranteed to be written to the
remote site. (NYSE or NASDAQ for instance.)
The problem here is that as you extend the distance, you also extend the latency
of the write data. This solution is good for shorter distances. The trade-off is
application performance vs. distance.
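The synchronous write path above can be sketched in a few lines. This is a toy
model, not any vendor's implementation: the `Array` class and the link delay
constant are assumptions for illustration. It shows why every host write pays a
full round trip to the remote array before the application sees I/O complete.

```python
import time

# Assumed one-way link delay. Roughly 0.5 ms per 100 km of fibre is a
# common rule of thumb (propagation plus switching), so latency scales
# directly with distance.
LINK_DELAY_S = 0.0005

class Array:
    """Toy storage array: a write is simply staged in cache."""
    def __init__(self):
        self.cache = {}

    def write(self, block, data):
        self.cache[block] = data

def sync_write(local, remote, block, data):
    """Synchronous replication: the host sees I/O complete only after
    the REMOTE array has acknowledged the write."""
    local.write(block, data)
    time.sleep(LINK_DELAY_S)      # data travels to the remote array
    remote.write(block, data)
    time.sleep(LINK_DELAY_S)      # remote ACK travels back
    return "io_complete"          # only now does the application continue

local, remote = Array(), Array()
start = time.perf_counter()
status = sync_write(local, remote, 0, b"payload")
elapsed = time.perf_counter() - start
# The data is guaranteed at both sites, but the application waited a
# full round trip; double the distance and you double the wait.
print(status, remote.cache[0] == b"payload", elapsed >= 2 * LINK_DELAY_S)
```

Double `LINK_DELAY_S` to model a longer link and the per-write latency doubles
with it, which is exactly the application-performance vs. distance trade-off.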
Hardware-based asynchronous: Again, all replication processes are offloaded from
the host, and accomplished via the storage array itself. A data write is written
into cache, and an I/O complete is immediately sent to the host application. The
local array then retransmits the data to the remote array, which sends an ACK
back; the data is then destaged to disk at both locations. As
you can see, async provides for much greater distances, and has a much lower
impact on application performance. The trade-off here is that data that the
application deems written to both locations may only be written to the local
location, and lost via a link failure or a remote storage array failure.
Therefore, there is no data guarantee to the remote location.
One way to help solve this is to timestamp and sequence ID each write into
cache. On a link failure and recovery, the local array will retry the write to
the remote array, and the remote array can then use the sequence ID and
timestamp to guarantee write order at the remote site. Only a few vendors have
the capability to do hardware based async, and very few provide sequence ID and
time stamping of writes. Do your homework before making your decision.
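The sequence-ID technique can be sketched as follows. Again a toy model under
stated assumptions (the class and method names are mine, not a vendor API): the
host gets I/O complete immediately, each queued write carries a sequence ID and
timestamp, and after a link outage the remote side applies writes strictly in
sequence order to preserve transactional write order.

```python
import itertools
import time

class AsyncReplicator:
    """Toy async replication: writes complete locally at once and are
    queued, tagged with a sequence ID and timestamp, for later
    transmission to the remote site."""
    def __init__(self):
        self.local = {}
        self.pending = []              # writes queued for the link
        self.remote = {}
        self.seq = itertools.count(1)
        self.next_expected = 1

    def host_write(self, block, data):
        self.local[block] = data
        self.pending.append((next(self.seq), time.time(), block, data))
        return "io_complete"           # host continues immediately

    def drain(self):
        """Link is up: apply queued writes at the remote site strictly
        in sequence order, so a debit is never applied after its credit
        is lost."""
        for seq_id, _ts, block, data in sorted(self.pending):
            # A gap here would mean writes were lost in flight; a real
            # product would replay from the last acknowledged sequence ID.
            assert seq_id == self.next_expected
            self.remote[block] = data
            self.next_expected = seq_id + 1
        self.pending.clear()

r = AsyncReplicator()
r.host_write(0, b"debit")    # completes instantly to the application
r.host_write(1, b"credit")
print(r.remote)              # {} -- data is at risk until the link drains
r.drain()                    # link recovers; writes applied in order
print(r.remote == {0: b"debit", 1: b"credit"})
```

The window where `r.remote` is empty is exactly the exposure async buys you in
exchange for distance and performance.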
Software-based synchronous: Each write to the local storage array from the host
is also redirected over an IP connection to a remote host, and written to that
host's connected storage. The benefit here is more granular control of data
replication, but the distance limitation still applies. The trade-off is CPU
utilization on the production host, and each host must replicate its own data.
This means if you have 100 hosts on the production side, all 100 would need to
replicate their LUNs to the disaster site, as opposed to one storage array
replicating to one storage array at the disaster site.
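The host-based scaling point can be made with simple arithmetic. The write rate
and per-write CPU cost below are illustrative assumptions, not measurements;
the point is only that host-based replication multiplies streams and CPU cost
by the number of hosts, while array-based replication keeps one stream and
zero host CPU.

```python
# Hypothetical sizing sketch for host-based vs. array-based replication.
hosts = 100
writes_per_sec_per_host = 500        # assumed workload
cpu_cost_per_write = 0.00002         # assumed 20 us of host CPU per mirrored write

# Host-based: every host runs its own replication stream and burns its
# own CPU mirroring its own LUNs to the disaster site.
host_based_streams = hosts
per_host_cpu = writes_per_sec_per_host * cpu_cost_per_write

# Array-based: one array-to-array stream, no host CPU spent on replication.
array_based_streams = 1

print(f"replication streams, host-based:  {host_based_streams}")
print(f"CPU per host spent mirroring:     {per_host_cpu:.1%}")
print(f"replication streams, array-based: {array_based_streams}")
```

Under these assumed numbers the per-host CPU tax is small, but it is paid 100
times over, and 100 replication streams are 100 links to size, monitor, and
recover.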
Software-based asynchronous: Same as hardware based above, but driven by each
host. Also, very few software vendors provide timestamps and sequence IDs of data to
guarantee transactional consistency at the remote site in case of link failure.
Again, do your homework.
As you can see, this is an advanced topic that needs to be looked at carefully.
Defining the type of link and link bandwidth is a subject all in itself. There
are new techniques coming to the fore shortly that will expand this subject
dramatically. iSCSI and iFCP protocols, along with InfiniBand and VIA
environments, will make your choices even more confusing. You can outsource all
this to a storage utility vendor or if you have in-house expertise, you can do
an RFP for your requirements. If not, I would suggest using a storage
consultant/architect who can help you in your decisions.
All Rights Reserved, Copyright 2000 - 2002, TechTarget