The problems backing up big databases
Jon William Toigo
10 Feb 2004
According to the University of California at Berkeley, the fastest growing
subset of business data is not files, but block data contained in relational
database management systems. Anyone who has ever worked with backup/restore
knows the hassles of backing up databases to and restoring them from tape --
especially big databases. So, Berkeley's insight is not exactly cause for
celebration. But there are still some issues with database backup that need to be addressed.
Here are some of the larger ones: Do bigger (say, multi-terabyte) databases
spell death for tape, which chugs along at only 2 TB per hour under ideal
laboratory conditions? Do such grand data constructs force companies into a
disk-to-disk or mirroring strategy, and perhaps into a SAN topology, as some
vendors would suggest? Does a big database shatter the concept of "backup
windows" once and for all, since you need to quiesce a database before you copy
its data to tape or disk, and copying all of the data in a huge database
necessitates a fairly lengthy quiescence period, perhaps a lengthier one than
your business can tolerate?
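To put rough numbers on the backup window question, here is a back-of-the-envelope sketch in Python. The 2 TB-per-hour figure is the ideal tape rate cited above; the efficiency factor and the database sizes are assumptions for illustration only, not measured results.

def backup_window_hours(db_size_tb, tape_rate_tb_per_hr=2.0, efficiency=0.6):
    """Estimate the quiescence window needed to stream a database to tape.

    efficiency is an assumed derating factor for real-world overhead
    (streaming gaps, contention, verification passes).
    """
    return db_size_tb / (tape_rate_tb_per_hr * efficiency)

for size_tb in (1, 5, 10, 20):  # hypothetical database sizes in terabytes
    print("%2d TB database -> roughly %.1f hours offline"
          % (size_tb, backup_window_hours(size_tb)))

Even at the ideal rate, the arithmetic makes the point: once a database crosses a few terabytes, the quiescence period quickly outgrows the overnight window most businesses assume they have.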
These are all good questions that are finally getting some attention as storage
vendors jockey for position in the burgeoning "Information Lifecycle Management"
space. In December, then again in late January, EMC Corporation made some
much-covered moves to ally with Campbell, CA-based OuterBay Technologies and with
Oracle itself, to obtain tools and skills for sorting down the contents of
huge databases, ostensibly to migrate older, non-changing data in the DB to
second-tier disk platforms.
Reference data in databases
These new friendships make sense, of course, within the context of EMC's
"reference data" philosophy. Says EMC, the world is full of often accessed, but
rarely modified data that needs to stay online for reference purposes. But it is
not cost-effective to host such data on your most expensive, most high
performance gear. Seems like a sound observation.
EMC is seeking to apply this philosophy to big databases and to develop an
enabling strategy that disaster recovery and business continuity planners have
wanted for years. The strategy is simple: confronted by a really big
database, might it not be possible to "pre-stage" the lion's share of the DB
(the non-changing part) at the recovery center? There, in the event of an
interruption, the "pre-staged" data could be loaded from tape to disk in the
time it took for the IT guys to travel to the emergency recovery center or hot
site. With a viable data segregation and pre-staging methodology, recovery
personnel could carry only backups of the changing data components of the DB to
the hot site, then load them on top of the already restored non-changing or
reference DB components. In short order, you would be ready for processing.
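A simple sketch shows the arithmetic behind the pre-staging idea. The total database size, the share of non-changing reference data, and the restore rate below are hypothetical assumptions for illustration, not figures from EMC or OuterBay.

def restore_hours(data_tb, restore_rate_tb_per_hr=2.0):
    """Hours needed to load a given volume of data from tape to disk."""
    return data_tb / restore_rate_tb_per_hr

db_total_tb = 10.0        # assumed total database size
reference_fraction = 0.9  # assumed share of non-changing ("reference") data

reference_tb = db_total_tb * reference_fraction
changing_tb = db_total_tb - reference_tb

print("Full restore at the hot site: about %.1f hours" % restore_hours(db_total_tb))
print("With %.0f TB of reference data pre-staged, restoring only the changing data: about %.1f hours"
      % (reference_tb, restore_hours(changing_tb)))

If the assumed nine-tenths of the database really is static and already sitting on disk at the recovery center, the time-to-processing shrinks from a matter of hours to a fraction of one, which is exactly the appeal of the approach.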
The scenario has appeal for the preponderance of firms that already have
investments in tape technology and for whom the cost of mirroring is too great
to justify. Plus, to the delight of StorageTek, Quantum, ADIC, Overland, Sony,
Breece Hill, Spectra Logic, and many others, it has the additional value of
keeping tape library vendors in profit.
The question is whether the enabling technology that EMC and others are
exploring to carve "reference data" out of databases is feasible given the
diversity and uniqueness of databases in play today. The answer is maybe.
Don't fight your DBA
Jon William Toigo
17 Feb 2004
In part one of this tip, Jon William Toigo discussed some issues associated
with backing up large-scale databases, and offered insight into what one company
planned to do about it through the use of reference data segregation and a
pre-staging methodology. Part two gets to the root of the problems associated
with large-scale backups.
The root of the problem
Database administrators and designers have had the ability for many years to
construct their databases so that "reference data" could be neatly tucked away
into well-defined subset constructs. Comparatively few have built this
functionality into their DB architecture, however. Why? The explanation is the
same as for why so many n-tier client-server applications lack
common middleware standards, a design factor that inhibits their recoverability:
No one asked them to.
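For illustration only, the kind of "subset construct" referred to above might be as simple as segregating rows by a last-modified date, so that the static portion can be treated as reference data and backed up on its own schedule. The cutoff, record layout, and field names in this Python sketch are assumptions, not features of any particular DBMS.

from datetime import date, timedelta

CUTOFF = date.today() - timedelta(days=365)  # assumed threshold for "non-changing" rows

orders = [
    {"id": 1, "last_modified": date(2001, 3, 14), "status": "closed"},
    {"id": 2, "last_modified": date.today(),      "status": "open"},
]

# Rows untouched for a year go to the reference partition; the rest stay active.
reference_rows = [r for r in orders if r["last_modified"] < CUTOFF]
active_rows = [r for r in orders if r["last_modified"] >= CUTOFF]

print("%d reference row(s), %d active row(s)" % (len(reference_rows), len(active_rows)))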
Generally speaking, DBAs have a bad rap. They often take it on the chin from
storage guys who view them as out-and-out resource hogs. Storage administrators
frequently complain that the DBA doesn't understand storage resource management.
He mismanages the resources he has and often requests much more capacity than he
actually needs, compromising capacity allocation efficiency strategies. At the
end of the day, most storage guys throw up their hands in disgust and just give
the DBA whatever he wants, especially if his application is mission critical.
Disaster recovery planners have adopted an even more laissez-faire approach,
simply accepting whatever instructions the DBA gives them regarding the
capacity and platform requirements for database recovery. DBAs almost always
specify mirroring or low-delta journaling systems to safeguard their assets. From
their perspective, it is the simplest way to cover their data stores, regardless
of whether it is also the most expensive and inflexible approach.
What has always been missing is a collaborative strategy that would give storage
managers and DR planners chairs at the application and database development
tables. Without their input at the earliest design phases and throughout the
design review process, the management and recovery criteria for database and
application design typically go unstated and are not provided for in the
resulting product.
Of course, the idea of introducing personnel from storage and DRP into the
database design process will likely raise the hairs on the necks of DBAs
everywhere. Database and application designers have their own lingo and
diagrammatic conventions, most of which seem alien to non-DBAs. Anyone who
doesn't talk the talk can't communicate effectively with the DBA, let alone
specify requirements in terms and language that the DBA will understand.
Some retraining might help to bridge the gaps. But, to really address the
systemic problems, a complete retooling of IT professional disciplines is in
order: combine the data protection skills and knowledge of the DRP guy, the
storage administration skills and knowledge of the storage guy, and the database
design and administration skills and knowledge of the database guy, and you will
produce the "data management professional." But that would require chimeric gene
splicing in the extreme and would probably violate the Harvard protocols on
genetic experimentation.
In the absence of such sweeping systemic and procedural changes, solving the
problems of large-scale database backup will require a conscientious effort to
get the DBAs and data protection folk talking to one another so they can come up
with recoverable designs. In the final analysis, this is probably a more
fruitful approach than trying to find a silver bullet technology for ferreting
out all the cells from all the columns and all the rows that seem to have the
characteristics of reference data.