Storage Strategies - Thursday, August 12, 2004
Happiness, Nothingness, and Long-Term Storage
By Jon William Toigo
I ran across this message the other day when searching for an article
on the Web: "If true happiness can only be achieved through a state of
nothingness, you're going down the right path. Actually, we couldn't
find the page you requested. Please check the URL."
For a moment, I chuckled over the wit of the anonymous author. Then, I
thought about those who find themselves in need of a long-term data
retention solution. And it wasn't funny anymore.
In a previous column, I was critical of those vendors who are
leveraging fear, uncertainty, and doubt to market "regulatory
compliance" wares. Truth be told, regulatory compliance does not
require technology. By and large, it is a people-and-process issue.
Folks need to identify what data is subject to regulatory requirements
and so mark the data so that it can be included in appropriate data
What gets my goat is the fact that most of the so-called regulatory
compliance solutions are just data movers. They don't tell you what to
move or where to move it -- they just provide a way to move it. That
is about as helpful as new and improved income tax filing systems: they
may save you the trouble of licking a stamp and mailing your tax
return, but they do nothing to help you surmount the big problem of
sorting through your shoeboxes full of receipts and deciding what is
and is not deductible.
One regulatory issue that does have technological ramifications is
long-term storage. And, other than very low tech approaches such as
hardcopy or microform, the technology doesn't seem to be up to par.
Long-term data storage was once the domain of optical media. Lacking
susceptibility to the many magnetic fields that can turn hard-disk-
based data to mush or tape-recorded data to spaghetti, optical was
touted to be the archivist's dream technology. Like cockroaches,
optical disks could withstand the blast and electromagnetic pulse from
a thermonuclear device (provided that it was not directly under the
nosecone of the bomb when it detonated).
However, testing by laboratory geeks using accelerated ultraviolet
aging of optical media has demonstrated fairly conclusively that even
optical media will let us down, and it may happen sooner rather than
later if we don't pay extremely close attention to environmental
factors such as
temperature and exposure to radiant energy.
Optical, as it turns out, has a vampiric allergy to daylight. Vendors
may promise 20 years of reliable storage, but lab tests suggest that
actual life expectancy under normal conditions may be half that for
industrial strength optics and half again for consumer grade media such
as DVD, DVD-R, DVD+R, CD-R, and CD-RW.
While this may be good news to the entertainment industry, which
thrives on our willingness to repurchase our movie or album collection
every couple of years when the media gives out (remember that these
were supposed to be more durable than tape or vinyl), it may be bad
news to the regulatory compliance crowd. Rather than a sturdy Klingon-
esque "WORF" (write once read forever), optical media may well be just
as "WORN" (write once, repeat as needed) as its magnetic storage peers.
(Since I have contributed to the lexicon of storage acronyms, please
assume that there are trademarks or service marks beside these words
until I get around to filing my claim!)
What is needed is a media management methodology that copies data from
one disk to another or one tape to another once the storage medium is
about to live out its useful life. In other words, once a tape has
been read or written a certain number of times, or once a disk has
whirred on for X number of years, the data on the media needs to be
copied to new media.
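
As a rough illustration of that rule, here is a minimal Python sketch.
Everything in it is an assumption made for the example: the Medium
record, the function names, and especially the two thresholds, which
stand in for whatever pass count and "X number of years" your media
vendor and your own testing suggest.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds; real limits depend on the media and its vendor.
MAX_TAPE_PASSES = 200
MAX_DISK_YEARS = 5


@dataclass
class Medium:
    label: str
    kind: str          # "tape" or "disk"
    passes: int        # read/write passes so far (only used for tape)
    in_service: date   # date the medium was first put to use


def needs_refresh(m: Medium, today: date) -> bool:
    """Return True when the medium's data should be copied to new media."""
    if m.kind == "tape":
        return m.passes >= MAX_TAPE_PASSES
    if m.kind == "disk":
        age_years = (today - m.in_service).days / 365.25
        return age_years >= MAX_DISK_YEARS
    raise ValueError(f"unknown media kind: {m.kind}")


def refresh_list(inventory: list[Medium], today: date) -> list[str]:
    """Labels of every medium that is due for migration."""
    return [m.label for m in inventory if needs_refresh(m, today)]
```

Run against an inventory on a schedule, refresh_list() would at least
answer the "when" of migration. It says nothing about which data
matters or whether anyone will still be able to read it.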
This simple observation masks a passel of problems. For one, who will
remember what data needed to be moved years from now? For another, who
will ensure that the data is still useable before and after it moves
from one disk or tape to another?
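
The narrowest reading of that second question, whether the bits survive
the move, can at least be checked mechanically. The sketch below
assumes a SHA-256 digest was recorded when the file was first archived;
the function names and the idea of a stored "recorded_digest" are
illustrative assumptions, not a description of any vendor's product.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def migrate(src: Path, dst: Path, recorded_digest: str) -> None:
    """Copy src to dst, checking the bits before and after the move."""
    if sha256_of(src) != recorded_digest:
        raise IOError(f"{src} no longer matches its recorded digest")
    shutil.copyfile(src, dst)
    if sha256_of(dst) != recorded_digest:
        raise IOError(f"{dst} was corrupted during the copy")
```

A matching digest proves only that the bytes are unchanged. It does not
prove that any file system or application will be around to make sense
of them.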
The second question is the biggest onion. Peeling back its layers, you
quickly discover that the file systems used to store data may not exist
in their current form in a couple of years. For example, Network
Appliance will have to adapt its product to the Andrew File System and
replace the Berkeley Fast File System in a couple of years if it wants
to use the Spinnaker technology it acquired last year to any purpose.
Microsoft is saying that it will once again migrate customers to a new
file system whenever it gets around to releasing Longhorn, the next
release of Windows.
Peeling back another layer of the onion, it is quite likely that the
application used to create the file will not exist (or at least not in
its current form) in five years. Furthermore, downward compatibility
with earlier versions is by no means assured. One of my books from
1995 continues to be in print and ships with a diskette that contains
forms and checklists that I created using an application that simply no
longer exists. Every once in a while I will get an e-mail from some
unfortunate who is having difficulties with the files on the
diskette -- assuming that he even still has a floppy disk drive. Take
that forward a few clicks of the calendar and you begin to wonder
whether any of the data you currently store will be readable if the
regulators ever need to sample it.
I had the pleasure last year of chatting with an archivist from the
Australian government. He told me that they were busily converting
electronic data to .PDF files, using the Adobe Acrobat tool set. The
decision was predicated on Adobe's willingness to give them source code
to its format and reader tools so that a researcher in 2525 would be
able to adapt the code to whatever computing architecture was being
used at that time and still be able to read the data. I wondered how
Moore's Law would treat such a scheme over time: would data still be
treated as bits organized into sectors and cylinders, or would it be
DNA strands in a microscopic vacuum tube? How will you revive Excel
spreadsheet data from a disk or tape when the wildly popular way to
store data is through the near-field optical addressing of luminescent
photoswitchable supramolecular systems dispersed as dopants in inert
polymer matrices (i.e., molecular storage)?
Some vendors are saying that the solution is as simple as dumping the
data into a document management system and encoding it with a message
digest header and tagging it with an IP address that can be monitored
using a proprietary controller/reader as it migrates from array to
array over time. That could work, I suppose, if you buy only one
vendor's software and controllers for the next several decades.
The best approach for now seems to be a solution based on good old
common sense: write the data to tape or disk at least twice, then
implement a media management system to tell you when to migrate the
data to fresh media over time. If the software used to write the data
undergoes a change, you will need to migrate the data into the latest
version of the software, then save it out again in its new format --
again, at least twice. This should evolve from a cottage industry in
2004 into a major assembly-line business within the decade.
I could be wrong, but chances are that data in this column will have
eroded beyond reclamation long before that happens. Your thoughts?
Copyright 2004 101communications LLC.