Hal Miller, the immediate past President of SAGE, gave a talk on what it's like to approach a petabyte of storage.
A petabyte is 1024 terabytes, or approximately 1.1 x 10^15 bytes. (For the curious, the next orders of magnitude are the exabyte (EB) and the zettabyte (ZB).) The trend is toward explosive growth in storage, but with bandwidth bottlenecks along the way. The desire seems to be the equivalent of "dial tone" for IP networking, computing, and storage. This is all well and good, but how do we get there and make it work?
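The unit arithmetic above can be sketched quickly; this is a minimal illustration (not from the talk) showing that each binary unit is a factor of 1024 larger than the last:

```python
# Binary storage units: each step up is a factor of 1024 (2**10).
units = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB"]
for power, name in enumerate(units, start=1):
    size = 1024 ** power
    print(f"1 {name} = 2**{10 * power} = {size:.3e} bytes")

# 1 PB = 1024 TB = 2**50 bytes, i.e. about 1.126e15 --
# the "approximately 1.1 x 10^15 bytes" quoted above.
```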
The problems with a petabyte are many, and Hal touched on some of them. One petabyte is approximately 100,000 spindles on 18 GB disks. Mirrored five ways, that's 500,000 spindles (plus two copies offsite). Mirrors take 70,000 spindles, plus RAID drives, spares, and boot blocks, so we're talking around 1,000,000 spindles total. At $1,000 per spindle, that's $1 billion — just for the disk, excluding the costs of servers, towers, networking, and so on. Where do you put these disks? What are their power and cooling requirements? How do you perform the backups? How accessible are the backups? Where do you store the backups? How can you afford the storage, the facilities, the power, the cooling, the maintenance, and the disk replacement? As you can see, there are many questions but few answers.
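The back-of-envelope arithmetic above can be reproduced in a few lines. This sketch uses the talk's rounded figures; the exact spindle counts depend on RAID layout, spare policy, and disk geometry, and the $1,000-per-spindle price is the talk's assumption, not a quote:

```python
# Rough spindle-count and cost arithmetic for 1 PB on 18 GB disks,
# following the round numbers from the talk.
DISK_GB = 18                     # assumed per-spindle capacity
PB_IN_GB = 1024 ** 2             # 1 PB = 1,048,576 GB

raw_spindles = PB_IN_GB / DISK_GB          # data spindles, no redundancy
print(f"raw spindles for 1 PB: {raw_spindles:,.0f}")

spindles = 100_000               # talk's rounded figure for 1 PB
mirrored = spindles * 5          # five-way mirroring
total = 1_000_000                # incl. mirrors, RAID, spares, boot blocks
cost = total * 1_000             # at $1,000 per spindle
print(f"five-way mirrored: {mirrored:,} spindles")
print(f"total spindles:    {total:,}")
print(f"disk cost alone:   ${cost:,}")   # $1,000,000,000
```

Even the raw (unmirrored) figure runs to tens of thousands of spindles, which is why the facilities questions — floor space, power, cooling — dominate as much as the purchase price.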
Why is this relevant? Who faces this problem? Oil companies (geophysical research), medical research (including genetic research), and movie companies (special effects) face it now. Atmospheric sciences, oceanographic sciences, manufacturing, and audio delivery will face it soon. And academic institutions will face it as well, since they do research as much as (if not more than) commercial institutions.
More information (including the slides from the talk) is available at http://chroma.mbt.washington.edu/hal/LISA/.