Eric A. Hall

NOTE: This article is an archived copy for portfolio purposes only, and may refer to obsolete products or technologies. Old articles are not maintained for continued relevance and accuracy.

October 15, 1996

Managing Mass-Storage Monsters

If there's one maxim that holds true, it's "data expands to fill the space available." It seems that no matter how much hard-disk space you have, people will find lots of creative ways to fill it. This unfortunate truism leads to an eternal search for more and better ways of maximizing storage options.

This search doesn't have to lead to your continually buying more and larger disks. Instead, treat it as a call for better storage management strategies and procedures. Rather than trying to fight fire with fire, focus on building a comprehensive, multilevel strategy that will provide for growth. Finding a way to manage the problem will yield much more satisfaction than constantly trying to fight the symptoms.

Of course, this often is easier said than done. Although it's easy to pay lip service to the desire for better mass-storage facilities, it's often difficult to take the time and energy required to architect a flexible solution to the problems. Even if you are able to piece together an effective strategy, getting management to pay for something so esoteric can be difficult. Additionally, there's the implementation, followed by the routine maintenance and management, which can be boring as hell, to say the least.

We can't buy these products for you nor can we help you personally with the day-to-day management tasks, but we can help you design your strategic solution. In addition, we will offer various tips we've collected over the years.

Hard Drives Are the Root of All Evil

Let's face facts: The more drives you have, the harder your life is. You have to back them up. You increase your exposure to the negative effects of downtime that eventual failure is sure to bring. You also can spend lots of money, even if drives are cheap on a one-off basis.

Instead of buying more drives, perhaps you should be buying fewer of them. We don't mean you should consolidate many small devices into a few large ones (though this is often a good idea), but you should find a way to minimize the amount of front-line magnetic storage available on your network. If data expands to fill the space available, then the necessary corollary is that you can minimize the amount of "necessary" data if you also reduce the amount of available storage.

This just isn't true, of course, but it does frame our most fundamental position, which is that data needs to be prioritized before it can be managed effectively. The best way to prioritize is to eliminate. Do you really need a hard drive on each PC, or could you get by with a better-managed server-based storage plan? If you manage applications and user data more efficiently, you will reap more rewards in stability, which means fewer disasters.

There are opposing arguments that say it is better to spread your risk by putting local drives in every system, but this doesn't hold up in the long term. Although it's true that if the server crashes all PCs attached to it also are knocked out, you can minimize these risks if the system is architected correctly. Having drives on each system means (eventually) you will have failures on each system that will cost you much more in labor than a single widespread failure would. In addition, it's easier to prevent a single failure than it is to prevent hundreds of them.

When you design a centralized storage mechanism, it's important to recognize that there are several types of disk I/O generated. At one extreme, there are small files that are loaded frequently, calling for fast random access. At the other end, there are large files that are loaded infrequently, but demand large throughput when they are loaded.

For the small files, what you want more than anything is fast seek times. The disk will need to shoot from one corner to the other quickly, since several simultaneous requests for small pieces of data are likely to occur. Throughput is irrelevant, since a large pipe won't be filled by these short bursts. These types of files generally are word-processing documents, spreadsheets and many of the most common applications. Since these files are accessed often, you can store many of them together on the same disk without having much of an impact on performance—assuming you purchased very fast drives. By using disks that support fast seek times, the many requests will be satisfied quickly, allowing for good overall performance.

For large databases and sequential files, however, the opposite is true. These files tend to be large blocks of data that are not opened and closed quickly, but instead are loaded once or twice a day and then searched heavily. If someone needs a report or a query executed, you need maximum throughput, since a quick return of all the data will make the entire operation faster. You don't care about seek time because multiple random requests aren't as likely as raw reads of huge chunks of data.

Which method is suitable for you? Both are, undoubtedly. Your best bet is to set up two separate disk systems, each optimized for its specific purpose.
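To make the trade-off concrete, here is a rough back-of-the-envelope model in Python. It assumes each request costs one seek plus one sequential transfer, and the two drive profiles (FAST_SEEK and FAST_PIPE) use made-up numbers chosen only for illustration; they are not measurements of any real product.

```python
def service_time_ms(size_kb, seek_ms, throughput_mb_s):
    """Rough time to satisfy one request: one seek plus the transfer."""
    transfer_ms = size_kb / 1024 / throughput_mb_s * 1000
    return seek_ms + transfer_ms

# Hypothetical drive profiles (illustrative figures, not vendor specs).
FAST_SEEK = {"seek_ms": 8.0, "throughput_mb_s": 3.0}
FAST_PIPE = {"seek_ms": 14.0, "throughput_mb_s": 8.0}

workloads = (("8 KB document", 8), ("500 MB database scan", 500 * 1024))
drives = (("fast-seek", FAST_SEEK), ("fast-pipe", FAST_PIPE))

for label, size_kb in workloads:
    for name, drive in drives:
        t = service_time_ms(size_kb, **drive)
        print(f"{label:22s} {name:10s} {t:12.1f} ms")
```

Under these assumptions the small request finishes sooner on the fast-seek drive, because the seek dominates the total, while the large scan finishes far sooner on the high-throughput drive, where the seek is lost in the noise.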

Fault Tolerance

Once you've defined your distribution of media, you'll want to ensure it doesn't go down or, if it does crash, you'll want to minimize the impact on end users. Your best bet is to build your disk farms from Redundant Array of Inexpensive Disks (RAID) Level 5 arrays, and then mirror those arrays using RAID 1. This becomes a nearly indestructible setup (as long as you have mirrored servers and power supplies as well).
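For readers who haven't looked under the hood, the following Python sketch shows the idea behind RAID 5 parity: the parity block is the XOR of the data blocks in a stripe, so any one missing block can be rebuilt from the rest. It is a simplified illustration only. Real controllers rotate parity across the disks and work on much larger blocks, and the block contents and the usable_fraction helper are our own inventions.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus the parity block computed from them.
data = [b"ACCT", b"MEMO", b"PLAN"]
parity = xor_blocks(data)

# Simulate losing the second disk, then rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])

def usable_fraction(disks_per_array):
    """Share of raw capacity left after RAID 5 parity and a full mirror."""
    return (disks_per_array - 1) / (2 * disks_per_array)

print(usable_fraction(5))
```

The second function is a reminder of what the extra safety costs: with five disks per array and a full mirror, only four of every ten disks' worth of raw space holds unique data.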

This sort of fault tolerance used to be expensive, but with today's disk prices it's no longer a forbidding proposition. In fact, many of today's offerings provide this level of functionality and more in standard configurations. Many even include options such as dynamic volume resizing, firmware-based management programs and high-performance dedicated processors. Some of these systems almost make a dedicated file server obsolete.

Beyond the simple RAID management tools, you should also look for monitoring and alert capabilities, which allow you to fix any problems that might occur. Among these alert capabilities, the ability to send Simple Network Management Protocol (SNMP) traps allows the RAID device to send alerts to a central SNMP console. Even better, look for a RAID subsystem that will take advantage of server-based software to send e-mail or pager alerts whenever alarms are sounded.

Alternative Media

If your LAN is like most, hard-disk storage isn't the only mass-storage medium that needs managing. Including alternative media devices is an important part of proper planning and management.

One of the most common devices found on a LAN is a CD-ROM drive. With the development of operating systems that consume hundreds of megabytes of disk space, CD-ROM-based distribution is an inevitable event, even on the smallest of LANs. But these little devices can cause management nightmares. As the number of CDs proliferates, you are faced with two options, neither of which is pleasant: You can constantly swap CDs in and out of drives as users request (or demand) them, or you can put more drives and controllers into your already overcrowded servers.

There are other options. One is to buy a few multigigabyte hard disks and copy the contents of the CDs onto the drives. Two benefits that come from this solution are less manual effort and vastly improved response times. Another option is to use robotic CD disc changers. Some of these devices take discs in slots built into the changer, with a separate arm that picks the appropriate disc. Other systems use moving multislot cartridges and rotating disc holders like those found in musical jukeboxes. For most robotic systems, the average time to find and position a disc for writing or reading is six seconds.

Backing It Up

Although the goal behind developing a strategic storage management plan is to simplify your life, having many different types of storage on hand makes for a fairly complex environment. However, this environment is still far easier to manage than one with many more separate disk systems, which is what most desktop-centric environments have.

For example, backing up three or four fully mirrored RAID 5 arrays is easier than backing up 50 different workstations. For one thing, you're likely to have faster backup I/O channels on a server than you will on a PC workstation via shared Ethernet, so it will probably take less time to back up a gigabyte locally than 100 MB remotely. Also, you can increase the aggregate throughput of your backups by adding multiple tape drives, allowing drives to run simultaneously.

Some offerings support up to eight drives running concurrently, bringing the aggregate throughput up to 15 GB per hour, assuming you connect the drives directly to the server and run the software on the server. These monsters usually combine parallel backups with streaming, increasing throughput by continuously spinning the tapes. If you need to back up multiple servers, you might consider a dedicated backup server with an automatic tape changer.
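A quick way to size a tape setup is to work backward from your backup window. The Python sketch below assumes throughput scales linearly with the number of drives, which is optimistic once the server bus or the source disks become the bottleneck. The per-drive rate is simply the 15 GB per hour quoted above divided by eight drives; the 40 GB data set and six-hour window are example figures of our own.

```python
import math

def backup_hours(data_gb, drives, gb_per_hour_per_drive):
    """Hours to back up data_gb with `drives` tape drives running in parallel."""
    return data_gb / (drives * gb_per_hour_per_drive)

def drives_needed(data_gb, window_hours, gb_per_hour_per_drive):
    """Smallest number of drives that finishes inside the backup window."""
    return math.ceil(data_gb / (window_hours * gb_per_hour_per_drive))

# 15 GB per hour across eight drives, assuming perfectly linear scaling.
PER_DRIVE = 15 / 8

print(f"{backup_hours(40, 8, PER_DRIVE):.2f} hours with eight drives")
print(f"{drives_needed(40, 6, PER_DRIVE)} drives for a six-hour window")
```

Treat the answers as a floor rather than a promise, and leave yourself headroom for verification passes and for the growth you know is coming.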

Part of your overall backup solution will depend on the tape format you choose to use. Quarter-inch cartridge (QIC) was the standard choice just a few years ago, but since then newer technologies like 4-mm digital audio tape (DAT) and 8-mm helical scan drives have become more popular. Newer entries like digital linear tape (DLT) offer even greater capacities and speeds, and even QIC is making a comeback with new, more flexible formats. Each of these different media offers advantages, but only one should be used across your organization. Consistency of media will make your life much easier, especially in times of crisis.

Other features to look for include flexible scheduling, alert notification tools, 24-hour support and frequent updates and patch fixes. If your backup vendor is falling down in any of these areas, re-evaluate your choice before something unrecoverable happens.

Hierarchical Storage Management

Hard disks, CD-ROMs, WORM and tape drives are all fairly static forms of storage: data doesn't get moved from one to the other unless a human operator specifically moves it. However, with the recent rise in popularity of a variety of hierarchical storage management (HSM) offerings, these migrations can occur dynamically, using just a bit of human prompting. HSM products allow you to migrate infrequently used files to cheaper, larger and slower mass-storage devices, leaving small "stub" files in their place. The data is moved back to primary storage only when the file is requested, eliminating expensive front-line storage without actually getting rid of the data.

The way the migration occurs is determined by the administrator and can generally be set according to length of time, type of file, amount of available space and other criteria. As usual, there is a wide range of products and implementations to choose from, but the criteria are pretty simple. The product needs to be completely customizable in terms of origin and destination: You may want to move data from a disk on server "A" to a magneto-optical disk on server "B." It also needs to support complete, invisible file recall, regardless of the client OS. It also helps if the product is integrated with your existing backup solution, or offers a suitable companion product so that the two are aware of each other's functions. Finally, it needs to support all of the standard alert mechanisms, so any alarm conditions that occur can be relayed to staff members for immediate attention.
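To give a feel for what a migration policy looks like, here is a minimal Python sketch that only lists the files a policy would select; it does not move anything or create stub files, and it is not modeled on any particular HSM product. The function name, the thresholds and the skipped extensions are all hypothetical. It also assumes the file system records last-access times reliably, which is not always the case.

```python
import os
import time

def migration_candidates(root, min_idle_days=90, min_size_bytes=1_000_000,
                         skip_extensions=(".exe", ".dll"), now=None):
    """Yield (path, size, idle_days) for files that match the policy."""
    now = time.time() if now is None else now
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            # Policy by file type: leave program files on primary storage.
            if name.lower().endswith(skip_extensions):
                continue
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            # Policy by age and size: large files nobody has read lately.
            idle_days = (now - info.st_atime) / 86400
            if idle_days >= min_idle_days and info.st_size >= min_size_bytes:
                yield path, info.st_size, idle_days

if __name__ == "__main__":
    for path, size, idle in migration_candidates("."):
        print(f"{idle:6.0f} days idle  {size:>12d} bytes  {path}")
```

Running a report like this before you buy anything is a cheap way to estimate how much front-line storage a real HSM product could free up.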

Working up an effective storage management strategy can be an intense effort, involving a variety of employees and vendors. But the one-time costs of the effort and the equipment will likely be offset by savings on the management expenses you would otherwise incur in running a poorly designed storage environment. Also, it can be easy to garner management support for these efforts, once you've outlined the reduced level of exposure to your organization, not to mention the benefits of the reduced operating expenses.