Disk System Fault Tolerance

A hard disk is not a permanent storage device; every hard disk will eventually fail. The most common problem is a complete hard-disk failure (also known as a hard-disk crash). When this happens, the data stored on the failed disk is typically lost. Therefore, if you want your data to be accessible 90 to 100 percent of the time (as with warm and hot sites), you need to use some method of disk fault tolerance. Typically, disk fault tolerance is achieved through disk management technologies such as mirroring, striping, and duplexing drives, which provide some level of data protection. As with other methods of fault tolerance, disk fault tolerance means that a disk system is able to recover from an error condition of some kind.
The following methods provide fault tolerance for hard-disk systems:

  • Mirroring
  • Duplexing
  • Data striping
  • Redundant Array of Independent (or Inexpensive) Disks (RAID)

Understanding Disk Volumes

Before you read about the various methods of providing fault tolerance for disk systems, you should know about one important concept: volumes. When you install a new hard disk into a computer and prepare it for use, the network operating system (NOS) sets up the disk so that you can store data on it, in a process known as formatting. Once the disk is formatted, the NOS can access it. Before the NOS can store data on the disk, however, it must set up what is known as a volume. A volume, for all practical purposes, is a named chunk of disk space. This chunk can exist on part of a disk, can exist on all of a disk, or can span multiple disks. Volumes provide a way of organizing disk storage, as you can see in this illustration:

Understanding Disk Volumes
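
The three cases described above (part of a disk, a whole disk, or several disks) can be modeled in a few lines of Python. This is only an illustrative sketch: the `Extent` and `Volume` classes, the disk names, and the sizes are hypothetical and do not correspond to any particular NOS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    disk: str      # hypothetical disk name
    start_mb: int  # where this chunk begins on the disk
    size_mb: int   # how much of the disk the chunk covers


@dataclass
class Volume:
    name: str
    extents: list  # ordered list of Extent objects

    def size_mb(self):
        """Total size of the volume across all of its extents."""
        return sum(e.size_mb for e in self.extents)

    def locate(self, offset_mb):
        """Map an offset inside the volume to (disk, offset on that disk)."""
        if offset_mb < 0:
            raise ValueError("offset must not be negative")
        remaining = offset_mb
        for e in self.extents:
            if remaining < e.size_mb:
                return e.disk, e.start_mb + remaining
            remaining -= e.size_mb
        raise ValueError("offset is beyond the end of the volume")


# A volume on part of a disk, one on a whole disk, and one spanning two disks.
part = Volume("SYS", [Extent("disk0", 0, 500)])
whole = Volume("DATA", [Extent("disk1", 0, 2000)])
spanned = Volume("ARCHIVE", [Extent("disk0", 500, 1500),
                             Extent("disk2", 0, 2000)])
```

Here `spanned` uses the rest of `disk0` and all of `disk2`, yet callers address it as one named space through `locate`.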

1. Disk Mirroring

Mirroring a drive means designating a hard-disk drive in the computer as a mirror or duplicate to another, specified drive. The two drives are attached to a single disk controller. This disk fault tolerance feature is provided by most network operating systems. When the NOS writes data to the specified drive, the same data is also written to the drive designated as the mirror. If the first drive fails, the mirror drive is already online, and because it has a duplicate of the information contained on the specified drive, the users won’t know that a disk drive in the server has failed.

The NOS notifies the administrator that the failure has occurred. The downside is that if the disk controller fails, neither drive is available. Figure 9.1 shows how disk mirroring works. The drives do not need to be identical, but it helps. Each drive must have enough free space to hold the mirror, so the size of the mirror is limited by the drive with less free space. For example, suppose you have two 4GB drives; one has 3GB free, and the other has 2GB free. You can create one 2GB mirrored system.

Disk Mirroring
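
As a rough sketch of the behavior described in this section, the Python below computes the largest mirror two drives can hold and simulates writes that go to both drives, followed by a failure of the first drive. The function and class names are hypothetical, and the drives are modeled as plain dictionaries rather than real devices.

```python
def max_mirror_size_gb(free_a_gb, free_b_gb):
    """The mirror can be no larger than the smaller amount of free space."""
    return min(free_a_gb, free_b_gb)


class MirroredPair:
    """Toy model of two drives holding the same data (not a real driver)."""

    def __init__(self):
        self.primary = {}   # block number -> data
        self.mirror = {}    # duplicate of the primary
        self.primary_failed = False

    def write(self, block, data):
        # Every write goes to both drives while both are healthy.
        if not self.primary_failed:
            self.primary[block] = data
        self.mirror[block] = data

    def read(self, block):
        # Reads fall back to the mirror once the primary has failed.
        source = self.mirror if self.primary_failed else self.primary
        return source[block]

    def fail_primary(self):
        self.primary_failed = True
        self.primary.clear()


pair = MirroredPair()
pair.write(0, b"payroll")
pair.fail_primary()
still_readable = pair.read(0)  # served from the mirror drive
```

With the numbers from the example above, `max_mirror_size_gb(3, 2)` gives the 2GB mirror.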

2. Disk Duplexing

As with mirroring, duplexing saves data to a mirror drive. In fact, the only major difference between duplexing and mirroring is that duplexing uses two separate disk controllers (one for each disk). Thus, duplexing provides not only a redundant disk but also a redundant controller and data cable. Duplexing provides fault tolerance even if one of the controllers fails. Notice that there is now an extra disk controller in the system.

Disk Duplexing
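
The difference between the two layouts can be checked with a small Python sketch. The disk and controller names are hypothetical, and `data_available` simply asks whether at least one copy of the data is still reachable after the given failures.

```python
# Each layout maps a disk to the controller it is attached to.
MIRRORING = {"disk0": "ctrl0", "disk1": "ctrl0"}   # one shared controller
DUPLEXING = {"disk0": "ctrl0", "disk1": "ctrl1"}   # one controller per disk


def data_available(layout, failed_controllers=(), failed_disks=()):
    """True if at least one healthy disk sits behind a healthy controller."""
    return any(
        disk not in failed_disks and ctrl not in failed_controllers
        for disk, ctrl in layout.items()
    )


# Both layouts survive the loss of a single disk.
mirror_survives_disk = data_available(MIRRORING, failed_disks={"disk0"})
duplex_survives_disk = data_available(DUPLEXING, failed_disks={"disk0"})

# Only duplexing survives the loss of a controller.
mirror_survives_ctrl = data_available(MIRRORING, failed_controllers={"ctrl0"})
duplex_survives_ctrl = data_available(DUPLEXING, failed_controllers={"ctrl0"})
```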

3. Disk Striping

From a performance point of view, writing data to a single drive is slow. When three drives are configured as a single volume without striping, information must fill the first drive before it can go to the second and fill the second before filling the third, so only one drive is doing the work at any given time. If you configure that volume to use disk striping, you will see a definite performance gain. Disk striping breaks up the data to be saved to disk into small portions, called stripes, and writes those portions across all of the disks in turn so that the disks work in parallel. These stripes maximize performance because all of the read/write heads are working constantly. Notice that the data is broken into sections and that each section is sequentially written to a separate disk. Striping data across multiple disks improves only performance; it does not improve fault tolerance. To add fault tolerance to disk striping, it is necessary to use parity. Disk striping is also known as RAID level 0.

How disk striping works
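
To make the contrast concrete, the following Python sketch maps logical block numbers to disks in two ways: filling one drive at a time, and striping round-robin across all drives. The function names and the figure of 100 blocks per disk are assumptions chosen for illustration; real stripe sizes and layouts depend on the controller or NOS.

```python
def spanned_location(block, blocks_per_disk):
    """Fill the first disk completely before moving on to the next."""
    return block // blocks_per_disk, block % blocks_per_disk


def striped_location(block, disk_count):
    """Place consecutive blocks on consecutive disks (round-robin)."""
    return block % disk_count, block // disk_count


# Which disk receives each of the first six blocks on a three-disk volume?
spanned_disks = [spanned_location(b, blocks_per_disk=100)[0] for b in range(6)]
striped_disks = [striped_location(b, disk_count=3)[0] for b in range(6)]
```

In the spanned case all six blocks land on disk 0, while in the striped case they rotate across disks 0, 1, and 2, which is what lets all three drives work at once.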

4. Redundant Array of Inexpensive (or Independent) Disks (RAID)

RAID is a technology that uses an array of less-expensive hard disks instead of one enormous hard disk and provides several methods for writing to those disks to ensure redundancy. (The term independent found favor as larger disks became less prohibitively expensive, which narrowed the price gap between them and drives of a more typical size for the day.)

Those methods are described as levels, and each level is designed for a specific purpose:

RAID 0 (Commonly Used) This method is the fastest because all read/write heads are constantly being used without the burden of parity or duplicate data being written. A system using this method has multiple disks, and the information to be stored is striped across the disks in blocks without parity. This RAID level only improves performance; it does not provide fault tolerance.

RAID 1 (Commonly Used) This level uses two hard disks, one mirrored to the other (commonly known as mirroring; duplexing is also an implementation of RAID 1). This is the most basic level of disk fault tolerance. If the first hard disk fails, the second automatically takes over. No parity or error-checking information is stored. Rather, the drives have duplicate information. If both drives fail, a new drive must be installed and configured and the data must be restored from a backup. RAID 1 has the least processing overhead of the more popular RAID levels that provide fault tolerance (compared with RAID 3 and RAID 5, for example).

RAID 2 At this level, which is no longer recommended for the reasons given at the end of this description, individual bits are striped across multiple disks. Multiple redundancy drives in this configuration are dedicated to storing error correcting code (ECC), a method of error correction that modern hard drives already have built in, independent of RAID. If any data drive fails, the data on that drive can be rebuilt from the ECC data stored on the redundancy drives. Two of the better-known configurations were an array of 10 data disks and 4 ECC disks and an array of 32 data disks and 7 ECC disks. Because its requirements are incredibly difficult and expensive to implement, such as specialized controller hardware to synchronize the spindles of all disks in the array, this is not a commonly used implementation.

RAID 3 At this level, data is striped across multiple hard drives using a parity drive (similar to RAID 2). The main difference is that the data is striped in bytes, not bits as in RAID 2. This configuration is popular because more data is written and read in one operation, increasing overall disk performance.

RAID 4 This level is similar to RAID 2 and 3 (striping with parity drive), except that data is striped in blocks, which facilitates fast reads from one drive. RAID 4 is the same as RAID 0, with the addition of a parity drive. This is not a popular implementation.

RAID 5 (Commonly Used) At this level, the data and parity are striped across three or more drives. This allows for fast writes and reads. The parity information is written with the data across all drives in the array as opposed to the dedicated parity drive of RAID 4. So, if any one disk fails, the drive can be replaced and its data can be rebuilt from the data and parity data stored on the other drives. This works well if one disk fails. If more than one disk fails, however, the data will need to be recovered from backup media. While a minimum of three disks is required, five or more disks are most often used.
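
The rebuild described above can be sketched with XOR parity, the scheme commonly associated with RAID 5. The Python below is a simplified illustration under stated assumptions: the helper names are hypothetical, each stripe holds one block per disk, and the parity rotation shown is just one possible layout, since real controllers differ.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk_for_stripe(stripe, disk_count):
    """Rotate the parity block across disks (an illustrative layout only)."""
    return (disk_count - 1 - stripe) % disk_count


def build_stripe(data_blocks, stripe, disk_count):
    """Return one block per disk: the data blocks plus their parity block."""
    if len(data_blocks) != disk_count - 1:
        raise ValueError("a stripe holds disk_count - 1 data blocks")
    row = list(data_blocks)
    row.insert(parity_disk_for_stripe(stripe, disk_count), xor_blocks(data_blocks))
    return row


def rebuild(row, failed_disk):
    """Recover the block on the failed disk by XORing all surviving blocks."""
    survivors = [blk for i, blk in enumerate(row) if i != failed_disk]
    return xor_blocks(survivors)


# Three-disk array, stripe 1: parity lands on the middle disk in this layout.
row = build_stripe([b"AAAA", b"BBBB"], stripe=1, disk_count=3)
recovered = rebuild(row, failed_disk=0)  # equals the lost block b"AAAA"
```

Because the XOR of all blocks in a stripe is zero, any single missing block equals the XOR of the others. With two blocks missing there is not enough information left, which is why a second failure forces a restore from backup.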

RAID 6 RAID 6 is similar to RAID 5. It is less popular, however, because it needs specialized, usually more expensive controllers and gives up the capacity of an additional drive to redundancy. RAID 6 uses RAID 5 as a basis but stores a second set of parity information on a different drive from the one holding the first. This implementation requires an additional drive over RAID 5 but can handle the simultaneous failure of two drives.
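
The trade-offs among the commonly used levels can be summarized with a little arithmetic. The Python sketch below is an approximation under simple assumptions: all disks are the same size, RAID 1 is the two-disk mirror described above, and the function name `raid_summary` is hypothetical.

```python
def raid_summary(level, disk_count, disk_size_gb):
    """Return (usable capacity in GB, number of disk failures tolerated)."""
    if level == 0:
        min_disks, data_disks, tolerated = 2, disk_count, 0
    elif level == 1:
        if disk_count != 2:
            raise ValueError("this sketch models RAID 1 as a two-disk mirror")
        min_disks, data_disks, tolerated = 2, 1, 1
    elif level == 5:
        # One disk's worth of space is used for parity.
        min_disks, data_disks, tolerated = 3, disk_count - 1, 1
    elif level == 6:
        # Two disks' worth of space is used for parity.
        min_disks, data_disks, tolerated = 4, disk_count - 2, 2
    else:
        raise ValueError("only RAID 0, 1, 5, and 6 are modeled here")
    if disk_count < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} disks")
    return data_disks * disk_size_gb, tolerated


# Five 4GB disks: compare RAID 5 and RAID 6.
raid5 = raid_summary(5, disk_count=5, disk_size_gb=4)
raid6 = raid_summary(6, disk_count=5, disk_size_gb=4)
```

With five 4GB disks, RAID 5 leaves four disks' worth of space for data and survives one failure, while RAID 6 leaves three disks' worth and survives two.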

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
