Before implementing fault tolerance or disaster recovery, you should determine how critical your systems are to daily business operations. Additionally, you should determine how long each system could afford to be nonfunctional (down). Making these determinations will dictate which fault tolerance and disaster recovery methods you implement and to what extent. The more vital the system, the greater lengths (and, thus, the greater expense) you should go to in order to protect it from downtime. Less-critical systems may call for simpler measures. For example, banks, insurance companies, the U.S. government, and airlines all run highly critical computer and network systems. Thus, they all have complex and expensive fault tolerance and disaster recovery systems in place.
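Deciding how long a system can afford to be down becomes easier once that budget is expressed as an availability percentage. The following is a minimal sketch of the arithmetic, assuming a 365-day year; the function names are illustrative and do not come from any particular tool.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def required_availability(max_downtime_hours_per_year: float) -> float:
    """Return the availability percentage implied by a yearly downtime budget."""
    if not 0 <= max_downtime_hours_per_year <= HOURS_PER_YEAR:
        raise ValueError("downtime must be between 0 and 8,760 hours")
    return 100.0 * (1 - max_downtime_hours_per_year / HOURS_PER_YEAR)


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime (in hours) that an availability target permits."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1 - availability_percent / 100.0)


if __name__ == "__main__":
    # A 99 percent target still permits roughly 87.6 hours of downtime per year.
    print(f"{allowed_downtime_hours(99.0):.1f} hours per year at 99 percent")
    # A budget of 8.76 hours per year corresponds to roughly 99.9 percent.
    print(f"{required_availability(8.76):.1f} percent for 8.76 hours per year")
```

Running the numbers this way for each system gives a concrete basis for comparing how much protection each one needs.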
In terms of how fault tolerance and disaster recovery are implemented, sites can be described as hot, warm, or cold. As the temperature decreases, so does the level of fault tolerance and disaster recovery implemented at a site.
In a hot site, every computer system and piece of information has a redundant copy (possibly multiple redundancies). This level of fault tolerance is used when systems must be up 100 percent of the time. Hot sites are strictly fault-tolerant implementations, not disaster recovery implementations (as no downtime is allowed). Budgets for this type of fault-tolerant implementation are typically large.
In a system that has 100-percent redundancy, the redundant system(s) will take over for the failed system without any downtime. The technology used to implement hot sites is clustering, which is the process of grouping multiple computers in order to provide increased performance and fault tolerance.
Although servers are commonly clustered, workstations are normally not because they are simple and cheap to replace. Each computer in the cluster is connected to the other computers in the cluster by high-speed, redundant links (usually multiple fiber-optic cables). Each computer runs special clustering software that makes the cluster of computers appear as a single entity to clients.
There are two levels of cluster service: failover and true.
1. Failover Clustering
A failover cluster includes two entities (usually servers). The first is the active device (the device that responds to network requests), and the second is the failover device. The failover device is an exact duplicate of the active device, but it is inactive and connected to the active device by a high-speed link. The failover device monitors the active device and its condition by using what is known as a heartbeat. A heartbeat is a signal that comes from the active device at a specified interval. If the failover device doesn’t receive a heartbeat from the active device within the specified interval, it considers the active device inactive, comes online, and takes over as the active device.
When the previously active device comes back online, it starts sending out the heartbeat. The failover device, which currently is responding to requests as the active device, hears the heartbeat and detects that the active device is now back online. The failover device then goes back into standby mode and starts listening to the heartbeat of the active device again.
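The heartbeat logic described above can be sketched as a small simulation. This is an illustrative model only, not how any particular clustering product implements it: the `FailoverDevice` class, its method names, and the one-second interval are all assumptions, and time is passed in explicitly so that the example is deterministic.

```python
class FailoverDevice:
    """Standby device that monitors the active device's heartbeat."""

    def __init__(self, heartbeat_interval: float, start_time: float = 0.0) -> None:
        self.heartbeat_interval = heartbeat_interval  # max seconds between heartbeats
        self.last_heartbeat = start_time              # time the last heartbeat arrived
        self.serving = False                          # True while answering requests

    def receive_heartbeat(self, now: float) -> None:
        """Record a heartbeat; if this device had taken over, return to standby."""
        self.last_heartbeat = now
        self.serving = False

    def check(self, now: float) -> bool:
        """Come online if no heartbeat arrived within the specified interval."""
        if now - self.last_heartbeat > self.heartbeat_interval:
            self.serving = True
        return self.serving


def simulate() -> list:
    """Replay a fixed timeline and return (time, serving) after each check."""
    standby = FailoverDevice(heartbeat_interval=1.0)  # hypothetical 1-second interval
    timeline = [
        ("heartbeat", 1.0),
        ("check", 1.5),
        ("heartbeat", 2.0),
        ("check", 2.5),
        ("check", 3.5),      # active device has gone silent: standby takes over
        ("heartbeat", 5.0),  # active device is back online
        ("check", 5.5),      # standby has returned to standby mode
    ]
    results = []
    for event, now in timeline:
        if event == "heartbeat":
            standby.receive_heartbeat(now)
        else:
            results.append((now, standby.check(now)))
    return results


if __name__ == "__main__":
    for now, serving in simulate():
        print(f"t={now}: failover device serving = {serving}")
```

The sketch takes over after a single missed interval to mirror the description; a production setup would typically tolerate a few missed heartbeats before failing over, to avoid reacting to a brief network hiccup.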
In a failover cluster, both servers must be running failover clustering software. Examples include Novell’s System Fault Tolerance, Level III (SFTIII); Standby Server; High Availability Server (with Novell’s High Availability software, either of the servers can fail and the other will take over); and Microsoft Cluster Server (MSCS) for Windows NT servers. This functionality is built into Microsoft Windows 2000 and later operating systems. Each software package provides failover functionality.
Here are some advantages of this approach to fault tolerance:
Even though Microsoft Cluster Server (MSCS) is described earlier as a failover clustering technology, it does have some capability for load balancing (according to Microsoft). It currently supports only a two-device configuration, so it primarily fits into this category of clustering.
True clustering differs from failover clustering in two major ways:
In true clustering (also called multiple server clustering), multiple servers (or any network devices) act together as a kind of super server. True clusters must provide load balancing. For example, 20 servers can act as one big server. All network services are duplicated across all servers, and network requests are distributed across all servers. Each server is connected to the other servers through a high-speed, dedicated link. If one server in the cluster malfunctions, the other servers automatically take over the burden of the failed server. When the failed server comes back online, it resumes responding to requests as part of the cluster. This technology can provide greater than 99-percent availability for network services hosted by the cluster.
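To make the load-balancing behavior concrete, here is a minimal sketch that distributes requests round-robin across the servers that are currently online. Round-robin is only one possible distribution policy and is chosen here for simplicity; the `Cluster` class and the server names are hypothetical.

```python
class Cluster:
    """Toy model of a true cluster: requests rotate across online servers."""

    def __init__(self, server_names) -> None:
        self.servers = list(server_names)  # fixed rotation order
        self.online = set(self.servers)    # servers currently able to respond
        self._next = 0                     # index of the next server to try

    def mark_failed(self, name: str) -> None:
        """Remove a malfunctioning server from the rotation."""
        self.online.discard(name)

    def mark_recovered(self, name: str) -> None:
        """Return a repaired server to the rotation."""
        if name in self.servers:
            self.online.add(name)

    def next_server(self) -> str:
        """Pick the next online server, skipping any that have failed."""
        if not self.online:
            raise RuntimeError("no servers online")
        while True:
            candidate = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if candidate in self.online:
                return candidate


if __name__ == "__main__":
    cluster = Cluster(["s1", "s2", "s3"])
    print([cluster.next_server() for _ in range(3)])  # all three servers share the load
    cluster.mark_failed("s2")
    print([cluster.next_server() for _ in range(3)])  # s1 and s3 absorb s2's share
    cluster.mark_recovered("s2")
    print(cluster.next_server())                      # s2 rejoins the rotation
```

Clients only ever call `next_server()`, so a failed or recovered server changes who answers but not how requests are submitted, which is the sense in which the cluster appears as a single entity.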
Several advantages are associated with true clustering:
But these advantages don’t come without a price. Here are a couple of disadvantages to true clustering:
In a warm site, the network service and data are available most of the time. The data and services are less critical than those in a hot site. With hot-site technologies, all fault tolerance procedures are automatic and are controlled by the NOS. Warm-site technologies require a little more administrator intervention, but they aren’t as expensive.
The most commonly used warm-site technology is a duplicate server. A duplicate server, as its name suggests, is one that is currently not being used and is available to replace any server that fails. When a server fails, the administrator installs the new server and restores the data; the network services are available to users with a minimum of downtime. The administrator sends the failed server out to be repaired. Once the repaired server comes back, it is now the spare server and is available when another server fails.
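The rotation of the spare described above can be modeled as simple bookkeeping. The sketch below is an assumption-laden illustration: the `ServerRoom` class and the server names are hypothetical, and the restore-from-backup step is left out because it depends entirely on the backup software in use.

```python
class ServerRoom:
    """Tracks which servers are in service, spare, or out for repair."""

    def __init__(self, in_service, spare) -> None:
        self.in_service = set(in_service)
        self.spare = spare             # name of the idle duplicate server, or None
        self.out_for_repair = set()

    def handle_failure(self, failed: str) -> str:
        """Swap the spare in for a failed server and send the failed one out."""
        if failed not in self.in_service:
            raise ValueError(f"{failed} is not in service")
        if self.spare is None:
            # With a single spare, a second failure has no replacement ready.
            raise RuntimeError("no spare server available")
        replacement = self.spare
        self.in_service.remove(failed)
        self.in_service.add(replacement)  # administrator installs it and restores data
        self.out_for_repair.add(failed)
        self.spare = None
        return replacement

    def repair_complete(self, server: str) -> None:
        """A repaired server comes back and becomes the new spare."""
        self.out_for_repair.remove(server)
        self.spare = server


if __name__ == "__main__":
    room = ServerRoom(in_service=["file1", "mail1"], spare="spare1")
    print(room.handle_failure("mail1"))  # spare1 takes over for mail1
    room.repair_complete("mail1")
    print(room.spare)                    # mail1 is now the spare
```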
Using a duplicate server is a disaster recovery method because the entire server is replaced but in a shorter time than if all the components had to be ordered and configured at the time of the system failure. The major advantage of using duplicate servers rather than clustering is that it’s less expensive. A single duplicate server costs much less than a comparable clustering solution. Corporate networks don’t often use duplicate servers, and that’s because there are some major disadvantages associated with using them:
A cold site cannot guarantee server uptime. Generally speaking, cold sites have little or no fault tolerance and rely completely on efficient disaster recovery methods to ensure data integrity. If a server fails, the IT personnel will do their best to recover and fix the problem. If a major component needs to be replaced, the server stays down until the component is replaced. Errors and failures are handled as they occur. Apart from regular system backups, no fault tolerance or disaster recovery methods are implemented.
This type of site has one major advantage: It is the cheapest way to deal with errors and system failures. No extra hardware is required (except the hardware required for backing up). Any disadvantages of implementing a cold site would stem from having an application that cannot afford the downtime associated with service-affecting faults and disasters.
The term nearline refers to a storage method that is neither online nor offline but somewhere in between, such as tape backup. It holds material that is unlikely to be needed except in cases of disaster recovery. Although there is no one-to-one correspondence between any type of site (hot, warm, or cold) and nearline storage, which is not actively accessed during normal operation, nearline storage comes in handy when recovering from disasters in warm and cold sites.
All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd