Narrowing Down the Problem
Troubleshooting a network problem can be daunting. That’s why it’s best to start by trying to narrow down the source of the problem. When troubleshooting, there are five questions you should ask yourself:
- Did you check the simple stuff?
- Is hardware or software causing the problem?
- Is it a workstation or server problem?
- Which segments of the network are affected?
- Are there any cabling issues?
Did You Check the Simple Stuff?
The first thing to check, as most people will tell you, is the simple stuff. There’s a saying that goes “All things being equal, the simplest explanation is probably the correct one.” For computers, it’s rather hard to categorize simple stuff because what’s simple to one person might be complex to another. We like to define simple stuff (as it relates to troubleshooting) as items that are so obvious that you don’t think to check them. When it turns out that one of those items is the problem, your reaction is almost always “Duh!” Almost everyone can agree on a few items that fall into this category:
- Correct login procedure and rights
- Link lights/collision lights
- Power switch
- Operator error
The Correct Login Procedure and Rights
To gain access to the network, users must follow the correct login procedure exactly. If they don’t, they will be denied access. Considering everything that must be done correctly and in the correct order, it’s a miracle that anyone logs in to a network correctly at all. There are so many opportunities for making a mistake.
First, a user must enter the username and password correctly. As easy as this sounds, users frequently enter this information incorrectly, don’t realize it, and report to the network administrator that the network is broken or that they can’t log in. The most common problem is accidentally typing the wrong username or password. In operating systems where passwords are case sensitive, such as Unix, this can happen when the Caps Lock key is accidentally left on: the letters of the password are entered in the wrong case, and the login is rejected even though the user pressed the right keys.
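The Caps Lock effect can be illustrated with a minimal Python sketch. It assumes the simplification that Caps Lock inverts the case of every letter and leaves digits and punctuation alone; the function names and the example password are hypothetical, and real systems compare password hashes rather than plain text.

```python
def typed_with_caps_lock(intended: str) -> str:
    """Return what reaches the login prompt when Caps Lock is on.

    Simplification: letters have their case inverted; digits and
    punctuation are unchanged.
    """
    return intended.swapcase()


def password_matches(stored: str, typed: str) -> bool:
    """Case-sensitive comparison, as on a Unix-style login."""
    return stored == typed


stored_password = "Tr0ub4dor"  # hypothetical example password
attempt = typed_with_caps_lock(stored_password)
print(attempt)                                     # tR0UB4DOR
print(password_matches(stored_password, attempt))  # False
```

The user pressed exactly the right keys, yet the comparison fails, which is why the Caps Lock light is worth checking before anything else.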
Additionally, in NetWare and Windows, the network administrator can restrict the times and conditions under which users can log in. If a user doesn’t log in at the right time or from the right workstation, the network operating system will reject the login request even though the username and password were entered correctly. A network administrator might also restrict how many times a user can be logged in to the network simultaneously. If that user tries to establish more connections than are allowed, access will be denied. Any time a user is denied access to the network, they are likely to interpret that as a problem even though the network operating system might be doing exactly what it should.
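The three kinds of restriction described above (time of day, workstation, and simultaneous connections) can be sketched as a single check. This is only an illustration: `LoginPolicy` and `check_login` are hypothetical names, not part of NetWare or Windows, and the sketch assumes the username and password have already been validated.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class LoginPolicy:
    """Hypothetical per-user restrictions; not a real NOS data structure."""
    allowed_from: time = time(0, 0)
    allowed_until: time = time(23, 59)
    allowed_workstations: set = field(default_factory=set)  # empty = any
    max_concurrent: int = 1


def check_login(policy, at, workstation, active_sessions):
    """Return (allowed, reason) for a login whose credentials are valid."""
    if not (policy.allowed_from <= at <= policy.allowed_until):
        return False, "outside permitted login hours"
    if policy.allowed_workstations and workstation not in policy.allowed_workstations:
        return False, "workstation not permitted"
    if active_sessions >= policy.max_concurrent:
        return False, "too many simultaneous connections"
    return True, "ok"


policy = LoginPolicy(time(8, 0), time(18, 0), {"WS-ACCT-01"}, 1)
print(check_login(policy, time(19, 30), "WS-ACCT-01", 0))
# (False, 'outside permitted login hours')
```

Returning a reason alongside the result mirrors the troubleshooting point: the denial is correct behavior, and the reason tells you which restriction to look up in the network documentation.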
To test for these types of problems, first check to see if the username and password are being typed correctly and whether or not the Caps Lock key is pressed. Try the login yourself from another workstation (assuming that doesn’t violate the security policy). If it works, you might try asking the user to check to see if the Caps Lock light on the keyboard is on (indicating that the Caps Lock key has been pressed). If that doesn’t solve the problem, check the network documentation to see if the aforementioned kinds of restrictions are in place.
The Link and Collision Lights
The link light is a small light-emitting diode (LED) found on both the NIC and the hub. It is typically green and is labeled “link” (or some abbreviation). A link light indicates that the NIC and hub (in the case of 10Base-T) have established a physical link; that is, each end is detecting the other’s link signal over the cable. You can usually assume that the workstation and hub are communicating if the link lights are lit on both the workstation’s NIC and the hub port to which the workstation is connected.
The collision light is also a small LED, typically amber in color. It can usually be found on both Ethernet NICs and hubs. When lit, it indicates that an Ethernet collision has occurred. It is important to know that this light will blink occasionally because collisions are somewhat common on busy Ethernet networks. However, if this light stays on continuously, there are too many collisions happening for legitimate network traffic to get through. This can be caused by a malfunctioning network card or another malfunctioning network device.
The Power Switch
To function properly, all computer and network components must be turned on and powered up. As obvious as this is, network administrators often hear a user complain, “My computer is on, but my monitor is dark.” In this case, our response is to ask, “Is the monitor turned on?” After a pause, the voice on the other end usually says sheepishly, “Oh. Thanks.”
Most systems include a power indicator such as a Power or PWR light, and the power switch typically has a 1 or an On indicator. However, the unit could be powerless even if the power switch is in the On position. Thus, you need to check that all power cables are plugged in, including the power strip.
When troubleshooting power problems, start with the most obvious device and work your way back to the power service panel. There could be any number of power problems between the device and the service panel, including a bad power cable, bad outlet, bad electrical wire, tripped circuit breaker, or blown fuse. Any of these items can cause power problems at the device.
Operator Error
The problem may be that the user simply doesn’t know how to perform the operation correctly; in other words, the problem may be due to operator error (OE). Those in the computer and networking industry have devised several colorful expressions to describe operator error:
- EEOC (Equipment Exceeds Operator Capability)
- PEBCAK (Problem Exists Between Chair And Keyboard)
- ID Ten T Error (written as ID10T)
Assuming that all problems are related to operator error, however, is a mistake. Before you attribute any problem to operator error, ask the user to reproduce the problem in your presence, and pay close attention. You may find out that the user is having a problem because they are using an incorrect procedure—for example, flipping the power switch without following proper shutdown procedures. You may also find out that the user was trained incorrectly, in which case you might want to see if others are having the same difficulty. If the problem and solution are not obvious, try the procedure yourself, or ask someone else at another workstation to do so.
Is Hardware or Software Causing the Problem?
A hardware problem typically manifests itself as a device in your computer that fails to operate correctly. You can usually tell that a hardware failure has occurred because, when you try to use that piece of hardware, the computer issues an error message saying so. Some failures, such as hard-disk failures, may give warning signs, such as a Disk I/O error or something similar. Other components give no warning: the device operates normally and then simply fails.
The solution to hardware problems usually involves either changing hardware settings, updating device drivers, or replacing hardware. As we have discussed in previous chapters, I/O address, interrupt request lines (IRQ), and direct memory access (DMA) conflicts can cause computers (including workstations and servers) to malfunction. Change the hardware settings to solve these types of problems.
If the hardware has actually failed, however, you must get out your tools and start replacing components. If this is not one of your skills, you can send the device out for repair. In either case, because the system can be down for anywhere from an hour to several days, it’s always prudent to have backup hardware on hand.
Software problems are a little more elusive. Some problems might result in General Protection Fault messages, which indicate a Windows or Windows program error of some type. Also, a program might suddenly stop responding (hang), or the entire machine might lock up randomly. The solution to these problems generally involves a trip to the manufacturer’s support website to get software updates and patches or to search for the answer in a knowledge base.
Sometimes software will give you a precise message regarding the source of the problem, such as the software is missing a file or a file has become corrupt. In this case, you can either provide the file or, if necessary, reinstall the software. Neither solution takes long, and your computer will be up and running in a short time.
Is It a Workstation or a Server Problem?
Troubleshooting this problem involves first determining whether one person or a group of people are affected. If only one person is affected, think workstation. If several people are affected, the server or, more generally speaking, a portion of the network is probably experiencing problems. If a single user is affected, your first line of defense is to try to log in from another workstation within the same group of users. If you can do so, the problem is related to the user’s workstation. Look for a cabling fault, a bad NIC, or some other problem.
On the other hand, if several people in a group (such as a whole department) can’t access a server, the problem may be related to that server. Go to the server in question and check for user connections. If everyone is logged in, the problem could be related to something else, such as individual rights or permissions. If no one can log in to that server, including the administrator, the server may have a communication problem with the rest of the network. If it has crashed, you might see messages to that effect on the server’s monitor or the screen might be blank, indicating that the server is no longer running. These symptoms vary among network operating systems.
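The rule of thumb above (one user points to the workstation, several users point to the server or the network) can be written as a small triage helper. This is a sketch under stated assumptions: `classify_scope` and its inputs are hypothetical, and the returned strings are only suggestions for where to look first.

```python
def classify_scope(affected_users, group_users, admin_can_log_in=True):
    """Suggest where to look first, based on who is affected.

    affected_users / group_users are collections of usernames
    (hypothetical inputs, for example gathered from help desk reports).
    """
    affected = set(affected_users) & set(group_users)
    if not affected:
        return "no reports from this group"
    if len(affected) == 1:
        return "workstation: check cabling, NIC, local configuration"
    if not admin_can_log_in:
        return "server: possible crash or lost network connection"
    return "server or network segment: check connections, rights, permissions"


sales = ["ann", "bob", "carol"]
print(classify_scope(["bob"], sales))
# workstation: check cabling, NIC, local configuration
```

A single report still deserves the cross-check described in the text: logging in from another workstation in the same group confirms that the fault really is local.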
Which Segments of the Network Are Affected?
Making this determination can be tough. If multiple segments are affected, the problem could be a network address conflict. As you may remember from Chapter 4, “TCP/IP Utilities,” network addresses must be unique across an entire network. If two segments have the same IPX network address, for example, all the routers and NetWare servers will complain bitterly and send out error messages, hoping that it’s just a simple problem that a router can correct. This is rarely the case, however, and thus the administrator must find and resolve the issue. Also keep in mind that the continuous broadcasting of error messages will negatively impact network performance.
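If the network documentation records which network address each segment uses, a duplicate can be found mechanically. The sketch below assumes a hypothetical inventory that maps segment names to IPX network addresses written as hexadecimal strings; the names and addresses are invented for illustration.

```python
from collections import defaultdict


def find_address_conflicts(segment_addresses):
    """Map each address used by more than one segment to those segments."""
    segments_by_address = defaultdict(list)
    for segment, address in segment_addresses.items():
        # Compare case-insensitively, since hex digits may be typed either way.
        segments_by_address[address.upper()].append(segment)
    return {
        address: sorted(segments)
        for address, segments in segments_by_address.items()
        if len(segments) > 1
    }


# Hypothetical inventory: segment name -> IPX network address
inventory = {"Floor1": "0000A1B2", "Floor2": "0000C3D4", "Lab": "0000a1b2"}
print(find_address_conflicts(inventory))
# {'0000A1B2': ['Floor1', 'Lab']}
```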
If all users of the network are experiencing the problem, it could be related to a different device, such as a server that everyone accesses. Or, a main router or hub could be down, making network transmissions impossible.
Additionally, if the network has WAN connections, you can determine if a network problem is related to the WAN connection by checking to see if stations on both sides can communicate. If they can, the problem isn’t related to the WAN. If they can’t communicate, you must check everything between the sending station and the receiving one, including the WAN hardware. Usually, the WAN devices have built-in diagnostics that can indicate whether the WAN link is functioning correctly to help you determine if the fault is related to the WAN link or to the hardware involved.
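Checking “everything between the sending station and the receiving one” amounts to walking the path hop by hop and stopping at the first device that does not respond. A minimal sketch follows; the hop names are hypothetical, and `is_reachable` stands in for whatever probe you actually use (a ping wrapper or a vendor diagnostic, neither of which is shown).

```python
def first_unreachable(path, is_reachable):
    """Return the first hop in `path` that fails the probe, or None.

    `path` is ordered from the sending station toward the receiving one.
    `is_reachable` is any callable that takes a hop name and returns bool.
    """
    for hop in path:
        if not is_reachable(hop):
            return hop
    return None


# Hypothetical path and canned probe results, for illustration only.
path = ["local-gateway", "local-wan-router", "remote-wan-router", "remote-server"]
responding = {"local-gateway", "local-wan-router"}
print(first_unreachable(path, lambda hop: hop in responding))
# remote-wan-router
```

Here the local side of the WAN link answers and the remote side does not, which points at the link itself or the remote WAN hardware.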
Are There Any Cabling Issues?
After you determine whether the problem is related to the whole network, to a single segment, or to a single workstation, you must determine whether the problem is related to network cabling. First, check to see if the cables are properly connected to the correct port. More than once, we’ve seen a wall phone cable plugged into a modem in the In jack.
Additionally, patch cables from workstation to wall jack can and do go bad, especially if they get moved or tripped over often. This problem is often characterized by connection problems. If you test the NIC and there is no link light (discussed earlier in this chapter), the problem could be related to a bad patch cable.
It is also possible to have a cabling problem in the walls where the cabling wasn’t installed correctly. If a network cable was run over a fluorescent light, for example, the workstation attached to that cable might have problems only when the lights are on. The problem is that the fluorescent lights produce a large amount of electromagnetic interference (EMI) and can disrupt communications in that cable. This kind of problem may manifest itself only at times when most lights need to be on.
Next, check the medium dependent interface/medium dependent interface-crossover (MDI/MDI-X) port setting on small workgroup hubs, a potential source of trouble that is often overlooked. This port is used to uplink, for example, to a hub on the network’s backbone. The port must be set to either MDI or MDI-X, depending on the type of cable used for the hub-to-hub connection. Assuming the port at the other end is an ordinary (MDI-X) hub port, a crossover cable (discussed later in this chapter) requires that the uplink port be set to MDI-X; a standard network patch cable requires that the port be set to MDI. You can usually adjust the setting via a regular switch or a dual inline package (DIP) switch. Check the hub’s documentation.
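The relationship between port setting and cable type can be captured in a few lines. The sketch below assumes the usual Ethernet convention: a straight-through (standard patch) cable links an MDI port to an MDI-X port, while a crossover cable links two ports of the same type. It also assumes neither port negotiates the setting automatically. The function name `link_expected` is hypothetical.

```python
def link_expected(port_a, port_b, cable):
    """Return True if the two ports should be able to establish a link.

    port_a / port_b: "MDI" or "MDI-X"; cable: "straight" or "crossover".
    A straight-through cable needs unlike ports; a crossover cable
    needs like ports.
    """
    if port_a not in ("MDI", "MDI-X") or port_b not in ("MDI", "MDI-X"):
        raise ValueError("port must be 'MDI' or 'MDI-X'")
    if cable not in ("straight", "crossover"):
        raise ValueError("cable must be 'straight' or 'crossover'")
    ports_differ = port_a != port_b
    return ports_differ if cable == "straight" else not ports_differ


# Uplink port set to MDI, ordinary hub port (MDI-X) at the other end:
print(link_expected("MDI", "MDI-X", "straight"))   # True
print(link_expected("MDI", "MDI-X", "crossover"))  # False
```

If the link light stays dark on a hub-to-hub connection, running through the four combinations this way shows quickly whether the switch position or the cable is the mismatch.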