
Router Hangs

During my studies, I ran across Cisco Document ID 15105, "Troubleshooting Router Hangs."

The document divides this issue into two broad categories:
- When the console does not respond

- When traffic does not go through

Of course, it covers checking CPU utilization to see whether some runaway process is hogging resources. That is routine stuff we would try almost out of habit. However, the article went further by outlining:

# Scheduler commands
This is something to check if traffic is still flowing through the router, but the console is unresponsive.

On the Cisco 7200 and 7500 Series:

configure terminal
scheduler allocate 3000 1000
^Z

The scheduler allocate command guarantees CPU time for low-priority processes. The first value (3000 microseconds, usec) caps the time spent on fast switching within any one network interrupt context; the second (1000 usec) is the minimum time guaranteed to process-level work while network interrupts are disabled.
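To get a feel for what 3000/1000 buys you, here is a rough Python sketch. It assumes a simplified model (my own, not Cisco's) in which, under sustained load, the CPU alternates between a full fast-switching burst and the guaranteed process-level window. The function name is hypothetical.

```python
def min_process_share(interrupt_usec: int, process_usec: int) -> float:
    """Rough lower bound on the CPU fraction left for processes.

    Simplified model (an assumption): under sustained traffic the CPU
    alternates between an interrupt burst capped at interrupt_usec and a
    guaranteed process-level window of process_usec.
    """
    if interrupt_usec <= 0 or process_usec <= 0:
        raise ValueError("both times must be positive")
    return process_usec / (interrupt_usec + process_usec)


# scheduler allocate 3000 1000
share = min_process_share(3000, 1000)
print(f"processes get at least {share:.0%} of the CPU")  # 25%
```

Under that model, console and other processes should still get roughly a quarter of the CPU even while interrupts are saturating it, which is why you can type again.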

On all other platforms, use:

configure terminal
scheduler interval 500
^Z

The scheduler interval command limits the time that can elapse without running low-priority processes to 500 milliseconds (the minimum allowed value), so some commands can still be typed even when CPU usage is at 100%.
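If you keep notes or scripts for several router models, a tiny helper like the one below captures the platform split described above. It is purely illustrative: the function name and the platform label strings are my own assumptions, and "end" is used in place of ^Z, which does the same thing.

```python
def scheduler_commands(platform: str) -> list:
    """Return the config lines from the article for the given platform.

    The platform labels ("7200", "7500") are illustrative assumptions;
    anything else falls through to the generic scheduler interval command.
    """
    if platform in ("7200", "7500"):
        # Cap fast switching at 3000 usec per interrupt context and
        # guarantee 1000 usec to process-level work.
        body = ["scheduler allocate 3000 1000"]
    else:
        # Let no more than 500 ms pass without running low-priority processes.
        body = ["scheduler interval 500"]
    return ["configure terminal", *body, "end"]


for line in scheduler_commands("2600"):
    print(line)
```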

Then there is the other case: the console is responsive, but traffic does not pass through. It could, of course, be a routing issue, but that is the usual thing we already know to look for. The interesting part of this article is that it goes further and describes:

# Wedged interfaces and how to obtain a stack trace


Wedged interfaces: this is a particular case of buffer leak that causes the input queue of an interface to fill up to the point where it can no longer accept packets. Reloading the router frees the input queue and restores traffic until the queue fills up again, which can take anywhere from a few seconds to a few weeks depending on the severity of the leak.

The easiest way to identify a wedged interface is to issue the show interfaces command and look for an input queue whose current size has reached or exceeded its maximum, similar to this:

Output queue 0/40, 0 drops; input queue 76/75, 27 drops
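Spotting this by eye across many interfaces gets tedious, so here is a minimal Python sketch that scans saved show interfaces output and flags any interface whose input queue size is at or above its maximum. It assumes the queue line format shown above and that each interface section begins with an unindented "<name> is ..." line; other IOS releases may print the queues differently, so the regular expressions may need adjusting. The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Queue line in the format shown above, e.g.
# "Output queue 0/40, 0 drops; input queue 76/75, 27 drops"
INPUT_QUEUE_RE = re.compile(r"input queue (\d+)/(\d+), (\d+) drops")
# Assumption: each interface section starts at column 0 with
# "<name> is ...", e.g. "Ethernet0 is up, line protocol is up".
HEADER_RE = re.compile(r"^(\S+) is ")


def parse_input_queue(line: str) -> Optional[Tuple[int, int, int]]:
    """Return (size, max, drops) for the input queue, or None if absent."""
    m = INPUT_QUEUE_RE.search(line)
    if m is None:
        return None
    size, maximum, drops = (int(g) for g in m.groups())
    return size, maximum, drops


def find_wedged_interfaces(output: str) -> List[str]:
    """Return interfaces whose input queue is at or above its maximum."""
    wedged = []
    current = None
    for line in output.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        queue = parse_input_queue(line)
        if queue and current is not None:
            size, maximum, _drops = queue
            if size >= maximum:
                wedged.append(current)
    return wedged


sample = """Ethernet0 is up, line protocol is up
  Output queue 0/40, 0 drops; input queue 76/75, 27 drops
Serial0 is up, line protocol is up
  Output queue 0/40, 0 drops; input queue 0/75, 0 drops"""
print(find_wedged_interfaces(sample))  # ['Ethernet0']
```

A full queue in a single capture could just be a burst; since a wedged queue stays full until the reload, running this against a few captures taken over time is more telling.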


Obtain a Stack Trace from ROM Monitor

K-trace refers to the procedure used to obtain a stack trace from the router from ROM Monitor. On routers with older ROM Monitor code, a stack trace is obtained with the k command. On routers that run more recent ROM Monitor code, the stack command can also be used.

Complete these steps to obtain stack traces from a router that does not respond:

1. Enable the break sequence by changing the configuration register value. Bit 8 (0x0100) must be set to zero so that Break is not ignored. A value of 0x2002 works.

Router#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
Router(config)#config-register 0x2002
Router(config)#end

2. Reload the router so that the new configuration register value is used.

3. Send the break sequence when the problem occurs. The ROM Monitor prompt ">" or "rommon 1 >" should be displayed.

4. Capture a stack trace by collecting the output of either the k 50 or the stack 50 command. The 50 argument prints a longer stack trace.

5. Issue the c or cont command to continue.

6. Repeat the last three steps (3 through 5) several times to ensure that multiple points in a continuous loop have been captured.

7. After you have obtained several stack traces, reload the router to recover from the hung state.
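Step 1 boils down to clearing a single bit. The short Python sketch below (the helper names are hypothetical) shows the arithmetic: bit 8 is 0x0100, so clearing it from 0x2102, a common default value, gives the 0x2002 used above.

```python
BREAK_DISABLE_BIT = 0x0100  # bit 8 of the configuration register


def break_enabled(config_register: int) -> bool:
    """True if the Break key is honored, i.e. bit 8 is cleared."""
    return (config_register & BREAK_DISABLE_BIT) == 0


def enable_break(config_register: int) -> int:
    """Return the register value with bit 8 cleared, other bits untouched."""
    return config_register & ~BREAK_DISABLE_BIT


print(hex(enable_break(0x2102)))  # 0x2002
```

Clearing only that bit, rather than typing a new value from scratch, keeps whatever boot settings the other bits already encode.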

Here is an example of this procedure:

User break detected at location 0x80af570

rommon 1 > k 50

Stack trace:
PC = 0x080af570
Frame 00: FP = 0x02004750 RA = 0x0813d1b4
Frame 01: FP = 0x02004810 RA = 0x0813a8b8
Frame 02: FP = 0x0200482c RA = 0x08032000
Frame 03: FP = 0x0200483c RA = 0x040005b0
Frame 04: FP = 0x02004b34 RA = 0x0401517a
Frame 05: FP = 0x02004bf0 RA = 0x04014d9c
Frame 06: FP = 0x02004c00 RA = 0x040023d0
Frame 07: FP = 0x02004c68 RA = 0x04002e9e
Frame 08: FP = 0x02004c78 RA = 0x040154fe
Frame 09: FP = 0x02004e68 RA = 0x04001fc0
Frame 10: FP = 0x02004f90 RA = 0x0400c41e
Frame 11: FP = 0x02004fa4 RA = 0x04000458
Suspect bogus FP = 0x00000000, aborting

rommon 2 > cont

Repeat this procedure several times in the event of a system problem to collect multiple instances of the stack trace.
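Comparing several captures by hand is error-prone, so here is a rough Python sketch (helper names are hypothetical) that pulls the return addresses (RA) out of k 50 output like the example above and reports those present in every capture. The idea is my own heuristic, not something from the Cisco document: frames common to all traces are likely the code wrapped around the loop. Turning the addresses into function names still requires symbol information for the exact IOS image, so the raw traces should go to TAC regardless.

```python
import re
from typing import List, Set

# Matches frame lines from ROM Monitor "k 50" output, e.g.
# "Frame 00: FP = 0x02004750 RA = 0x0813d1b4"
FRAME_RE = re.compile(
    r"Frame\s+\d+:\s+FP\s*=\s*0x[0-9a-fA-F]+\s+RA\s*=\s*(0x[0-9a-fA-F]+)"
)


def return_addresses(trace: str) -> List[int]:
    """Return the RA values from one captured trace, in listing order."""
    return [int(m.group(1), 16) for m in FRAME_RE.finditer(trace)]


def common_return_addresses(traces: List[str]) -> Set[int]:
    """Return the RA values that appear in every captured trace."""
    ra_sets = [set(return_addresses(t)) for t in traces]
    if not ra_sets:
        return set()
    return set.intersection(*ra_sets)


first = """Frame 00: FP = 0x02004750 RA = 0x0813d1b4
Frame 01: FP = 0x02004810 RA = 0x0813a8b8
Frame 02: FP = 0x0200482c RA = 0x08032000"""
# Hypothetical second capture taken a moment later.
second = """Frame 00: FP = 0x02004760 RA = 0x0813c000
Frame 01: FP = 0x02004810 RA = 0x0813a8b8
Frame 02: FP = 0x0200482c RA = 0x08032000"""
for ra in sorted(common_return_addresses([first, second])):
    print(hex(ra))
```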

When a router does not respond, it is almost always a software problem. In this case, collect as much information as possible, including the stack trace, before you open a TAC service request. It is also important to include output from the show version, show run, and show interfaces commands.
