BlueWolf's Howl

February 04, 2008

Trace Schmace

Okay - this is really my own fault. You know how they say 'be careful what you wish for'... Well, I was silly enough to want to learn how to read traces. Yeah, I read the Ethereal book a very long time ago. But I never got the opportunity to practice the skill. And everyone can pick out the real obvious stuff (tons of retransmissions). I wanted to *really* know how to catch the stuff that goes unnoticed by the casual observer.

I'm still getting that wish. Now I wish it would stop for a while and give me a breather.

It started when the client asked for a network trace to figure out some issue with a file server. It was real obvious that the problem was connectivity between our site and the remote site. But they wanted to make sure and get a trace at the server. The 'network sniffer' that *used to* be there was there no longer. I got the cavalier 'put Wireshark on a laptop' answer when I asked for tools. So I did. But what I'd read was the Ethereal book, and Wireshark is a bit more advanced. So I whined until they bought the Wireshark University CDs. I made it through part of the first CD.

No sooner was it in my hands than someone *else* needed to do a trace. The CDs were handed over (reluctantly). Then they had to be handed back. I'd really like to get through those -- however, I'm so busy doing the dang traces and analyzing them that I haven't had time to finish watching them. [They're really good, btw.]

So here I am, muddling through - learning by the seat of my pants. Again. I'm fighting my way through capturing a good trace. Apparently Wireshark gets lonely when you leave it running and spits out an error and stops. However, if you're sitting there watching it, more often than not, it won't error. Yeah, I've seen a lot of "Out Of Memory" error screens.

To save you some time - there's really no rhyme nor reason for the errors. Well, none that you can anticipate. I had someone (the guy I had to surrender the CDs to grrrr) tell me that I was taking too large of a capture file. The file for this particular trace was much smaller than traces I took previously. Sometimes (not all the time, but sometimes) Wireshark doesn't like to roll over a log. We set the file size to 50MB and it was choking - having to truncate and start new files waaaaay too often. So I set the file size to 500MB and got the capture. Navigating around in the resultant file wasn't anything resembling quick, though. Sometimes I can capture and watch it as it's going -- as long as I turn off the name resolution. Sometimes I can't and just have to wait for the capture to finish. Mostly I find that there's so much hand-holding necessary while the capture is running (for the requestor of the capture), that I really don't have time to sit and watch it. At the same time, they want some kind of generalized 'eyeball' of the results as the capture is still running. Yeah, not immediately after it's done -- AS IT'S RUNNING.... So it's helpful if you can view the capture in real time.
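For anyone who would rather script the capture than babysit the GUI, here is a rough sketch of the same settings (name resolution off, roll to a new file at about 500MB) as a tshark command line built in Python. This is not how the captures above were taken; the interface name, the output file name and the `max_files` option are made up for illustration, and it assumes a tshark new enough to accept `-b filesize:` in kB.

```python
def build_capture_command(interface, out_file, file_size_mb=500, max_files=None):
    """Build a tshark command line for a long-running, multi-file capture.

    interface, out_file and max_files are hypothetical values for illustration.
    """
    cmd = [
        "tshark",
        "-i", interface,
        "-n",                                     # turn off name resolution
        "-b", f"filesize:{file_size_mb * 1000}",  # size is given in kB, so 500MB is roughly 500000
        "-w", out_file,
    ]
    if max_files is not None:
        # Optional ring buffer: keep only the newest max_files files.
        cmd += ["-b", f"files:{max_files}"]
    return cmd


print(" ".join(build_capture_command("eth0", "fileserver.pcap")))
```

To actually run it, hand the list to `subprocess.run(cmd, check=True)` on a machine with tshark installed and capture privileges.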

File size and the ability to open the file have nothing to do with each other that I can tell. I've seen huge files open without a problem (over 700MB) and small files (about 50MB) choke on opening. [From what I saw in my travels - it may have more to do with the amount and type of metadata collected rather than total file size.]

Here is where I met my good friend Editcap. It's a command-line utility that will chop up your trace file into smaller files that are easier to manage (or open). I figured out how to use it by necessity. I was getting the Out Of Memory error. I doubled the RAM in the laptop. I increased the page file size. I still had problems. So I forced myself to figure out editcap and chopped up my captures.

When you chop up your captures, sometimes you can end up with a TON of files. The six large files (for about an hour's worth of traffic) ended up as 109 little files. Yes, and each one opened easily. However, now I had to open each one! A lot of packets is a lot of packets no matter how you slice them. [I put them in 50,000 packet files.]
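The chopping step looks roughly like this. It is a minimal sketch: `editcap -c` is the option that splits by packet count, but the file names here are invented, and the helper that estimates how many pieces you'll get is just arithmetic added for illustration.

```python
import math


def build_split_command(in_file, out_file, packets_per_file=50000):
    """Build an editcap command that splits in_file into fixed-size chunks.

    editcap -c N writes output files of at most N packets each, using
    out_file as the base name for the numbered pieces.
    """
    return ["editcap", "-c", str(packets_per_file), in_file, out_file]


def expected_file_count(total_packets, packets_per_file=50000):
    """How many chunk files a capture of total_packets will turn into."""
    return math.ceil(total_packets / packets_per_file)


# Hypothetical file names, for illustration only.
print(" ".join(build_split_command("big_capture.pcap", "chunk.pcap")))
print(expected_file_count(120000))
```

Knowing the packet count up front (capinfos will report it) tells you whether you're about to create a handful of files or a hundred of them.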

Now my next task is to figure out how to merge the files and pull out some information. I know there's a command line utility to do this. I know there's a way to perform a network baseline (from a large amount of data). I saw it in passing while I was learning how to chop up my file. When I get a chance (read: when it becomes necessary due to a request), I will probably have to learn that on the fly too. Good thing I got those training CDs! Now if I can get a chance to view them -before- I have to accomplish those tasks, it would be a really nice thing.
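The merge utility in question is most likely mergecap, which ships with Wireshark alongside editcap. A hedged sketch of what that step might look like, with invented file names; the second helper assumes tshark's protocol hierarchy statistics (`-z io,phs`) are a reasonable first cut at "pulling out some information", which is only one of several ways to summarize a capture.

```python
def build_merge_command(out_file, in_files):
    """Build a mergecap command that combines in_files into out_file.

    By default mergecap interleaves the packets in timestamp order.
    """
    if not in_files:
        raise ValueError("need at least one input file")
    return ["mergecap", "-w", out_file, *in_files]


def build_hierarchy_command(capture_file):
    """Build a tshark command that prints protocol hierarchy statistics.

    -q suppresses the per-packet output so only the summary is shown.
    """
    return ["tshark", "-r", capture_file, "-q", "-z", "io,phs"]


# Hypothetical file names, for illustration only.
pieces = ["chunk_00000.pcap", "chunk_00001.pcap"]
print(" ".join(build_merge_command("merged.pcap", pieces)))
print(" ".join(build_hierarchy_command("merged.pcap")))
```

Keep in mind that merging everything back together recreates one big file, so it makes sense to merge only the pieces that cover the time window you care about.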

Oh...and if you're using Secure Remote on the computer used to capture the file -- don't. You have to not only turn off Secure Remote, but also uncheck the box in the protocol properties. Otherwise, you'll end up with only one side of every conversation. Oh, you'll get traffic in both directions, but only one side of each conversation. Yeah, it kind of *looks* like you got stuff going in both directions, but when you try to follow the TCP stream... oooops. Yeah. Appearances can be deceiving.

Other helpful tidbits:
When you put a connection on a switch to mirror the port, make sure the speed and duplex match. Don't cause more errors from tapping the connection. You'll surely find plenty of retransmissions to blame your problem on, however none of them will be the original cause of the issue.

Don't let a large number of any one kind of packet fool you. Sure, you can sort for broadcasts and multicasts and see a bunch of them. But how many are there in relation to the total number of packets in the capture? Perspective is everything.
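Putting a number on that perspective is a one-liner. The counts below are made up purely to show the arithmetic; plug in whatever your filter and your capture totals actually report.

```python
def share_of_capture(matching_packets, total_packets):
    """Return matching_packets as a percentage of total_packets."""
    if total_packets <= 0:
        raise ValueError("total_packets must be positive")
    return 100.0 * matching_packets / total_packets


# Made-up numbers: 12,000 broadcasts looks like a lot in a sorted list...
broadcasts = 12000
total = 5000000
# ...until you see it as a fraction of the whole capture.
print(f"{share_of_capture(broadcasts, total):.2f}% of the capture")
```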

Look at the traffic in several different ways. If you're doing a trace, this is probably a tricky problem that doesn't have an obvious solution. Network traffic is bursty by nature. Your average over a long period of time may hide a five-minute burst that pegs out the bandwidth. [This will be most noticeable if you're watching the capture as it happens.]
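Here is a small sketch of why the long-run average hides bursts: bucket the packets by time, then compare the overall average rate against the busiest bucket. The packet list is invented sample data, and the (timestamp, length) tuple format is an assumption about how you'd export the capture.

```python
from collections import defaultdict


def bytes_per_bucket(packets, bucket_seconds=60):
    """Sum packet lengths into fixed-width time buckets.

    packets is an iterable of (timestamp_seconds, length_bytes) pairs.
    """
    buckets = defaultdict(int)
    for timestamp, length in packets:
        buckets[int(timestamp // bucket_seconds)] += length
    return dict(buckets)


def average_and_peak_bps(packets, bucket_seconds=60):
    """Return (average, peak) bits per second over the capture."""
    buckets = bytes_per_bucket(packets, bucket_seconds)
    if not buckets:
        return 0.0, 0.0
    # Count idle buckets between the first and last packet too.
    bucket_count = max(buckets) - min(buckets) + 1
    total_bytes = sum(buckets.values())
    average = total_bytes * 8 / (bucket_count * bucket_seconds)
    peak = max(buckets.values()) * 8 / bucket_seconds
    return average, peak


# Invented sample: a quiet start, then one heavy minute ten minutes in.
sample = [(0, 1000), (30, 1000), (600, 600000)]
average, peak = average_and_peak_bps(sample)
print(f"average {average:.0f} bps, peak {peak:.0f} bps")
```

With the sample above, the busiest minute runs at more than ten times the overall average, which is exactly the kind of thing a single summary number smooths away.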

Coordinate with the users experiencing the problem. You don't want to waste your time capturing data when the trouble traffic is not on the wire. As you capture, have them do whatever it is that they do when they see the symptoms. Watch in real-time as they attempt to re-create the problem. You may notice something as it happens that would be harder to detect after the fact.

Learn all you can about the protocol that you're trying to analyze. It will be hard to know what's amiss if you don't know what you should expect to happen. You may have to do some research (maybe even fast and dirty on the fly in many cases), but try to find out as much as you can beforehand. I was surprised to find out that SQL traffic isn't going to be listed as SQL. It's known as TDS (Tabular Data Stream) traffic. This was very helpful to know and I was lucky to have stumbled across it during my research pre-capture.
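Once you know the name, filtering for it is easy. A minimal sketch, assuming a recent tshark (which takes display filters with `-Y`) and the `tds` display filter; the capture file name and the server address are placeholders, not anything from the traces described here.

```python
def build_tds_filter_command(capture_file, server_ip=None):
    """Build a tshark command that shows only TDS (SQL Server) traffic.

    server_ip is optional and narrows the output to one host.
    """
    display_filter = "tds"
    if server_ip:
        display_filter = f"tds && ip.addr == {server_ip}"
    return ["tshark", "-r", capture_file, "-Y", display_filter]


# Placeholder file name and documentation-range address, for illustration only.
print(build_tds_filter_command("merged.pcap", "192.0.2.10"))
```

The same `tds` filter string works in the filter bar of the GUI.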

Yeah - sometimes I just get lucky like that. I spent the first 6 to 9 months here without any requests for network traces. Then I got 7 in a two-month period. And it seems like that's their new solution to everything - do a trace. Not a one of them has any clue as to how much information that gathers or how much time it takes to sift through all this info. And I don't have a clue as to how much I'm learning from all this. Perhaps eventually I'll get that wish.

Posted by BlueWolf on February 4, 2008 10:22 PM