Field Notes: Where Did the Net Go?
R'ykandar Korra'ti, postmaster for a small co-op ISP, lives near Seattle with her partner Anna. Having previously shipped mail products at Microsoft, she is now looking at grad school in a CS-related field so esoteric it doesn't really have a name. Potential faculty advisors can reach her at email@example.com.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.
We first noticed something was wrong when our network fell over and died. No, wait; that makes it sound way more catastrophic than it actually was. Let me back up and start again.
I'm one of the operators of a small, co-op ISP in the Seattle area. We have a heterogeneous workstation environment: mostly Windows® and Linux on the server side, a variety of Macintosh OS X boxes, some Windows clients, and even one ancient Amiga 4000/040 that doesn't get turned on very often. I'm sure you can imagine the kinds of joy this brings to our lives.
One day, while sitting quietly reading comic books (er, studying some useful manuals), I heard the words I had come to dread: "Dara! The net's down!" OK, I thought to myself, what does that actually mean? For a change, our network actually was down, or at least our outbound connectivity was. The LAN was fine and we could talk to our router at the UDP and ICMP levels. From the router, we could talk to the rest of the world. But the router itself no longer passed TCP packets. Stranger still, the interfaces for its two NICs both reported normal status and a perfectly reasonable number of packet errors. I stopped and restarted the interface driver on the internal NIC, saw that everything came right back up, and decided to investigate further, but later.
Six hours passed and it happened again, just as we were leaving for the night. I crawled through the router logs, found nothing interesting at all, and reset the card again, not having time for anything else.
We didn't even make it through the night. There still wasn't a hint in the server logs; the machine didn't even notice that the card went down, which made me remember something. Aha! I thought, with words that might best be described as "famous" and "last," I've seen this behavior before. The onboard TCP/IP checksum hardware has gone pear-shaped, and it's time for a new card! For this, I was prepared; the machine was down, re-NICed, and back up in 15 minutes. Back to bed I went.
Come the morning and guess what—we're down again.
Sitting in front of the primary server cluster and installing a different network monitor to see whether a new tool might help, I noticed our central switching hub light up. And by "light up," I don't mean "ah, someone's streaming the new Doctor Who," I mean Times Square at New Year's lighting up—but only for a moment, and then things returned to normal. I swapped to the backup switch and waited for it to happen again—and when it did, we fell off the net.
Our backup switch has poor indicators on the front panel, which is why it became our backup switch to start with—so I swapped in an older, slower unit with a good display set. When it lit up the next time, I spotted the culprit: one of our Web servers. I didn't find anything out of the ordinary until I noticed the brief appearance of an anomalous script in the task list, before it vanished and reappeared under another name.
After a little research and a lot of network sniffing—my goodness, those are a lot of SYN packets, and that's a very interesting login to a Japanese IRC server—I found we'd been hit by someone's exploit of a newly discovered PHP4 vulnerability for which there was not yet even a patch. Our Web server had become a bot in someone's round of Mixi wars—or had tried.
The funny part was that our router apparently had wanted no part of that game. Every time it got hit with the malformed SYN flood, the "experimental" (read: flaky) NIC driver on our router simply decided to go off on its own and sulk. This meant that the DDoS attack, at least on our end, was failing to DDoS its objective and was instead successfully DDoSing itself. Like a poor marksman, it just. Kept. Missing. The Target.
A few days later—after a wipe and restore of the Web server, with PHP offline—we got a patch and were back to normal. But we kept the router just the same. Apparently, it's smarter than we are, and I'm definitely not going to mess with that.
Besides, if we tried anything... it might retaliate.