Thursday, October 12, 2006

Zope.org DNS Post-Mortem

Wow, so, apparently, I am a complete ignoramus. That's what it means when you crown a volunteer as responsible for something you don't want to handle, and it goes wrong, right?

I dunno. I won't put it past myself to make a mistake, but what went wrong with Zope.org this week was odd - it was exactly one of the things I worked pretty hard to avoid. All of the records with a certain IP address were hosed and had the exact same incorrect address.

If I stumbled on this for a client or in the course of helping someone else, I would probably assume that they made a mistake with the mouse, and that may be just what I did. However, I remember pretty clearly making a point of individually copying the contents of each record from the BIND zone file I was sent into ZoneEdit, which we are using in order to implement shared access to the zone. I made a point of copying each record even when the IP address was the same, and I visually verified multiple times that everything matched. Unfortunately, ZoneEdit's import functionality for zone files is broken, so I had to create 30+ records by hand, one at a time. It took a while.
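For what it's worth, that kind of record-by-record comparison can be scripted instead of eyeballed. Here is a minimal sketch, assuming both the original BIND zone file and whatever ended up on the new host are available as plain text. The function names are made up for illustration, it only understands simple one-line A records with an explicit hostname, and it keeps only one address per name, so it is nowhere near a full zone-file parser.

```python
import re

# Matches simple "name [ttl] [IN] A address" lines; anything else is ignored.
# Assumption: every record sits on one line and starts with its hostname.
A_RECORD = re.compile(
    r"^(?P<name>\S+)\s+(?:\d+\s+)?(?:IN\s+)?A\s+(?P<addr>\d+\.\d+\.\d+\.\d+)\s*$",
    re.IGNORECASE,
)


def parse_a_records(zone_text):
    """Return {hostname: address} for the plain A records in zone_text."""
    records = {}
    for line in zone_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        match = A_RECORD.match(line)
        if match:
            # A later record for the same name overwrites an earlier one.
            records[match.group("name").lower()] = match.group("addr")
    return records


def diff_a_records(expected, actual):
    """List (name, expected_addr, actual_addr) for every mismatch."""
    names = sorted(set(expected) | set(actual))
    return [
        (name, expected.get(name), actual.get(name))
        for name in names
        if expected.get(name) != actual.get(name)
    ]
```

Feeding `parse_a_records` the original zone file and the text of what was actually entered, then passing both results to `diff_a_records`, gives back one tuple per hostname whose address differs or is missing on either side. An empty list means the A records agree.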

Anyway, what happened was that not only did a bunch of people get records for an address ending in .1 instead of .171, but a lot of companies and providers also cached them locally. Bad news, because the records last a while, and this means that people couldn't get to various parts of zope.org - especially cvs/svn and the main website. Murphy's fucking law. Almost nothing else was even a hair off, and most of the other hostnames are probably not even used anymore.

In any case, apparently this was most certainly all my fault, not least because I wanted us to set up a group of six to ten servers grabbing zone transfers from ZoneEdit, so that zope.org would never be at risk of, ehm, going down for a week.

I also probably wasn't vocal enough about it, because everything seemed to work OK, but I wouldn't have recommended doing this in a week or so. Moving zope.org is a pretty big deal. Sure, I spent about two days on my own zones, and they went up and down like crazy, but only about five people care about that, and they don't care very much. I used the experience to help avoid several days of downtime for some client zones which I also moved to ZoneEdit, and also, despite its current pains, for zope.org.

The way we'd been heading, we probably would have switched directly to ZoneEdit, probably no one would have tried to test that they were even serving the zone, and for 2-3 days the entire zone, including mail and mailing lists, would have been down. At least, that's what I think. That's what happened to my zones which I moved more or less in this fashion. ;)

So yeah, I'm done. I'm tired of being blamed. Maybe I made a typo, didn't double check, and fucked a bunch of people all week. I sure as hell feel like it, but if I didn't, you guys are definitely assholes.

Fuck you very much.
