Windows 2000 Server “just works” … no matter what you do
You’ve heard of layer eight problems? Well, here’s what I call a “split layer eight” issue.
A couple of weeks ago, the company I work for finished moving servers to a new data center; part of the move involved changing some servers’ IP addresses. Since then, one of our servers had been behaving slightly erratically. It was running SQL Server on Windows 2000 Server and had one linked server (the main database server, which was local and even connected to the same switch), so stored procedures could query the other server’s databases and what have you. Small queries (short connections) worked just fine, but anything that took longer than a few seconds failed with “General network error”; the connections appeared to be getting interrupted.
I tried testing the connection between the two servers using iperf; the test initially worked, but then got interrupted after a few seconds, too! The same test was fine between the main database server and a different server, which narrowed the issue down to our strangely-behaving server or its switch port.
Hardware? Switch port reports no errors, so it’s probably not that …
Duplex mismatch? Both switch and server report autonegotiated 100Mbps/full-duplex, so that’s not the problem …
Network configuration? Well, let’s check just for the hell of it.
…
/26 subnet mask?
Let me explain something to everyone before I go any further. IP addresses are 32-bit numbers, and they’re divided up into sections using bit masks, called “subnet masks”. For example, I have 192.168.1.0/24 (24-bit subnet mask, which means I can use any number in the last octet, the .0 in the address), and I want to divide it up into four segments. I would use a /26 subnet mask to divide the address space (the last octet) up like this:
Block 1: 0-63 | Block 2: 64-127 | Block 3: 128-191 | Block 4: 192-255
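To make the arithmetic concrete, here is a minimal sketch using Python’s standard `ipaddress` module that splits the example 192.168.1.0/24 into /26 blocks. It only reproduces the table above; nothing in it comes from the server in question.

```python
import ipaddress

# The example network from the text: a /24, i.e. the last octet is free.
parent = ipaddress.ip_network("192.168.1.0/24")

# Borrow 2 more bits for the mask (/24 -> /26), which yields 2**2 = 4 blocks.
blocks = list(parent.subnets(new_prefix=26))

for i, block in enumerate(blocks, start=1):
    # network_address / broadcast_address are the first and last addresses in each block
    print(f"Block {i}: {block} ({block.network_address} - {block.broadcast_address}), mask {block.netmask}")
```

Each extra prefix bit halves the block size, so going from /24 to /26 gives four blocks of 64 addresses, and the dotted form of a /26 mask is 255.255.255.192.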
Essentially, this means, for example, that 192.168.1.1 cannot directly talk to 192.168.1.152, because they each know they are in different blocks (.1 is in block 1, .152 is in block 3), and have to talk to an intermediary (a router) in order to reach the other.
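The “different blocks” decision amounts to comparing the network each address falls into once the mask is applied. A minimal sketch, again with `ipaddress`; the helper name `same_subnet` is hypothetical, and a real networking stack makes this decision as part of a routing-table lookup rather than in a standalone function.

```python
import ipaddress

def same_subnet(a: str, b: str, prefix: int) -> bool:
    """Hypothetical helper: True if a and b land in the same /prefix network,
    i.e. they could reach each other directly without a router."""
    net_a = ipaddress.ip_interface(f"{a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{b}/{prefix}").network
    return net_a == net_b

print(same_subnet("192.168.1.1", "192.168.1.152", 26))     # False: .0/26 vs .128/26
print(same_subnet("192.168.1.1", "192.168.1.152", 24))     # True: both in .0/24
print(ipaddress.ip_interface("192.168.1.152/26").network)  # 192.168.1.128/26
```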
Now, back to my server’s network configuration. My server was .152, and it had a /26 subnet mask, so the router, the central database server, and almost everything else it needed to talk to were not in that network block. Certainly, everything I had tested from was outside of that block.
Why was I even able to connect!? Instead of doing what network standards suggest and simply returning a “no route to host” error, Microsoft’s networking stack was apparently ignoring the subnet mask, allowing the connection, then later realising what the subnet mask was and dropping the connection … (?!)
Even if that was not the case, something very weird was going on and Microsoft’s networking stack was flawless, I should still have been presented with an error when I entered a default router address that was not in the same subnet as the server. By definition, one’s default gateway must be in the same subnet!
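The missing input validation is essentially one membership test. This is a hypothetical sketch of the kind of check a configuration dialog could run: `validate_gateway` is an illustrative name, not a Windows API, and 192.168.1.1 is an assumed router address because the post doesn’t give the real one.

```python
import ipaddress

def validate_gateway(address: str, prefix: int, gateway: str) -> None:
    """Hypothetical check: reject a default gateway outside the host's own subnet."""
    iface = ipaddress.ip_interface(f"{address}/{prefix}")
    gw = ipaddress.ip_address(gateway)
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is not in {iface.network}; it cannot be reached directly")

try:
    # The server's address and mask from the post; the router address is assumed.
    validate_gateway("192.168.1.152", 26, "192.168.1.1")
except ValueError as err:
    print("rejected:", err)
```

With the intended mask (a /24 would put .1 and .152 in the same network), the same call passes silently.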
This very simple problem should have never happened, obviously, but all three links in the chain failed:
- I, the user, failed to input the correct information.
- Microsoft failed to recognise that the information I input was inherently incorrect and could not possibly work under any circumstances.
- Microsoft’s networking stack failed by happily taking the horribly wrong information and somehow working anyway … marginally. Guys, guys: you’ve got entirely the wrong idea of “just works”.
This is a split layer eight issue, because the fault is split between the user and the developer. I am the first to apologise to users for failures to do proper input validation, or failures to work as expected. Developers can do wrong, and frequently fuck their users by failing in this manner; most users only recognise the “user error” portion of it, unfortunately, and continue using the software.
Microsoft have touted their commercial software as “best in the business” because they have spent billions on development; they have said that their billions spent make their expensive products cost-saving. I ask you, where have those billions gone when such a simple error can be made by the user? Aren’t they supposed to be making the experience easier and more efficient? Think about how many times similar situations have probably happened around the world; either extra time was spent, or extra money was thrown at an outside consultant brought in to identify the issue. Forget how much they’ve “saved” us; how many billions of dollars have Microsoft cost the IT industry?
My Linux and FreeBSD boxes don’t do this shit. If I change my subnet mask to something that puts my default gateway in another subnet, my default route gets deleted. If I try to connect to something outside of my subnet, I get a “no route to host” error; the connection doesn’t go through and then get interrupted. THIS IS FREE SOFTWARE — DEVELOPED BY “AMATEURS”. WHAT THE FUCK ARE YOU DOING, MICROSOFT?
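For contrast, the behaviour described for Linux and FreeBSD can be modelled as a toy next-hop lookup: a connected route for the local subnet, plus a default route that is only usable when its gateway is on-link. This is a simplified illustration with assumed addresses (192.168.1.1 as the router, 192.168.1.10 as the database server, 10.0.0.5 as an off-site host), not the actual kernel logic of either system.

```python
import ipaddress
from typing import Optional

def lookup(dest: str, address: str, prefix: int, gateway: Optional[str]) -> str:
    """Toy next-hop decision for a host with a single interface (simplified model)."""
    iface = ipaddress.ip_interface(f"{address}/{prefix}")
    dst = ipaddress.ip_address(dest)
    # Connected route: the destination is on the local subnet, so deliver directly.
    if dst in iface.network:
        return "direct"
    # Default route: only usable if the gateway itself is reachable on-link.
    if gateway is not None and ipaddress.ip_address(gateway) in iface.network:
        return f"via {gateway}"
    # Nothing matches: fail up front instead of half-connecting.
    return "no route to host"

# Assumed addresses; only 192.168.1.152 and the /26 come from the post.
print(lookup("192.168.1.10", "192.168.1.152", 26, "192.168.1.1"))  # no route to host
print(lookup("192.168.1.10", "192.168.1.152", 24, "192.168.1.1"))  # direct
print(lookup("10.0.0.5", "192.168.1.152", 24, "192.168.1.1"))      # via 192.168.1.1
```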
Somehow, I don’t think Microsoft includes “simple errors caused by developer fuckups” in their estimates of total cost of ownership.
February 15th, 2008 at 15:21
I understand what you mean. I go to ITT Tech. Do you understand?
/me is josh’s friend
props +1