Blackhole routing comes about when a device in an IP data path (a 'router' although it may be called something else) drops packets without telling the originator. Blackholing can be done deliberately in response to a flood of messages (a Denial of Service attack). But it can and does occur from time to time as a result of device misconfiguration. I have encountered the latter three times now -- twice at work and once at home. I suspect that it is a lot more common than most IT people think. Blackhole routing results in really bizarre communication problems that are very difficult to troubleshoot and are often 'fixed' by some kludge without really understanding or correcting the real problem.
Blackhole routers typically come about when two pretty reasonable technology practices come into conflict. The first practice is using Path MTU Discovery (PMTUD), which was defined in RFC1191. The second practice is turning off as many non-critical Internet-related services as possible in order to avoid hacking attacks on unanticipated defects such as buffer overflows in the software providing the services. Specifically, some system administrators turn the ICMP service off. The problem is that path MTU discovery depends on ICMP. If ICMP is turned off, the MTU can be set wrong. When a wrongly set MTU is combined with the RFC1191 practice of setting the Do Not Fragment flag in IP packets, some packets may fall into a "black hole" -- simply vanish. Even worse, it is likely that the majority of IP packets -- especially those from older mainstream software -- will not blackhole. So the symptoms will be that certain programs -- and often only certain options in certain programs -- will fail. For example, TELNET, tracert, ping and Internet Explorer might work, but a Citrix client or VNC might fail over the same link.
Just a bit more detail.
RFC1191 addresses the fact that the maximum packet size is not the same for all devices and technologies. Thus, token ring networks permit very large packets (16K) but when throttled down to send over a phone line, the maximum packet size is likely to be 1.5K. Furthermore, "transparent" protocols like PPPoE (used by many DSL providers) or Virtual Private Networking grab a few more bytes out of each packet for their own headers. Although most routers can fragment packets, that can result in a lot of overhead. If, for example, the originator assumes that packets will be 1500 bytes and the packets are sent through a DSL provider that uses PPPoE, each 1500 byte packet will be converted to two packets: one of 1492 bytes, and a second that carries just 8 bytes of data plus a full set of header bytes. There would be many fewer packets to deal with if the originator knew the smallest packet size to be used in the path and just created packets that size or a bit smaller.
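The fragmentation arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `fragment_sizes` is made up for illustration, and it assumes a plain 20 byte IPv4 header with no options and ignores the PPPoE framing itself.

```python
IP_HEADER = 20  # bytes; assumes an IPv4 header with no options


def fragment_sizes(packet_size, link_mtu, header=IP_HEADER):
    """Return the total size (header included) of each fragment on the wire."""
    if packet_size <= link_mtu:
        return [packet_size]  # fits as-is, no fragmentation needed
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (link_mtu - header) // 8 * 8
    if max_data <= 0:
        raise ValueError("link MTU too small to carry any data")
    remaining = packet_size - header  # data bytes still to be sent
    sizes = []
    while remaining > 0:
        data = min(max_data, remaining)
        sizes.append(header + data)
        remaining -= data
    return sizes


print(fragment_sizes(1500, 1492))  # [1492, 28]
print(fragment_sizes(1492, 1492))  # [1492]
```

The second fragment is 28 bytes on the wire to move 8 bytes of data, which is the overhead that knowing the path MTU in advance avoids.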
The old (pre-RFC1191) answer was to arbitrarily create 576 byte packets. The new answer is to determine the path MTU (Maximum Transmission Unit -- i.e. 'size') to each destination dynamically. This is done by sending large packets with the Do Not Fragment bit set. Any router along the way that cannot forward the packet without fragmenting it is supposed to return an ICMP message saying that the packet is too big. The originator then tries a smaller packet size and searches until it has found the largest packet size that can be used.
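Here is a toy simulation of that search, assuming every router on the path answers and that the ICMP message carries the MTU of the link that was too small (RFC1191 specifies this; routers that predate it do not report a size, and the originator then has to guess downward). The names `send_with_df` and `discover_path_mtu` and the link sizes are invented for the example.

```python
def send_with_df(size, link_mtus):
    """Simulate one packet sent with Do Not Fragment set.

    Returns None if the packet crosses every link, otherwise the MTU of
    the first link that is too small (what the ICMP reply would report).
    """
    for mtu in link_mtus:
        if size > mtu:
            return mtu
    return None


def discover_path_mtu(link_mtus, first_hop_mtu):
    """Shrink the packet size until a packet gets through."""
    size = first_hop_mtu
    attempts = [size]
    while True:
        reported = send_with_df(size, link_mtus)
        if reported is None:
            return size, attempts
        size = reported  # always smaller than the size just tried
        attempts.append(size)


# Hypothetical path: Ethernet, PPPoE, a tunnel, Ethernet again.
print(discover_path_mtu([1500, 1492, 1400, 1500], 1500))
# (1400, [1500, 1492, 1400])
```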
Since the path MTU could change at any time due to dynamic routing, systems using Path MTU Discovery need to set the Do Not Fragment flag on every message sent. That is supposed to ensure that when the path MTU changes (downward) a new maximum packet size is determined at the originator.
ICMP is an old and useful protocol. Mostly it is just used to test Internet connectivity. But any service open to the Internet is vulnerable to attack from the Internet. As a result, some system administrators turn ICMP off. One less thing to get them into trouble. That would probably be OK, were it not for the fact that ICMP is used by path MTU discovery. If ICMP is off at any router in the data path, packet size limitations at or 'beyond' that router will not be detected. Neither will the originator be informed that its data packets are being dropped beyond the non-ICMP-capable node.
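A small model makes the failure easier to see. This is a sketch under my own assumptions, not a description of any real router: each hop has an outgoing link MTU and an `icmp_ok` flag, and an ICMP error only reaches the originator if the hop that generates it and every hop before it have ICMP enabled. The hop names and sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    out_mtu: int   # MTU of the link this router forwards onto
    icmp_ok: bool  # False models ICMP turned off or filtered here


def fate_of_df_packet(size, hops):
    """Say what happens to a packet sent with Do Not Fragment set."""
    for i, hop in enumerate(hops):
        if size > hop.out_mtu:
            # The error is generated at this hop and must travel back
            # through every earlier hop to reach the originator.
            if all(h.icmp_ok for h in hops[: i + 1]):
                return "icmp-error"
            return "black-holed"
    return "delivered"


path = [
    Hop("isp", 1500, True),
    Hop("dsl", 1492, True),
    Hop("firewall", 1400, False),  # ICMP switched off here
]

for size in (1400, 1450, 1500):
    print(size, fate_of_df_packet(size, path))
# 1400 delivered
# 1450 black-holed
# 1500 icmp-error
```

Small packets get through, the biggest ones draw an error message, and the sizes in between just vanish.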
So, the result of combining RFC1191 and administrators shutting off ICMP is that there can be a size range where packets simply vanish into a black hole. This usually is not difficult to check for (although it may be hard to fix). To check for black hole routing, use the ping tool provided with all Unixes and with Windows For Workgroups, Windows 9x, and all NT-based Windows. The syntax is different for Windows and Unix, and some older Unix pings can't be used because they do not allow the Do Not Fragment flag to be set. Anyway, for Windows (where ping is best run in an MSDOS window) the syntax is ping dest -f -l size, where dest is the IP address or host name of the destination and size is the packet size. For Unix, the syntax is ping dest -D -s size. In the version of ping shipped with Slackware 10.2, the '-D' flag seems to have morphed into '-M do'. '-M want' might be useful if you can figure out how to use it ... which I can't.
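If you switch between systems a lot, a small helper that builds the right command line saves some head scratching. The sketch below only assembles the argument list; it does not run ping. The function names and the "windows" / "linux" / "bsd" labels are my own, "bsd" stands for any Unix ping that takes '-D' as described above, and the single-packet count flags ('-n 1' on Windows, '-c 1' elsewhere) are an addition of mine. Note that the size given to ping is the ICMP data size, so the IP packet on the wire is 28 bytes bigger (20 bytes of IP header plus 8 of ICMP header, assuming no IP options).

```python
ICMP_OVERHEAD = 28  # 20 byte IPv4 header (no options) + 8 byte ICMP header


def payload_for_packet(packet_size):
    """ICMP data size to request so the whole IP packet is packet_size bytes."""
    return packet_size - ICMP_OVERHEAD


def df_ping_command(dest, payload_size, system):
    """Build a one-packet ping command with Do Not Fragment set."""
    size = str(payload_size)
    if system == "windows":
        return ["ping", "-n", "1", "-f", "-l", size, dest]
    if system == "linux":   # pings that use '-M do', as on Slackware 10.2
        return ["ping", "-c", "1", "-M", "do", "-s", size, dest]
    if system == "bsd":     # pings that use '-D'
        return ["ping", "-c", "1", "-D", "-s", size, dest]
    raise ValueError("unknown system: " + system)


# Probe with a full 1500 byte packet; the host name is a placeholder.
print(" ".join(df_ping_command("host.example", payload_for_packet(1500), "linux")))
# ping -c 1 -M do -s 1472 host.example
```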
The process is simply to send a large packet -- bigger than can be handled without fragmenting. 20000 bytes should be way bigger than can be handled. That should produce a message "Packet needs to be fragmented, but DF set", while sufficiently small packets should go through and get a reply. Iterate until the largest packet that gets a response and the smallest that needs fragmentation are known. If there is a size range between these where messages simply time out, there is a blackhole problem for packet sizes that time out on that route. (MSDOS/Windows will tell you about time outs. Linux won't, but when you hit ctrl-C, there will be a non-zero number of packets sent, 0 received.)
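The iteration can be done as two binary searches rather than by hand. The sketch below assumes you supply a `probe(size)` function of your own that sends one Do Not Fragment ping and returns "reply", "needs_frag" or "timeout"; that function, the result strings and the sizes in the fake probe are all hypothetical. It also assumes the results are ordered by size: replies up to some size, then possibly time outs, then fragmentation errors.

```python
def largest_true(pred, lo, hi):
    """Largest n in [lo, hi] where pred(n) holds, or lo - 1 if none.

    Assumes pred is true for small n and false from some point on.
    """
    best = lo - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if pred(mid):
            best = mid
            lo = mid + 1
        else:
            hi = mid - 1
    return best


def find_black_hole(probe, lo=1, hi=20000):
    """Return (largest_ok, smallest_needs_frag, black_hole_range or None)."""
    largest_ok = largest_true(lambda n: probe(n) == "reply", lo, hi)
    largest_quiet = largest_true(lambda n: probe(n) != "needs_frag", lo, hi)
    smallest_frag = largest_quiet + 1 if largest_quiet < hi else None
    gap_top = smallest_frag - 1 if smallest_frag is not None else hi
    gap = (largest_ok + 1, gap_top) if largest_ok + 1 <= gap_top else None
    return largest_ok, smallest_frag, gap


def fake_probe(size):
    """Stand-in for a real ping: sizes 1373..1464 vanish."""
    if size <= 1372:
        return "reply"
    if size <= 1464:
        return "timeout"
    return "needs_frag"


print(find_black_hole(fake_probe))  # (1372, 1465, (1373, 1464))
```

A route without the problem comes back with no range at all, because the largest size that gets a reply sits right below the smallest size that draws the fragmentation message.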
Detecting such a problem is easy. Fixing it however ... Good Luck. The best bet would be to fix the offending router(s). This is likely not to be easy even if you control the device. If someone else controls the router, your chances of finding a support person who understands the problem, acknowledges that it is their problem, and can fix it are not especially good. You can try. Maybe you'll be lucky.
The next best choice is to turn off Path MTU discovery and manually set the MTU to a size that is known to be safe. In Unix, this might be as simple as feeding a parameter to ifconfig (I haven't tried it). In Windows, it requires tinkering with MTU settings in the Registry, which is not especially fun in Windows 9x and is a lot less fun in NT. NT-based Windows has, in my (thankfully) limited experience, a ludicrous number of MTU related settings in the Registry. One easy trap to fall into is that NT defaults to using hex numeric values for MTU. Entering a decimal number without converting it to hex will have VERY unsatisfactory results.
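To see how bad the hex trap is, here is the conversion in Python. The two function names are just for illustration, and nothing here touches the Registry; it only does the arithmetic.

```python
def mtu_as_registry_hex(mtu):
    """Hex digits to type when the editor expects a hexadecimal value."""
    return format(mtu, "x")


def value_if_decimal_typed_as_hex(mtu):
    """What you actually get if the decimal digits are read as hex."""
    return int(str(mtu), 16)


print(mtu_as_registry_hex(1400))            # 578
print(value_if_decimal_typed_as_hex(1400))  # 5120
print(mtu_as_registry_hex(1492))            # 5d4
print(value_if_decimal_typed_as_hex(1492))  # 5266
```

So typing 1400 into a hex field asks for an MTU of 5120, far bigger than the link can carry.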
Here's a link to Cisco Tech Note 13709 -- Adjusting IP MTU, TCP MSS, and PMTUD on Windows and Sun Systems, which purports to address most of these issues including which registry entries to edit. I have not validated the document (and do not intend to), but it looks consistent with my experience with this stuff.