Subject: PC hangs on boot
Priority: High
Details:

When the customer boots the PC on site, it launches to the Windows desktop and hangs. Reboots don't fix it.

The support ticket that Michael J received had a few more details, but that covered the core of the problem. There were complications, though. First, this was circa 2001, so any sort of remote tech support was already a challenge. Second, the "site" in question was an oil survey vessel currently off the coast of West Papua.

The PC in question was provided by Michael's company, and it ran software which operated equipment vital to the ship's mission (but not its operation). The software placed the PC into a kiosk mode, hiding the Windows OS interface. Without the PC, the crew were idling in the middle of the ocean with not much to do.

The first step was to find a workaround and, much to Michael's surprise, "just wait longer" was the solution. If the sailors waited about 10 minutes or so, the computer would start responding and the software could be used. That at least took the immediate pressure off, but it didn't solve the problem.

Michael started digging into the architecture. The sensors were expected to be on a COM port. These COM ports could be physical ports, USB ports, or virtual ports living on a network-hosted port server. The problem with COM ports is that they don't report whether or not a device is connected; the main way to find out if a device is on the other end is to see if it's sending signals to the port.

Suspecting that it had to be something with that COM communication, Michael dug down into the communication layer of the application. That quickly took him to the constructor for the COM port interface object. The first thing that leapt out was a number of low-level OS calls to enumerate what COM ports were connected. In a constructor. That ran at application boot time. And was a blocking operation.

That had to be it, but when Michael ran the program, it took seconds to boot. That wasn't surprising: even on a 500MHz CPU, scanning COM ports takes milliseconds. But Michael's company left those network-hosted port servers running all the time. When he unplugged the network port server, the program hung at boot, just like on the boat.

From there, it was easy to understand what was happening. The virtual COM ports appeared to the OS like regular COM ports, and thus appeared to the program as regular COM ports. But they weren't: they went through the Windows networking stack. The stack would helpfully send packets to the port server and then wait 30 seconds for a reply. The port server, in most configurations, hosted 8-16 ports, but some configurations could have multiple port servers.
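To see why that adds up to the kind of hang the crew reported, here is a back-of-the-envelope sketch. It assumes the ports were probed one after another and that every probe against the powered-off port server ran out its full 30-second timeout; the story doesn't confirm either detail, and the function name is made up for illustration.

```python
def worst_case_hang_seconds(port_count, timeout_seconds=30):
    """Total blocking time if ports are probed in turn and every probe times out.

    Assumption: probes are sequential, so the timeouts simply add up.
    """
    return port_count * timeout_seconds


# 8 and 16 ports match a single port server; 32 stands in for two 16-port servers.
for ports in (8, 16, 32):
    minutes = worst_case_hang_seconds(ports) / 60
    print(f"{ports} unreachable ports: about {minutes:.0f} minutes")
```

Under those assumptions, a single port server costs somewhere between four and eight minutes of staring at a frozen screen, which is in the same ballpark as the "about 10 minutes" the sailors had to wait.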

This allowed Michael to offer a simple fix to the boat: turn on the port server. This posed another problem, as the port server was located in someone's cabin, for whatever reason, and its occupant found the blinkenlights kept them up at night. Some black electrical tape was the next fix.

The longer fix was a refactoring campaign to remove the COM port scanning from the class constructor, and instead populate the port list upon request using a separate thread, so that the entire program didn't hang while waiting.
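The shape of that refactoring might look something like the sketch below. This is not Michael's actual code (the original was presumably a native Windows application); `ComPortCatalog`, `request_scan`, and the injected `scan_fn` are hypothetical names, and `slow_scan` merely pretends to be an enumeration that blocks on a network timeout.

```python
import threading
import time


class ComPortCatalog:
    """Hypothetical stand-in for the COM port interface object."""

    def __init__(self, scan_fn):
        # The constructor only stores state; it never touches hardware or the network.
        self._scan_fn = scan_fn
        self._lock = threading.Lock()
        self._done = threading.Event()
        self._thread = None
        self._ports = []

    def request_scan(self):
        """Start the scan on a worker thread, unless one was already started."""
        with self._lock:
            if self._thread is None:
                self._thread = threading.Thread(target=self._run, daemon=True)
                self._thread.start()

    def _run(self):
        try:
            self._ports = list(self._scan_fn())
        finally:
            # Signal completion even if the scan raised, so callers never wait forever.
            self._done.set()

    def ports(self, timeout=None):
        """Return the scanned ports, or None if the scan hasn't finished within timeout."""
        self.request_scan()
        if self._done.wait(timeout):
            return self._ports
        return None


def slow_scan():
    time.sleep(0.2)  # stands in for a blocking enumeration with network timeouts
    return ["COM1", "COM3"]


catalog = ComPortCatalog(slow_scan)  # returns immediately, no scan yet
catalog.request_scan()               # scan starts in the background
print(catalog.ports(timeout=5))      # waits for the scan, up to five seconds
```

The point of the design is that construction is free, and anything that needs the port list can either poll with a short timeout and keep the UI alive, or wait with a longer one when it truly has nothing else to do.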
