Using Idle Time for Calculations

A while back, I worked on using idle time on Microsoft Windows machines to carry out physics calculations. There was a problem, though: the calculations normally run under Linux/BSD (*nix platforms). Rather than port the code, we decided to try a virtual-machine-like approach based on this coLinux/Condor project.

This project is a good start, but it is a little out of date. For example, the current version of coLinux does not require WinPcap if you only use a few fixed ports for communication. Related to this, the project fails to mention that if you do use WinPcap, you need to set a dependency on it in the registry for the coLinux system to run as a Microsoft Windows service (that is, to start at boot).
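As a rough sketch of the service dependency mentioned above, the snippet below builds the Windows `sc config` command that records one service as depending on another. The service names are assumptions, not something taken from the project: "Cooperative Linux" is what I believe the coLinux daemon registers by default, and "NPF" is the name of the WinPcap packet filter driver service. Check both with `sc query` on your own machine before running anything.

```python
import subprocess


def build_dependency_command(service_name, depends_on):
    """Return the argument list for `sc config <service> depend= <dependency>`.

    sc.exe expects the option ("depend=") and its value as separate
    arguments. Note that this overwrites any dependencies already set
    on the service rather than appending to them.
    """
    return ["sc", "config", service_name, "depend=", depends_on]


if __name__ == "__main__":
    # Both names are assumptions; verify them with `sc query` first.
    command = build_dependency_command("Cooperative Linux", "NPF")
    print("Would run:", " ".join(command))
    # Uncomment on a Windows machine, from an administrator prompt:
    # subprocess.run(command, check=True)
```

The same setting ends up in the registry as the DependOnService value under the service's key, so editing it by hand there should have the same effect.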

If you install the disk image from the project, you'll find it's a pretty vanilla install of an old Fedora Core release. This is a double-edged sword. The vanilla install makes it easy to customize, but it also means that the coLinux system runs some pointless services like boot-time hardware detection, Bluetooth support, PCMCIA support, etc. None of these are needed for virtualized hardware, which never changes. Also, as I mentioned, it is an old release, so many people will want to update it.

But what if you don't want to update? What if you want to start fresh with your favorite Linux distro? The information to do this is not easy to find. (The short form is: install the Linux distro somewhere, image that device, and strip the first 63 blocks of 512 bytes each, 32,256 bytes in total, from the raw image. In my experience, this works with VMware Server flat disk images.)
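The stripping step can be sketched in a few lines of Python. This is only an illustration, not a tool from the project: the function name and the file names are made up, and it assumes the image has a single partition starting at sector 63 (the traditional MBR layout). If your partition starts elsewhere, change the sector count to match what fdisk reports.

```python
import shutil

SECTOR_SIZE = 512   # bytes per block
SKIP_SECTORS = 63   # blocks in front of the first partition (assumed layout)


def strip_leading_sectors(src_path, dst_path,
                          sectors=SKIP_SECTORS, sector_size=SECTOR_SIZE):
    """Copy src_path to dst_path, leaving out the first `sectors` blocks.

    Returns the number of bytes skipped.
    """
    offset = sectors * sector_size
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(offset)                            # jump past MBR and padding
        shutil.copyfileobj(src, dst, 1024 * 1024)   # copy the rest in 1 MiB chunks
    return offset


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    skipped = strip_leading_sectors("distro-flat.vmdk", "root_fs.img")
    print("Skipped", skipped, "bytes")
```

The result is a bare filesystem image, which is what coLinux wants for its root device, instead of a whole-disk image with a partition table in front.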

If you need MPI, you need to use a bridged network (WinPcap or TAP) for the coLinux machine. MPI, at least Open MPI which I use, assigns random port numbers for communication. This means that coLinux can be used for MPI, but the virtual machines can't run a firewall, which is a security no-no.
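To see why random ports and a firewall don't mix, here is a small sketch of what "random port" means in practice. This is not Open MPI's code; it only shows the general mechanism where a program binds to port 0 and the operating system hands back whatever free port it likes. The function name is made up for the example.

```python
import socket


def open_listener_on_random_port(host="127.0.0.1"):
    """Bind a TCP socket to port 0 so the OS picks a free port.

    Returns the listening socket and the port number that was assigned.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))      # port 0 means "any free port"
    sock.listen(1)
    port = sock.getsockname()[1]
    return sock, port


if __name__ == "__main__":
    # Each run will most likely print a different set of ports.
    for _ in range(3):
        listener, port = open_listener_on_random_port()
        print("OS assigned port", port)
        listener.close()
```

Since the port is not known until the program is already running, there is no fixed rule you can write into the firewall ahead of time.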

By this point, you might be wondering why you would start with this project at all. There is a simple reason: it works. It took some digging, but I was able to set up a dozen Microsoft Windows machines with coLinux and TORQUE (I didn't like Condor). The coLinux cluster with MPI then ran fine. Of course, jobs did fail if users rebooted one of the active nodes, but that can be dealt with.

Oh, a final note: coLinux at this time doesn't support multiple processors/cores, so you won't be able to use all of a machine's processing power with this method.