[ome-devel] cluster support
Ilya Goldberg
igg at nih.gov
Thu Nov 23 19:22:42 GMT 2006
Hi Jeremy
On Nov 20, 2006, at 1:36 PM, Jeremy Muhlich wrote:
> What is the state of cluster support in OME? Our university provides a
> cluster managed by LSF and a shared webhosting system, and I'm trying to
> figure out how much of their shared webhosting infrastructure I can
> leverage for OME.
>
> The main sticking point is that the webserver box isn't supposed to do
> too much real work itself. Compute-intensive jobs are supposed to be
> submitted to the cluster for (asynchronous) processing. I'm hoping I
> can get mexes running solely on the cluster nodes without too much
> modification to the existing code. Has anyone tried anything like this
> before, or is there at least a sense of how hard it would be to make the
> necessary modifications?
We're getting our mini-cluster up and running - I think it's actually
running now (Tomasz and Josiah will know the latest status).
An overall problem with many cluster managers is that they don't
maintain state. We get around this by having the Apache process
maintain state - basically the Perl interpreter with pre-compiled
modules and the MATLAB engine. Latency gets very, very bad unless
these things are pre-loaded and kept in RAM while they crunch on
modules. MATLAB can take a second or two to start up, for example,
which can approach 50% of the total execution time.
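To make the point concrete, here's a little Python sketch (illustration
only, not OME's worker code) of the persistent-worker idea; load_engine()
is a made-up stand-in for the expensive startup of a MATLAB engine or a
Perl interpreter with pre-compiled modules. The worker pays that cost once
and then handles many jobs:

import queue
import threading
import time

def load_engine():
    # Hypothetical stand-in for the expensive startup work
    # (starting a MATLAB engine, compiling Perl modules, ...).
    time.sleep(2)
    return object()

def persistent_worker(jobs):
    engine = load_engine()      # paid once, then kept in RAM
    while True:
        job = jobs.get()
        if job is None:         # shutdown sentinel
            break
        # ... run the module against `engine` here ...
        print("finished", job)

jobs = queue.Queue()
worker = threading.Thread(target=persistent_worker, args=(jobs,))
worker.start()
for name in ("mex-1", "mex-2", "mex-3"):
    jobs.put(name)              # no per-job startup latency
jobs.put(None)
worker.join()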
So the way the OME cluster is set up is that every node is running
Apache. The master node issues requests that include remote DB
connection info and job info. The worker node establishes a DB
connection, returns an OK message (to unblock the master), then
continues processing the request. When it's done, it's supposed to
issue an IPC message using the DB driver, but this bit hasn't been
working well recently. Anyway, the master doesn't wait around
forever for the IPC "finished" message, so things continue cranking
along fairly well. The only effect seems to be that the master gets
loaded a little more than it should be.
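Roughly, the worker side looks like this (a Python sketch for illustration
only - the real thing is Apache/mod_perl, and the header names and the
pg_notify call here are just stand-ins for the actual request format and
the DB-driver IPC):

import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import psycopg2  # whichever DB driver the deployment uses

def run_job(job_id, db_dsn):
    # Do the real work, then signal completion through the database.
    # pg_notify() stands in for the "IPC message using the DB driver".
    conn = psycopg2.connect(db_dsn)
    conn.autocommit = True
    # ... execute the module here ...
    with conn.cursor() as cur:
        cur.execute("SELECT pg_notify('job_done', %s)", (job_id,))
    conn.close()

class WorkerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Job info and remote DB connection info come in with the request
        # (these header names are made up; the real format differs).
        job_id = self.headers.get("X-Job-Id", "unknown")
        db_dsn = self.headers.get("X-Db-Dsn", "")
        # Return OK right away to unblock the master ...
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"OK\n")
        # ... then keep processing in the background.
        threading.Thread(target=run_job, args=(job_id, db_dsn)).start()

if __name__ == "__main__":
    HTTPServer(("", 8000), WorkerHandler).serve_forever()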
>
> Also, is the image server more cpu bound or I/O bound?
Definitely IO bound. It could start hitting the CPU if you request
lots and lots of rendered planes rather than raw data for analysis,
but it's probably IO bound even then.
> I could
> technically have every call to the omeis cgi scheduled on the cluster,
> but the job dispatch delay can be up to 30 seconds which pretty much
> kills interactivity. Would omeis play nice on an otherwise
> lightly-loaded shared webserver, or might it soak up too much cpu
> time?
It soaks up a lot of RAM because it uses it as cache and shared
memory. The RAM is "loosely" allocated using mmap, so other demands
on the RAM will basically cause the kernel to do a lot of RAM
shuffling. Linux is remarkably efficient at this, while OS X is
remarkably sucky (even on identical hardware).
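The pattern is roughly this (a Python sketch, not the actual omeis C code;
the file name and plane size are made up):

import mmap

# Map the pixels file instead of reading it all in; the kernel decides
# which pages stay resident, so the RAM is only "loosely" committed and
# can be reclaimed when something else needs it.
f = open("pixels.raw", "rb")                 # hypothetical pixels file
m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

plane_size = 512 * 512 * 2                   # e.g. one 512x512 16-bit plane
plane = m[plane_size : 2 * plane_size]       # only these pages get faulted in
print(len(plane))

m.close()
f.close()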
There's very little CPU used while running omeis. For all intents
and purposes it's all IO.
There's no point at all to running it on a CPU cluster. It's the
drives you want to cluster (RAID) - not the CPU. Also, in an imaging
application, everything revolves around the image, so this is
definitely not the place to introduce latency needlessly.
Our mini-cluster is basically 16 Opteron cores in 4 boxes. One box
does omeis, the back-end ("master"), the DB server, the web UI and two
worker nodes. The other boxes all have one worker node per core. This is
for an application heavy on analysis. If you want something
heavy on concurrent users with not as much analysis, then you'd want
to spread out the various services - in other words, keep omeis, the DB,
the web UI, the back-end, and any analysis worker nodes all on separate
boxes. You can scale concurrency further by setting up load
balancing for Apache and Postgres, thereby giving even more cores/
boxes to each separate service, though this is getting out of my league.
-I
>
>
> -- Jeremy