[ome-devel] cluster support

Ilya Goldberg igg at nih.gov
Thu Nov 23 19:22:42 GMT 2006


Hi Jeremy

On Nov 20, 2006, at 1:36 PM, Jeremy Muhlich wrote:

> What is the state of cluster support in OME?  Our university provides a
> cluster managed by LSF and a shared webhosting system, and I'm trying to
> figure out how much of their shared webhosting infrastructure I can
> leverage for OME.
>
> The main sticking point is that the webserver box isn't supposed to do
> too much real work itself.  Compute-intensive jobs are supposed to be
> submitted to the cluster for (asynchronous) processing.  I'm hoping I
> can get mexes running solely on the cluster nodes without too much
> modification to the existing code.  Has anyone tried anything like this
> before, or is there at least a sense of how hard it would be to make the
> necessary modifications?

We're getting our mini-cluster up and running - I think it's actually
running now (Tomasz and Josiah will know the latest status).
An overall problem with many cluster managers is that they don't
maintain state between jobs.  We get around this by having the Apache
process maintain state - basically the Perl interpreter with pre-compiled
modules and the MATLAB engine.  Latency gets very bad unless
these things are pre-loaded and kept in RAM while they crunch on
modules.  MATLAB can take a second or two to start up, for example,
which can approach 50% of the total execution time.
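
To put rough numbers on that startup cost, here is a minimal Python sketch. It is not OME code (OME's back end is Perl); the function names are made up for illustration, and the 2-second figures are assumptions based on the "second or two" estimate above.

```python
def startup_fraction(startup_s: float, work_s: float) -> float:
    """Fraction of total wall time spent starting the engine for one job."""
    return startup_s / (startup_s + work_s)


def amortized_fraction(startup_s: float, work_s: float, jobs: int) -> float:
    """Same fraction when one pre-loaded engine serves `jobs` jobs in a row."""
    return startup_s / (startup_s + work_s * jobs)


# Cold start for every job: 2 s of startup plus 2 s of real work (assumed).
cold = startup_fraction(2.0, 2.0)
# Engine kept in RAM and reused for 100 jobs of the same size.
warm = amortized_fraction(2.0, 2.0, 100)
print(f"cold: {cold:.0%}, warm: {warm:.1%}")
```

With these assumed numbers, a cold start costs half of each job's run time, while a pre-loaded engine brings the startup share down to about 1%.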

So the way the OME cluster is set up is that every node is running  
Apache.  The master node issues requests that include remote DB  
connection info and job info.  The worker node establishes a DB  
connection, returns an OK message (to unblock the master), then  
continues processing the request.  When it's done, it's supposed to
issue an IPC message using the DB driver, but this bit hasn't been  
working well recently.  Anyway, the master doesn't wait around  
forever for the IPC "finished" message, so things continue cranking  
along fairly well.  The only effect seems to be that the master gets  
loaded a little more than it should be.
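
The acknowledge-then-work pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how OME does it: a thread stands in for the worker's Apache process, a `queue.Queue` stands in for the IPC "finished" message sent through the DB driver, and the names `worker_handle` and `master_dispatch` are hypothetical.

```python
import queue
import threading
import time


def worker_handle(job_id, work, finished):
    """Accept a job, acknowledge at once, and keep working in the background."""
    def run():
        work()                # the actual module execution
        finished.put(job_id)  # stand-in for the IPC "finished" message
    threading.Thread(target=run, daemon=True).start()
    return "OK"               # returned right away, which unblocks the master


def master_dispatch(job_id, work, finished, timeout_s):
    """Send a job, then wait a bounded time for the "finished" message."""
    ack = worker_handle(job_id, work, finished)
    try:
        done = finished.get(timeout=timeout_s)
    except queue.Empty:
        done = None           # message late or lost: the master carries on
    return ack, done


# A job that finishes well inside the timeout.
print(master_dispatch(1, lambda: time.sleep(0.05), queue.Queue(), timeout_s=1.0))
# A job whose "finished" message does not arrive in time.
print(master_dispatch(2, lambda: time.sleep(0.5), queue.Queue(), timeout_s=0.05))
```

The bounded wait is the point of the design: a missing "finished" message costs the master some extra waiting, but it never stalls the whole pipeline.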

>
> Also, is the image server more cpu bound or I/O bound?

Definitely IO bound.  It could start hitting the CPU if you request  
lots and lots of rendered planes rather than raw data for analysis,  
but it's probably IO bound even then.

>   I could
> technically have every call to the omeis cgi scheduled on the cluster,
> but the job dispatch delay can be up to 30 seconds which pretty much
> kills interactivity.  Would omeis play nice on an otherwise
> lightly-loaded shared webserver, or might it soak up too much cpu
> time?

It soaks up a lot of RAM because it uses it as cache and shared  
memory.  The RAM is "loosely" allocated using mmap, so other demands  
on the RAM will basically cause the kernel to do a lot of RAM  
shuffling.  Linux is remarkably efficient at this, while OS X is  
remarkably sucky (even on identical hardware).
There's very little CPU used while running omeis.  For all intents  
and purposes it's all IO.
There's no point at all to running it on a CPU cluster.  It's the
drives you want to cluster (RAID) - not the CPU.  Also, in an imaging  
application, everything revolves around the image, so this is  
definitely not the place to introduce latency needlessly.
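
For readers unfamiliar with mmap-backed caching, here is a small Python sketch of the general idea. It is not omeis code (omeis is a C program); the file name, the plane size and the `read_plane` helper are all hypothetical.

```python
import mmap
import os
import tempfile

PLANE_BYTES = 4096  # hypothetical size of one image plane

# Write a small stand-in "pixels" file holding three planes.
path = os.path.join(tempfile.mkdtemp(), "pixels.raw")
with open(path, "wb") as f:
    for value in (1, 2, 3):
        f.write(bytes([value]) * PLANE_BYTES)


def read_plane(mapped: mmap.mmap, index: int) -> bytes:
    """Copy one plane out of the mapping; pages are faulted in on demand."""
    start = index * PLANE_BYTES
    return mapped[start:start + PLANE_BYTES]


with open(path, "rb") as f:
    # Length 0 maps the whole file. The mapping reserves address space, not
    # RAM: the kernel's page cache holds the data and can evict it when other
    # processes need the memory.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        plane = read_plane(mapped, 1)
        print(len(plane), plane[0])
```

Because the kernel decides which pages stay resident, a process like this looks RAM-hungry on a quiet machine but gives memory back under pressure, at the cost of re-reading from disk later.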

Our mini-cluster is basically 16 Opteron cores in 4 boxes.  One box
does omeis, back-end ("master"), DB server, web UI and two worker  
nodes.  The other boxes all have one worker node per core.  This is  
for an application heavy on the analysis.  If you want something  
heavy on concurrent users with not as much analysis, then you'd want  
to spread out the various services, in other words keep omeis, DB,  
web-ui and back-end, and any analysis worker nodes all on separate  
boxes.  You can scale concurrency further by setting up load  
balancing for Apache and Postgres, thereby giving even more
cores/boxes to each separate service, though this is getting out of my league.
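
The analysis-heavy layout above can be written down as a small Python table to count the worker nodes. This is only a sketch: the box names are made up, and 4 cores per box is an assumption (16 cores spread evenly over 4 boxes).

```python
CORES_PER_BOX = 4  # assumption: 16 cores spread evenly over 4 boxes

# Box names are hypothetical; services and worker counts follow the text.
layout = {
    "box1": {
        "services": ["omeis", "back-end (master)", "DB server", "web UI"],
        "workers": 2,
    },
    "box2": {"services": [], "workers": CORES_PER_BOX},
    "box3": {"services": [], "workers": CORES_PER_BOX},
    "box4": {"services": [], "workers": CORES_PER_BOX},
}

total_workers = sum(box["workers"] for box in layout.values())
print(f"{len(layout)} boxes, {total_workers} worker nodes")
```

Under that assumption there are 14 worker nodes in total, with the two remaining cores on the first box left for omeis, the back end, the database and the web UI.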

-I

>
>
>  -- Jeremy
>
>


