[ome-devel] cluster support

Fri Dec 1 22:17:12 GMT 2006

What's the cluster management software being used?
As there is growing interest in cluster computing for this, it would
definitely be worthwhile to commit to some scheme that everyone
could be happy with.  I think it would be trivial to do this with a
PBS-type manager which essentially runs command-line programs -
assuming you're willing to take the startup hit.  In our case, the
startup was approaching 50% of the total execution time, so it seemed
very wasteful.  I don't know much about Grid Engine, but people who do
tell me that it is possible to maintain state using this system.
In our setup, Apache is nothing but a container to persist a MATLAB
instance.  It doesn't really matter how that gets done as long as it
gets done somehow.
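
To put rough numbers on the startup hit, here is a minimal Python
sketch comparing a launch-per-job scheme with a persistent worker.
The function names and the timings in the usage note are hypothetical,
chosen only so that startup equals compute time (the ~50% case); they
are not measurements from our cluster.

```python
def total_time(n_jobs, startup_s, work_s, persistent):
    """Wall-clock seconds to run n_jobs one after another on one node."""
    if n_jobs == 0:
        return 0.0
    if persistent:
        # The interpreter (e.g. MATLAB) is started once and then reused.
        return startup_s + n_jobs * work_s
    # A command-line launch pays the startup cost again for every job.
    return n_jobs * (startup_s + work_s)


def startup_fraction(n_jobs, startup_s, work_s, persistent):
    """Fraction of the total wall-clock time spent on startup."""
    total = total_time(n_jobs, startup_s, work_s, persistent)
    if total == 0:
        return 0.0
    startups = 1 if persistent else n_jobs
    return startups * startup_s / total
```

With a hypothetical 10 s startup and 10 s of work per job,
startup_fraction(100, 10.0, 10.0, persistent=False) is 0.5, while the
persistent case spends about 1% of its time on startup.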

Can each node make arbitrary TCP/IP connections to the master?  To an
arbitrary IP address?  At the very least it would need to make
client-style HTTP and Postgres connections, possibly to hosts outside
the cluster unless the database and image servers are running on the
master node (most likely not).  Some cluster managers insist on doing
all communication with files only.  That would be a pretty
significant burden.
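
One quick way to answer the connectivity question would be to run a
small probe as a job on a compute node.  The following is only a
sketch using Python's standard socket module; the function names are
made up for illustration, and the hosts and ports you pass in are
whatever your database and image servers actually use.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-lookup failures.
        return False


def probe(targets, timeout=3.0):
    """Map each target name to whether its (host, port) is reachable."""
    return {
        name: can_connect(host, port, timeout)
        for name, (host, port) in targets.items()
    }
```

For example, probe({"db": ("db.example.org", 5432), "images":
("images.example.org", 80)}) would report whether the node can reach
a Postgres server on its default port and an HTTP server; both host
names here are placeholders.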

My knowledge of Grid Engine can probably be summarized on the back of  
a postage stamp with a felt-tip marker.  It seems to me to have the  
right bits to do what we want, and it certainly has the shiny Sun  
marketing juggernaut behind it, so presumably one would be able to  
talk a cluster manager into supporting it - no?
-Ilya

On Dec 1, 2006, at 12:14 PM, Jeremy Muhlich wrote:

> On Thu, 2006-11-23 at 14:22 -0500, Ilya Goldberg wrote:
>> So the way the OME cluster is set up is that every node is running
>> Apache.  The master node issues requests that include remote DB
>> connection info and job info.  The worker node establishes a DB
>> connection, returns an OK message (to unblock the master), then
>> continues processing the request.  When it's done, it's supposed to
>> issue an IPC message using the DB driver, but this bit hasn't been
>> working well recently.  Anyway, the master doesn't wait around
>> forever for the IPC "finished" message, so things continue cranking
>> along fairly well.  The only effect seems to be that the master gets
>> loaded a little more than it should be.
>
> Hmmm.  This is a shared cluster with time-limited job queues.  For
> example the 15m queue has the highest priority but will kill your job
> after 15 minutes.  The complete list of queues in priority order is
> 15m, 2h, 12h, 1d, 7d, and unlimited.  It could be difficult to employ
> your apache-everywhere scheme on this sort of system.  However, a
> group who contributes a node gets top priority on it, so that might
> be the way to go.
>
>>>
>>> Also, is the image server more cpu bound or I/O bound?
>>
>> Definitely IO bound.  It could start hitting the CPU if you request
>> lots and lots of rendered planes rather than raw data for analysis,
>> but it's probably IO bound even then.
>
> Thanks, that's helpful to know.
>
>
>  -- Jeremy
>
> _______________________________________________
> ome-devel mailing list
> ome-devel at lists.openmicroscopy.org.uk
> http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
>