[ome-devel] cluster support

Chris Allan callan at blackcat.ca
Mon Dec 4 09:58:52 GMT 2006


Hi guys,

Just going to plop my 2 cents in here as we're doing quite a bit of  
this sort of cluster integration (albeit not directly with OME  
recently) in Dundee at the moment. If I understand things correctly,  
your run time is going to be in minutes (potentially hours?) per job.  
You haven't mentioned much about the scheduling algorithms that your  
LSF install uses though I think that's probably irrelevant for the  
purposes of this discussion. Further to that, how big is the actual  
cluster and how homogeneous is it? Please correct me if I've
misconstrued anything.

With all that context understood, I think it's pretty safe to say that
any cluster administrator who's halfway sane wouldn't allow you to run
Apache instances on his/her nodes, for a multitude of reasons, not
least that you could use them to circumvent the scheduler quite
easily, which is an administrative nightmare. LSF's job cleanup phase
may even kill any processes spawned during a job that are still
running when the job completes, though I'm not sure about that. Either
way, if your run time really is on the minute scale, a few seconds of
startup overhead per job isn't going to matter anyway.

Much of the greater execution state is stored in the OME database, so
your best bet is probably to write an OME analysis handler whose sole
responsibility is to submit a job to the LSF scheduler (something
along the lines of the sketch below). What it submits should be a
reasonably intelligent script that can, from the MEX and/or other
metadata, run the actual job the analysis chain has asked for. You've
got someone else maintaining the scheduler for you, after all; there's
really no need to burden yourself with maintaining that sort of
infrastructure within OME.
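
For what it's worth, the handler end of that could be little more than
a thin wrapper around bsub. Here's a rough Python sketch of the idea;
the worker script path, the queue name and the convention of passing
the MEX id on the command line are all made up for illustration, not
anything that exists in OME today:

    #!/usr/bin/env python
    # Sketch only: submit one module execution (MEX) to LSF and let a
    # worker script on the node do the real work.  The worker path,
    # queue name and MEX-id-on-the-command-line convention are
    # illustrative assumptions, not existing OME code.
    import subprocess

    def submit_mex(mex_id, queue="2h", worker="/usr/local/ome/bin/run_mex"):
        """Hand a MEX off to LSF via bsub and return bsub's stdout."""
        cmd = [
            "bsub",
            "-q", queue,                    # a queue long enough for the job
            "-J", "ome_mex_%d" % mex_id,    # job name, handy for bjobs/bkill
            "-o", "/tmp/ome_mex_%d.log" % mex_id,
            worker, str(mex_id),            # worker resolves the rest from the DB
        ]
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        return result.stdout

    if __name__ == "__main__":
        print(submit_mex(12345))

The nice thing about keeping the handler that dumb is that whatever
the scheduler people change on their end stays their problem.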

Installing OME's Perl library and its dependencies on every single
node could be a bit of a pig (especially if you have a heterogeneous
cluster), so depending on the data volumes you've got to handle you
might decide instead to use the XML-RPC based remote framework to
communicate with the server (a worker-side sketch of that idea is
below). I guess I'd have to understand exactly what sorts of analysis
chains you'd want to schedule to run on the cluster. All of them? Just
some of them?
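
To give a feel for how little the worker side could need, here's
another rough Python sketch of an XML-RPC client; the endpoint URL and
the method names (createSession, getAttributes) are placeholders, not
the actual remote framework API:

    # Sketch only: a worker talks XML-RPC back to the OME server rather
    # than loading the whole OME Perl stack locally.  The URL and the
    # method names are placeholders.
    import xmlrpc.client

    server = xmlrpc.client.ServerProxy("http://ome-master.example.org/shoola/")

    # Authenticate once, then fetch only the metadata this job needs;
    # bulk pixel data would still come from the image server rather
    # than over XML-RPC.
    session = server.createSession("worker", "secret")
    mex = server.getAttributes(session, "ModuleExecution", 12345)
    print(mex)

Whether that's workable really depends on how much data each job has
to pull back and forth, which is why the volumes question matters.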

Given that you want to integrate with an existing scheduler and an
existing cluster whose operating system and deployment you may or may
not control, I'd suggest that Ilya's group's cluster integration tools
probably won't give you exactly what you want, but they should give
you some idea of how to write your own, and that "shouldn't be too
hard" (tm).

Ciao.

-Chris

On 2 Dec 2006, at 00:10, Jeremy Muhlich wrote:

> The Harvard Medical School cluster uses LSF.
>
> http://www.platform.com/Products/Platform.LSF.Family/
>
> All nodes can make network connections to all other nodes, and they
> all mount a massive shared filesystem in addition to maybe 60GB of
> local scratch space.  The interconnect is gigabit ethernet.
>
>
>
> On Fri, 2006-12-01 at 17:17 -0500, Ilya Goldberg wrote:
>> What's the cluster management software being used?
>> As there is growing interest in cluster computing for this, it would
>> definitely be worthwhile to commit to some scheme that everyone
>> could be happy with.  I think it would be trivial to do this with a
>> PBS-type manager which essentially runs command-line programs -
>> assuming you're willing to take the startup hit.  In our case, this
>> was approaching 50% of the total execution time, so seemed very
>> wasteful.  I don't know much about Grid Engine, but people who do
>> tell me that it is possible to maintain state using this system.
>> Apache is nothing but a container to persist a MATLAB instance.  It
>> doesn't really matter how that gets done as long as it gets done
>> somehow.
>>
>> Can each node make arbitrary TCP/IP connections to the master?  To an
>> arbitrary IP address? At the very least it would need to make client-
>> style http and Postgres connections, and possibly outside of the
>> cluster unless the database and image servers are running on the
>> master node (most likely not).  Some cluster managers insist on doing
>> all communication with files only.  That would be a pretty
>> significant burden.
>>
>> My knowledge of Grid Engine can probably be summarized on the back of
>> a postage stamp with a felt-tip marker.  It seems to me to have the
>> right bits to do what we want, and it certainly has the shiny Sun
>> marketing juggernaut behind it, so presumably one would be able to
>> talk a cluster manager into supporting it - no?
>> -Ilya
>>
>> On Dec 1, 2006, at 12:14 PM, Jeremy Muhlich wrote:
>>
>>> On Thu, 2006-11-23 at 14:22 -0500, Ilya Goldberg wrote:
>>>> So the way the OME cluster is set up is that every node is running
>>>> Apache.  The master node issues requests that include remote DB
>>>> connection info and job info.  The worker node establishes a DB
>>>> connection, returns an OK message (to unblock the master), then
>>>> continues processing the request.  When it's done, it's supposed to
>>>> issue an IPC message using the DB driver, but this bit hasn't been
>>>> working well recently.  Anyway, the master doesn't wait around
>>>> forever for the IPC "finished" message, so things continue cranking
>>>> along fairly well.  The only effect seems to be that the master
>>>> gets loaded a little more than it should be.
>>>
>>> Hmmm.  This is a shared cluster with time-limited job queues.  For
>>> example the 15m queue has the highest priority but will kill your
>>> job after 15 minutes.  The complete list of queues in priority
>>> order is 15m, 2h, 12h, 1d, 7d, and unlimited.  It could be
>>> difficult to employ your apache-everywhere scheme on this sort of
>>> system.  However, a group who contributes a node gets top priority
>>> on it, so that might be the way to go.
>>>
>>>>>
>>>>> Also, is the image server more cpu bound or I/O bound?
>>>>
>>>> Definitely IO bound.  It could start hitting the CPU if you request
>>>> lots and lots of rendered planes rather than raw data for analysis,
>>>> but it's probably IO bound even then.
>>>
>>> Thanks, that's helpful to know.
>>>
>>>
>>>  -- Jeremy
>>>


