[ome-devel] cluster support
Ilya Goldberg
igg at nih.gov
Fri Dec 1 22:17:12 GMT 2006
What's the cluster management software being used?
As there is growing interest in cluster computing for this, it would
definitely be worthwhile to commit to some scheme that everyone
could be happy with. I think it would be trivial to do this with a
PBS-type manager which essentially runs command-line programs -
assuming you're willing to take the startup hit. In our case, this
was approaching 50% of the total execution time, so it seemed very
wasteful. I don't know much about Grid Engine, but people who do
tell me that it is possible to maintain state using this system.
Apache is nothing but a container to persist a MATLAB instance. It
doesn't really matter how that gets done as long as it gets done
somehow.
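
To put rough numbers on the startup hit, here is a minimal sketch,
assuming a fixed startup cost per launch and a fixed amount of work per
job. The function name and the 10-second figures are made up for
illustration; only the roughly 50% overhead comes from the case
described above.

```python
def startup_fraction(startup_s, work_s, jobs, persistent):
    """Return the fraction of total run time spent on startup.

    startup_s  -- hypothetical cost of launching the runtime once (seconds)
    work_s     -- hypothetical useful work per job (seconds)
    jobs       -- number of jobs handled
    persistent -- True if one long-lived instance serves every job,
                  False if each job launches its own instance
    """
    launches = 1 if persistent else jobs
    startup_total = launches * startup_s
    return startup_total / (startup_total + jobs * work_s)


# Illustrative numbers only: startup equal to the work per job gives
# the 50% overhead case when every job pays for its own launch.
per_job = startup_fraction(10, 10, 100, persistent=False)
shared = startup_fraction(10, 10, 100, persistent=True)
```

With these assumed numbers the per-job scheme spends half its time
starting up, while a persisted instance spends about 1%.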
Can each node make arbitrary TCP/IP connections to the master? To an
arbitrary IP address? At the very least it would need to make
client-style HTTP and Postgres connections, and possibly outside of the
cluster unless the database and image servers are running on the
master node (most likely not). Some cluster managers insist on doing
all communication with files only. That would be a pretty
significant burden.
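
One way to answer the connectivity question empirically is to run a
small check from a worker node. This is only a sketch using Python's
standard socket module; the host names and ports in the usage comment
are placeholders, not the real database or image servers.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


def check_targets(targets, timeout=3.0):
    """Map each (host, port) pair to whether it was reachable."""
    return {target: can_connect(*target, timeout=timeout) for target in targets}


# Example usage on a worker node (hypothetical hosts):
#   check_targets([("db.example.org", 5432), ("images.example.org", 80)])
```

If either check fails from a node, the scheduler or firewall is
probably restricting outbound connections, and that would have to be
sorted out with the cluster manager first.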
My knowledge of Grid Engine can probably be summarized on the back of
a postage stamp with a felt-tip marker. It seems to me to have the
right bits to do what we want, and it certainly has the shiny Sun
marketing juggernaut behind it, so presumably one would be able to
talk a cluster manager into supporting it - no?
-Ilya
On Dec 1, 2006, at 12:14 PM, Jeremy Muhlich wrote:
> On Thu, 2006-11-23 at 14:22 -0500, Ilya Goldberg wrote:
>> So the way the OME cluster is set up is that every node is running
>> Apache. The master node issues requests that include remote DB
>> connection info and job info. The worker node establishes a DB
>> connection, returns an OK message (to unblock the master), then
>> continues processing the request. When it's done, it's supposed to
>> issue an IPC message using the DB driver, but this bit hasn't been
>> working well recently. Anyway, the master doesn't wait around
>> forever for the IPC "finished" message, so things continue cranking
>> along fairly well. The only effect seems to be that the master gets
>> loaded a little more than it should be.
>
> Hmmm. This is a shared cluster with time-limited job queues. For
> example the 15m queue has the highest priority but will kill your job
> after 15 minutes. The complete list of queues in priority order is
> 15m, 2h, 12h, 1d, 7d, and unlimited. It could be difficult to employ
> your apache-everywhere scheme on this sort of system. However, a group
> who contributes a node gets top priority on it, so that might be the
> way to go.
>
>>>
>>> Also, is the image server more cpu bound or I/O bound?
>>
>> Definitely IO bound. It could start hitting the CPU if you request
>> lots and lots of rendered planes rather than raw data for analysis,
>> but it's probably IO bound even then.
>
> Thanks, that's helpful to know.
>
>
> -- Jeremy
>
> _______________________________________________
> ome-devel mailing list
> ome-devel at lists.openmicroscopy.org.uk
> http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
>