<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7650.28">
<TITLE>RE: [ome-devel] Install failure loading Experiment.ome</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>It's hard to tell how many vendors have this issue; a quick Google shows some compliant, some partially compliant, some not at all. It seems to be improving, however: OS X, for example, was not POSIX-compliant but apparently is now. The vendor in question has a proprietary OS, and their tech support can't even tell me whether their NFS implementation is POSIX-compliant or not. They're using some sort of modified version of Apache strictly for management; there's no way I'd run OME on it even if they'd let me.<BR>
<BR>
As NFS doesn't seem to be a problem for most OME users, there's no reason for you to worry about it.<BR>
<BR>
A quick run of test-concurrent-write returns:<BR>
In PID 17617, Error calling test-concurrent-write: NewPixels failed.<BR>
System Error: No such file or directory<BR>
OMEIS Error: Couldn't get next Pixels ID: No such file or directory<BR>
<BR>
Is that indicative of a file system problem, or something else?<BR>
<BR>
Do CIFS shares show this same problem? Y'all are running large DBs; is it all on local filesystems?<BR>
<BR>
Mike<BR>
<BR>
Michael J. McCaughey, PhD<BR>
Molecular Physiology and Biophysics<BR>
U9203 MRBIII<BR>
6-6175<BR>
<BR>
<BR>
<BR>
-----Original Message-----<BR>
From: Ilya Goldberg [<A HREF="mailto:igg@nih.gov">mailto:igg@nih.gov</A>]<BR>
Sent: Fri 10/6/2006 12:31 PM<BR>
To: McCaughey, Michael J<BR>
Subject: Re: [ome-devel] Install failure loading Experiment.ome<BR>
<BR>
I realize that not being able to mount the OMEIS repository as a <BR>
share is a potentially huge problem. I was more wondering about how <BR>
common it is for NFS servers not to correctly implement POSIX-compliant <BR>
file locking (or fcntl() locking, as it's sometimes known). <BR>
I know that there are NFS servers that are fully POSIX-compliant with <BR>
respect to file locking. Sometimes this is built into the NFS server/<BR>
client itself (the way it's done in OS X, I believe); other times it <BR>
is done using separate daemons (statd and lockd). I think the best <BR>
way to get this resolved is to talk to the vendors of your NFS <BR>
server and client OSes.<BR>
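<BR>
As a quick way to check whether a given mount honours fcntl() locks at all, here is a minimal Python sketch (not part of OMEIS; the function name fcntl_locking_works is made up for illustration). It takes a non-blocking exclusive lock on a scratch file in the target directory. Python's fcntl.lockf() is a wrapper around the fcntl() byte-range locking calls, and the fcntl(2) man page lists ENOLCK for a failed remote locking protocol such as NFS. Success here only shows that the lock call works from one client; it does not prove that locks are honoured between clients.<BR>
```python
import errno
import fcntl
import os
import sys
import tempfile


def fcntl_locking_works(directory: str) -> bool:
    """Try a non-blocking exclusive fcntl() lock on a scratch file in
    `directory`; return True if the lock could be taken and released."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="locktest-")
    try:
        try:
            # fcntl.lockf() wraps fcntl(F_SETLK); LOCK_NB means fail, don't wait.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            # Over NFS without a working lock manager this is typically ENOLCK.
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            print(f"fcntl lock failed in {directory}: {name}", file=sys.stderr)
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
        os.unlink(path)
```
<BR>
For example, run fcntl_locking_works("/path/to/OMEIS") on the OMEIS server, where the path is a placeholder for the Base OMEIS directory on the share.<BR>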
<BR>
My bet is that the segmentation fault occurs within Berkeley DB, <BR>
because it tends to do that unless file locking is done exactly <BR>
right. It even says so in their documentation. A little while ago <BR>
we discovered an occasional crash there (2 or 3 times out of <BR>
thousands) that was traced back to a race condition within the <BR>
Berkeley DB code. It could only be replicated with 16 or more <BR>
concurrent request loops going to a multi-CPU server. You might try <BR>
running src/C/omeis/test-concurrent-write. It requires an OMEIS test <BR>
directory called "OMEIS-TEST" in the current working directory. It <BR>
forks 64 processes and has each of them issue 1,000 OMEIS writes by <BR>
directly calling the OMEIS routines (i.e., not via the normal OMEIS <BR>
HTTP interface). If locking is the problem, this is generally <BR>
sufficient to expose it.<BR>
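<BR>
For a locking check that doesn't depend on OMEIS being built, here is a hypothetical Python analogue of the same idea. It is not test-concurrent-write itself, which calls the real OMEIS routines; the names concurrent_increment_test and _worker are illustrative only. It forks 64 processes that each do 1,000 read-increment-write cycles on a shared counter file under an exclusive fcntl() lock. If locking works, the final count is exactly procs * iterations; a smaller number means updates were lost.<BR>
```python
import fcntl
import os
import struct
import tempfile

COUNTER = struct.Struct("=Q")  # a single unsigned 64-bit counter at offset 0


def _worker(path: str, iterations: int) -> None:
    """Increment the counter `iterations` times, each under an exclusive lock."""
    fd = os.open(path, os.O_RDWR)
    try:
        for _ in range(iterations):
            fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until no one else holds it
            try:
                (value,) = COUNTER.unpack(os.pread(fd, COUNTER.size, 0))
                os.pwrite(fd, COUNTER.pack(value + 1), 0)
            finally:
                fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def concurrent_increment_test(directory: str, procs: int = 64,
                              iterations: int = 1000) -> int:
    """Fork `procs` children that each bump a shared counter `iterations`
    times; return the final value (procs * iterations if locking works)."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="counter-")
    try:
        os.write(fd, COUNTER.pack(0))
        os.close(fd)
        pids = []
        for _ in range(procs):
            pid = os.fork()
            if pid == 0:  # child: never return into the caller's code
                code = 1
                try:
                    _worker(path, iterations)
                    code = 0
                finally:
                    os._exit(code)
            pids.append(pid)
        failed = sum(1 for pid in pids if os.waitpid(pid, 0)[1] != 0)
        if failed:
            raise RuntimeError(f"{failed} worker process(es) failed")
        with open(path, "rb") as f:
            (value,) = COUNTER.unpack(f.read(COUNTER.size))
        return value
    finally:
        os.unlink(path)
```
<BR>
Pointed at a directory on the share, e.g. concurrent_increment_test("/path/on/share") (a placeholder path), anything other than 64000 suggests the locks are not being enforced.<BR>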
<BR>
This was worked around by doing our own locking before handing things <BR>
off to Berkeley DB. Of course, if our locking isn't supported over <BR>
NFS, then Berkeley DB will go back to dumping core if it's not happy <BR>
with things. Berkeley DB is used in a great many Unix apps, so having <BR>
dodgy file locking on shares will likely have some pretty widely felt <BR>
effects. All we ever wanted out of Berkeley DB was a balanced B-tree <BR>
algorithm that works efficiently with lots and lots of entries in a <BR>
file shared by multiple concurrent processes. Can't get nothin' for <BR>
free, I tell ya.<BR>
<BR>
It would be a pity if this feature is broken in many NFS server <BR>
implementations. It would have a much larger effect on shared <BR>
file resources than just OME; OMEIS isn't the only application <BR>
out there that depends on file locking. It's a big problem for mail <BR>
spool files, for example (that's actually exactly the same problem <BR>
OMEIS has: granular shared reading and exclusive writing of specific <BR>
portions of potentially very big files).<BR>
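<BR>
That "granular shared reading and exclusive writing" is what fcntl() byte-range locks provide: a shared (read) lock on one region coexists with an exclusive (write) lock on another, and only overlapping, conflicting requests are refused. A minimal Python sketch, assuming a scratch file on a local disk (try_lock and demo_byte_ranges are illustrative names, not OMEIS code). Because fcntl() locks belong to the process, the conflicting request has to come from a second, forked process.<BR>
```python
import fcntl
import os
import tempfile


def try_lock(fd: int, exclusive: bool, start: int, length: int) -> bool:
    """Try, without blocking, to lock bytes [start, start + length) of fd."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.lockf(fd, mode | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        return True
    except OSError:  # EACCES/EAGAIN: a conflicting lock is held elsewhere
        return False


def demo_byte_ranges(directory: str) -> tuple:
    """The parent write-locks bytes 0-99; a forked child then tries three
    range locks and reports which ones it was granted."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="ranges-")
    try:
        if not try_lock(fd, True, 0, 100):  # parent: exclusive on [0, 100)
            raise RuntimeError("could not take the parent's lock")
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: a separate process, so a separate lock owner
            code = 1
            try:
                os.close(r)
                cfd = os.open(path, os.O_RDWR)
                granted = (
                    try_lock(cfd, False, 100, 100),  # shared, no overlap
                    try_lock(cfd, True, 200, 100),   # exclusive, no overlap
                    try_lock(cfd, False, 50, 100),   # shared, overlaps parent
                )
                os.write(w, bytes(int(g) for g in granted))
                code = 0
            finally:
                os._exit(code)
        os.close(w)
        reply = os.read(r, 3)
        os.close(r)
        os.waitpid(pid, 0)
        return tuple(bool(b) for b in reply)
    finally:
        os.close(fd)
        os.unlink(path)
```
<BR>
On a local filesystem with working POSIX locks, demo_byte_ranges(tempfile.gettempdir()) returns (True, True, False): both non-overlapping requests are granted and only the one overlapping the parent's write lock is refused.<BR>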
<BR>
If POSIX file locking is abandonware as far as most NFS servers are <BR>
concerned, then we will need to implement some kind of workaround. <BR>
It would not be very difficult, though it would make OMEIS a lot less <BR>
efficient. Probably the most straightforward way to do it is to <BR>
establish exclusivity using a sentinel file. I would like to <BR>
preserve fcntl locking if it's available, though (it is part of the <BR>
POSIX standard, after all), and ideally not do anything about it if <BR>
vendors generally offer standards-compliant solutions.<BR>
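<BR>
A sentinel-file scheme could look something like the following Python sketch (hypothetical; sentinel_lock is not an OMEIS function). Exclusivity comes from creating the sentinel with O_CREAT | O_EXCL, which fails if the file already exists, and releasing means deleting it. Caveats: a process that dies while holding the sentinel leaves it behind; the whole file is serialized rather than specific byte ranges; every acquisition is a create/unlink round trip to the server; and the Linux open(2) man page notes that O_EXCL is only supported on NFS with NFSv3 or later on kernel 2.6 or later, recommending a link(2)-based lockfile otherwise.<BR>
```python
import os
import time
from contextlib import contextmanager


@contextmanager
def sentinel_lock(sentinel_path: str, timeout: float = 10.0, poll: float = 0.05):
    """Hold exclusivity for the duration of the `with` block by atomically
    creating `sentinel_path`; wait up to `timeout` seconds if it exists."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails with EEXIST if another process got there first.
            fd = os.open(sentinel_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"sentinel {sentinel_path} still held") from None
            time.sleep(poll)
    try:
        try:
            os.write(fd, f"{os.getpid()}\n".encode())  # note the owner's PID
        finally:
            os.close(fd)
        yield
    finally:
        os.unlink(sentinel_path)  # release: the next waiter's O_EXCL can succeed
```
<BR>
Usage would be along the lines of wrapping each write in with sentinel_lock("/path/on/share/omeis.lock"): ..., where the path is a placeholder.<BR>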
<BR>
Is it possible to run just OMEIS directly on the big share?<BR>
<BR>
-Ilya<BR>
<BR>
<BR>
On Oct 6, 2006, at 12:10 PM, McCaughey, Michael J wrote:<BR>
<BR>
> Well, it has the potential to be a big problem here.<BR>
><BR>
> Without the ability to put OMEIS on a share, my OME usage is <BR>
> limited by my local disk capacity. I have a backlog of >10TB of <BR>
> images to import, but the server has nowhere near enough capacity, <BR>
> so I've been trying to use an NFS share from a NAS box that I have <BR>
> 64TB on.<BR>
><BR>
> The IT people have not been able to make CIFS work with *nix <BR>
> clients on the NAS box (don't even start me on that), so I haven't <BR>
> been able to try it. Is anyone putting OMEIS out on a CIFS share?<BR>
><BR>
> Mike<BR>
><BR>
> Michael J. McCaughey, PhD<BR>
> Molecular Physiology and Biophysics<BR>
> U9203 MRBIII<BR>
> 6-6175<BR>
><BR>
><BR>
><BR>
> -----Original Message-----<BR>
> From: Ilya Goldberg [<A HREF="mailto:igg@nih.gov">mailto:igg@nih.gov</A>]<BR>
> Sent: Fri 10/6/2006 8:19 AM<BR>
> To: McCaughey, Michael J<BR>
> Subject: Re: [ome-devel] Install failure loading Experiment.ome<BR>
><BR>
> Aha!<BR>
> It's the dread NFS share that I bet doesn't support fully POSIX-compliant<BR>
> file locking. There's discussion of this online, along<BR>
> with some suggested fixes (it's not OME-specific). I don't know how<BR>
> big a problem this is generally speaking. Anyone?<BR>
><BR>
> -Ilya<BR>
><BR>
><BR>
> On Oct 2, 2006, at 4:45 PM, McCaughey, Michael J wrote:<BR>
><BR>
>> The problem seems to be related to installing OME/OMEIS on an NFS-mounted<BR>
>> share.<BR>
>> I can install perfectly well locally, but installs with only the<BR>
>> Base OME directory and Base OMEIS directory changed to a directory<BR>
>> on the share fail as described. The OME/OMEIS directories are<BR>
>> created with correct permissions by install.pl, and are populated.<BR>
>> The share itself is owned by root with world rwx (777) permissions,<BR>
>> so it doesn't *appear* to be a permission issue.<BR>
>><BR>
>> Any suggestions anyone?<BR>
>><BR>
>> Mike<BR>
>><BR>
>> Michael J. McCaughey, PhD<BR>
>> Molecular Physiology and Biophysics<BR>
>> U9203 MRBIII<BR>
>> 6-6175<BR>
>><BR>
>><BR>
>><BR>
>> -----Original Message-----<BR>
>> From: ome-devel-bounces@lists.openmicroscopy.org.uk on behalf of<BR>
>> Chris Allan<BR>
>> Sent: Sun 9/17/2006 4:57 PM<BR>
>> To: ome-devel@lists.openmicroscopy.org.uk<BR>
>> Subject: Re: [ome-devel] Install failure loading Experiment.ome<BR>
>><BR>
>><BR>
>> On 15 Sep 2006, at 15:51, McCaughey, Michael J wrote:<BR>
>><BR>
>> ...snip...<BR>
>>><BR>
>>><BR>
>>> Apache's error log gives:<BR>
>>> [Fri Sep 15 08:25:21 2006] [error] [client 127.0.0.1] In PID 28882, Error calling OMEIS: Method parameter missing<BR>
>>> [Fri Sep 15 08:27:02 2006] [error] [client 127.0.0.1] In PID 30644, Error calling OMEIS:<BR>
>>> [Fri Sep 15 08:27:02 2006] [error] [client 127.0.0.1] Method parameter missing<BR>
>>> [Fri Sep 15 08:27:02 2006] [error] [client 127.0.0.1]<BR>
>>> [Fri Sep 15 08:27:02 2006] [error] [client 127.0.0.1] Premature end of script headers: omeis<BR>
>>><BR>
>>> Anybody seen this before?<BR>
>> Ugh, yes. That's likely an OMEIS segfault.<BR>
>><BR>
>> No idea what might be causing it, unfortunately, and getting cores out<BR>
>> of Apache can be a bit tricky.<BR>
>><BR>
>> Ciao.<BR>
>><BR>
>> -Chris<BR>
>> _______________________________________________<BR>
>> ome-devel mailing list<BR>
>> ome-devel@lists.openmicroscopy.org.uk<BR>
>> <A HREF="http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel">http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel</A><BR>
>><BR>
><BR>
><BR>
<BR>
<BR>
</FONT>
</P>
</BODY>
</HTML>