[ome-devel] Problem when importing >1000 images
Josh Moore
josh at glencoesoftware.com
Mon May 2 06:44:48 BST 2016
On Fri, Apr 29, 2016 at 12:41 PM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
> Dear Josh,
Hi Andrii,
> Thank you for providing a possible solution to our problem. We will test
> the session usage and get back with the results. Could you please clarify
> a few things about your suggestions?
>
> Is it possible to add a wait time somewhere in the code to compensate for
> the slower NFS locking?
Conceivably, but considering the state the server could be in
at that point (shutdown, etc.) it's difficult to know. One option is to
put your /OMERO directory on a non-NFS filesystem and then symlink in
individual directories from NFS. Ultimately, though, this points to an
issue with the remote fileshare that needs to be looked into.
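A minimal sketch of that layout, with hypothetical paths (the repository root on local disk so lock files behave, the bulky data directories symlinked in from the NFS mount):

```python
import os

def link_nfs_dirs(local_repo, nfs_share, names):
    """Symlink selected directories from an NFS share into a local
    (non-NFS) OMERO repository root, so lock files stay on local disk
    while the bulk data remains on the share. Paths are illustrative."""
    os.makedirs(local_repo, exist_ok=True)
    for name in names:
        target = os.path.join(nfs_share, name)
        link = os.path.join(local_repo, name)
        if not os.path.islink(link):
            os.symlink(target, link)
```

Whether this is safe for your deployment depends on which directories the server locks; it is a workaround sketch, not a supported configuration.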
> As far as I can see we do not call
> bin/omero login
`bin/omero import` calls `login` if no login is present.
> explicitly at this moment. Is it an integral part of the import? There is
> also BlitzGateway.connect() call before the script goes into the loop over
> all images.
Agreed. There are a couple of different logins in play here which
makes it all a bit complicated. One option would be to get everything
into the same process with no subprocess calls to `bin/omero import`.
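For the in-process route, one sketch of keeping the session alive while the imports run: a background thread that periodically invokes a `keep_alive` callable supplied by you (hypothetical here; with BlitzGateway it could wrap something like `conn.keepAlive()`):

```python
import threading

def start_keepalive(keep_alive, interval_s=60.0):
    """Call keep_alive() every interval_s seconds in a background thread.

    keep_alive is a caller-supplied callable (e.g. a session ping on an
    existing connection). Returns a stop() function that ends the loop.
    """
    stop_event = threading.Event()

    def loop():
        # wait() returns False on timeout, True once stop() sets the event
        while not stop_event.wait(interval_s):
            keep_alive()

    t = threading.Thread(target=loop, daemon=True)
    t.start()

    def stop():
        stop_event.set()
        t.join()

    return stop
```

The main thread then runs the imports as usual and calls `stop()` when done.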
> Does this mean then that we should call logout after each import?
That's probably the easiest thing to test. Longer-term, it'd be better
to use a session key.
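A sketch of the session-key pattern, assuming you log in once up front (the session UUID that `bin/omero login` prints) and then pass that key to every import subprocess with `-k`, so no password is handled per import and only one session exists; all names here are illustrative:

```python
def build_import_command(omero_bin, server, session_key, image_path):
    """Build argv for one `omero import` that reuses an existing
    session via -k instead of logging in again."""
    return [omero_bin, "import", "-s", server, "-k", session_key,
            image_path]

# Each command could then be run with subprocess.check_call(cmd);
# because the key is reused, no new session is created per image.
```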
> Thank you and best regards,
> Andrii
Cheers,
~Josh.
> On 28/04/2016 10:17, Josh Moore wrote:
>>
>> On Wed, Apr 27, 2016 at 11:40 AM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
>>>
>>> Dear Josh,
>>
>> Hi Andrii,
>>
>>
>>> Thank you for pointing to the documentation on the remote shares.
>>> Those .lock files usually appear if we stop the server after one of
>>> the "crashes". When stopping and starting the server during its
>>> normal functioning, they do not seem to be created.
>>
>> It sounds like a race condition. When the server is under pressure,
>> etc., there's no time for the slower NFS locking implementation
>> to do what it should, which makes the remote share not behave
>> as a POSIX filesystem should. There has been some success with other
>> versions of NFS and with lockd tuning.
>>
>>
>>> The run_command definition is following:
>>> def run_command(self, command, logFile=None):
>>
>> Thanks for the definition. I don't see anything off-hand in your code.
>> If there's a keep-alive bug in the import code itself, you might
>> try running a separate process with:
>>
>> bin/omero sessions keepalive
>>
>> You can either do that in a console for testing, or via your Python
>> driver itself. If that fixes the problem, then we can help you
>> integrate that code into your main script without the need for a
>> subprocess. Additionally, the session UUID that is created by that
>> method could be used in all of your import subprocesses which would 1)
>> protect the use of the password and 2) lower the overhead on the
>> server.
>>
>> (In fact, now that I think of it, if you don't have a call to
>> `bin/omero logout` anywhere in your code, this may be exactly the
>> problem that you are running into. Each call to `bin/omero login`
>> creates a new session which is kept alive for the default session
>> timeout.)
>>
>> Cheers,
>> ~Josh.
>>
>>
>>
>>
>>> Best regards,
>>> Andrii
>>>
>>>
>>> On 26/04/2016 21:00, Josh Moore wrote:
>>>>
>>>> Hi Andrii,
>>>>
>>>> On Tue, Apr 26, 2016 at 10:56 AM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
>>>>>
>>>>> Dear Josh,
>>>>>
>>>>> Please find attached the import script. For each EMDB entry it
>>>>> performs an import of six images: three sides and their thumbnails.
>>>>
>>>> Thanks for this. And where's the definition of `run_command`?
>>>>
>>>>
>>>>> To stop OMERO we use the "omero web stop" and then "omero admin
>>>>> stop" commands. After this it is necessary to remove the
>>>>> var/OMERO.data/.omero/repository/*/.lock files before starting
>>>>> OMERO again. The filesystem is NFS.
>>>>
>>>> I'd assume then that disconnections & the .lock files are unrelated.
>>>> Please see
>>>>
>>>> https://www.openmicroscopy.org/site/support/omero5.2/sysadmins/unix/server-binary-repository.html#locking-and-remote-shares
>>>> regarding using remote shares.
>>>>
>>>> Cheers,
>>>> ~Josh.
>>>>
>>>>
>>>>
>>>>> Best regards,
>>>>> Andrii
>>>>>
>>>>>
>>>>> On 25/04/2016 16:21, Josh Moore wrote:
>>>>>>
>>>>>> On Fri, Apr 22, 2016 at 12:41 PM, Andrii Iudin <andrii at ebi.ac.uk>
>>>>>> wrote:
>>>>>>>
>>>>>>> Dear OMERO developers,
>>>>>>>
>>>>>>> We are experiencing an issue when importing a large number of
>>>>>>> images in a single consecutive run. This usually happens after
>>>>>>> importing more than a thousand images. Please see below excerpts
>>>>>>> from the logs. Increasing the time period between each import
>>>>>>> seemed to help a bit; however, the issue ultimately happened
>>>>>>> anyway.
>>>>>>
>>>>>> Is this script available publicly? It would be useful to see how it's
>>>>>> working.
>>>>>>
>>>>>>
>>>>>>> To get the OMERO server working again after this happens, it is
>>>>>>> necessary to stop it, remove the .lock files, and start the
>>>>>>> server again. It would be much appreciated if you could point
>>>>>>> out a possible way to solve this issue.
>>>>>>
>>>>>> How did you stop OMERO? Is your file system on NFS or another remote
>>>>>> share?
>>>>>>
>>>>>>
>>>>>>> Thank you and with best regards,
>>>>>>> Andrii
>>>>>>
>>>>>> Cheers,
>>>>>> ~Josh.
>>
>> _______________________________________________
>> ome-devel mailing list
>> ome-devel at lists.openmicroscopy.org.uk
>> http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel