[ome-devel] Problem when importing >1000 images

Josh Moore josh at glencoesoftware.com
Mon May 9 20:15:24 BST 2016


On Fri, May 6, 2016 at 5:14 PM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
> Dear Josh,

Hi Andrii,

> The version we are running is 5.1.4-ice35-b55.

Have you considered upgrading recently? 5.2.3 just came out, and with
it the 5.2 series will soon be going into maintenance mode, while
support for 5.1 will be dropped.

With the latest version, I've just attempted importing 700-800
directories, each with 6 images, using:

  $ for x in $(seq 1 1000); do /opt/ome0/dist/bin/omero import $(printf "%04d" "$x"); done

So far I've had no exception with 5.2.3. If you'd like me to try with
one of your images (assuming they are all similar), feel free to
upload it to http://qa.openmicroscopy.org.uk/

Cheers,
~Josh




> Best regards,
> Andrii
>
>
> On 06/05/2016 16:11, Josh Moore wrote:
>>
>> On Fri, May 6, 2016 at 12:49 PM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
>>>
>>> Dear Josh,
>>
>> Hi Andrii,
>>
>>
> I have added a logout to the script after each import call. This time
> more than 2000 entries were imported; however, an error then happened.
> Please could you check the attached log? Is this the same issue with
> NFS or something different? Is it possible that using sessions might
> help?
>>
>> It does look like you're still running into the session/service
>> exhaustion you were seeing earlier. If using a single session
>> doesn't solve the problem, the only other thing I can think to try at
>> this point is forcibly closing services. What version of OMERO
>> are you using?
>>
>> Cheers,
>> ~Josh.
>>
>>
>>
>>
>>> Thank you and best regards,
>>> Andrii
>>>
>>> On 02/05/2016 06:44, Josh Moore wrote:
>>>>
>>>> On Fri, Apr 29, 2016 at 12:41 PM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
>>>>>
>>>>> Dear Josh,
>>>>
>>>> Hi Andrii,
>>>>
>>>>
>>>>> Thank you for providing the possible solution to our problem. We
>>>>> will test the session usage and get back with the results. Please
>>>>> could you clarify a few things about your suggestions?
>>>>>
>>>>> Is it possible to add a wait time somewhere in the code to
>>>>> compensate for the slower NFS locking?
>>>>
>>>> Conceivably, but considering the state the server could be in at that
>>>> point (shutdown, etc.), it's difficult to know. One option is to
>>>> put your /OMERO directory on a non-NFS filesystem and then symlink in
>>>> individual directories from NFS, as in the sketch below. Ultimately,
>>>> though, this points to an issue with the remote fileshare that needs
>>>> to be looked into.
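>>>>
>>>> A minimal sketch, with hypothetical paths (/local/OMERO and the NFS
>>>> directory names are placeholders for your own layout):
>>>>
>>>>   # binary repository on local disk
>>>>   $ bin/omero config set omero.data.dir /local/OMERO
>>>>   # individual data directories symlinked in from the NFS share
>>>>   $ ln -s /nfs/share/emdb_images /local/OMERO/emdb_images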
>>>>
>>>>
>>>>> As far as I can see we do not call
>>>>> bin/omero login
>>>>
>>>> `bin/omero import` calls `login` if no login is present.
>>>>
>>>>
>>>>> explicitly at the moment. Is it an integral part of the import?
>>>>> There is also a BlitzGateway.connect() call before the script goes
>>>>> into the loop over all images.
>>>>
>>>> Agreed. There are a couple of different logins in play here which
>>>> makes it all a bit complicated. One option would be to get everything
>>>> into the same process with no subprocess calls to `bin/omero import`.
>>>>
>>>>
>>>>> Does this mean then that we should call logout after each import?
>>>>
>>>> That's probably the easiest thing to test. Longer-term, it'd be better
>>>> to use a session key, along the lines of the sketch below.
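>>>>
>>>> A rough sketch (the server and file names are placeholders; -s/-k
>>>> are the CLI's standard connection options):
>>>>
>>>>   $ bin/omero login user@omero-server    # creates a session
>>>>   $ bin/omero sessions list              # shows the session UUID
>>>>   $ bin/omero import -s omero-server -k <uuid> image.tiff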
>>>>
>>>>
>>>>> Thank you and best regards,
>>>>> Andrii
>>>>
>>>> Cheers,
>>>> ~Josh.
>>>>
>>>>
>>>>
>>>>> On 28/04/2016 10:17, Josh Moore wrote:
>>>>>>
>>>>>> On Wed, Apr 27, 2016 at 11:40 AM, Andrii Iudin <andrii at ebi.ac.uk>
>>>>>> wrote:
>>>>>>>
>>>>>>> Dear Josh,
>>>>>>
>>>>>> Hi Andrii,
>>>>>>
>>>>>>
>>>>>>> Thank you for pointing to the documentation on the remote shares.
>>>>>>> Those .lock files usually appear if we stop the server after one
>>>>>>> of the "crashes". When stopping and starting the server during its
>>>>>>> normal functioning, they do not seem to be created.
>>>>>>
>>>>>> It sounds like a race condition. When the server is under pressure,
>>>>>> there's no time for the slower NFS locking implementation
>>>>>> to do what it should. This is what makes the remote share not behave
>>>>>> as a POSIX filesystem should. There has been some success with other
>>>>>> versions of NFS and with lockd tuning.
>>>>>>
>>>>>>
>>>>>>> The run_command definition is as follows:
>>>>>>>        def run_command(self, command, logFile=None):
>>>>>>
>>>>>> Thanks for the definition. I don't see anything off-hand in your code.
>>>>>> If there's a keep-alive bug in the import code itself, you might
>>>>>> try running a separate process with:
>>>>>>
>>>>>>        bin/omero sessions keepalive
>>>>>>
>>>>>> You can either do that in a console for testing, or via your Python
>>>>>> driver itself. If that fixes the problem, then we can help you
>>>>>> integrate that code into your main script without the need for a
>>>>>> subprocess. Additionally, the session UUID that is created by that
>>>>>> method could be used in all of your import subprocesses (see the
>>>>>> sketch below), which would 1) protect the use of the password and
>>>>>> 2) lower the overhead on the server.
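>>>>>>
>>>>>> Roughly, with placeholder names:
>>>>>>
>>>>>>        # console 1: log in once and keep the session alive
>>>>>>        $ bin/omero login user@omero-server
>>>>>>        $ bin/omero sessions keepalive
>>>>>>
>>>>>>        # console 2: each import reuses the same session
>>>>>>        $ bin/omero import -k <session-uuid> image.tiff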
>>>>>>
>>>>>> (In fact, now that I think of it, if you don't have a call to
>>>>>> `bin/omero logout` anywhere in your code, this may be exactly the
>>>>>> problem that you are running into. Each call to `bin/omero login`
>>>>>> creates a new session which is kept alive for the default session
>>>>>> timeout.)
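>>>>>>
>>>>>> I.e., if you do keep per-import logins, each import would need a
>>>>>> matching explicit logout, along the lines of:
>>>>>>
>>>>>>        $ bin/omero import "$dir" && bin/omero logout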
>>>>>>
>>>>>> Cheers,
>>>>>> ~Josh.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Best regards,
>>>>>>> Andrii
>>>>>>>
>>>>>>>
>>>>>>> On 26/04/2016 21:00, Josh Moore wrote:
>>>>>>>>
>>>>>>>> Hi Andrii,
>>>>>>>>
>>>>>>>> On Tue, Apr 26, 2016 at 10:56 AM, Andrii Iudin <andrii at ebi.ac.uk>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Dear Josh,
>>>>>>>>>
>>>>>>>>> Please find attached the import script. For each EMDB entry it
>>>>>>>>> performs an import of six images: three sides and their
>>>>>>>>> thumbnails.
>>>>>>>>
>>>>>>>> Thanks for this. And where's the definition of `run_command`?
>>>>>>>>
>>>>>>>>
>>>>>>>>> To stop OMERO we use the "omero web stop" and then "omero admin
>>>>>>>>> stop" commands. After this it is necessary to remove the
>>>>>>>>> var/OMERO.data/.omero/repository/*/.lock files before starting
>>>>>>>>> OMERO again; the full sequence is below. The file system is NFS.
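>>>>>>>>>
>>>>>>>>> That is:
>>>>>>>>>
>>>>>>>>>   $ omero web stop
>>>>>>>>>   $ omero admin stop
>>>>>>>>>   $ rm -f var/OMERO.data/.omero/repository/*/.lock
>>>>>>>>>   $ omero admin start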
>>>>>>>>
>>>>>>>> I'd assume then that disconnections & the .lock files are unrelated.
>>>>>>>> Please see
>>>>>>>> https://www.openmicroscopy.org/site/support/omero5.2/sysadmins/unix/server-binary-repository.html#locking-and-remote-shares
>>>>>>>> regarding using remote shares.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> ~Josh.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Andrii
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 25/04/2016 16:21, Josh Moore wrote:
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 22, 2016 at 12:41 PM, Andrii Iudin <andrii at ebi.ac.uk>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Dear OMERO developers,
>>>>>>>>>>>
>>>>>>>>>>> We are experiencing an issue when importing a large number of
>>>>>>>>>>> images in a single consecutive run. It usually happens after
>>>>>>>>>>> importing more than a thousand images. Please see the excerpts
>>>>>>>>>>> from the logs below. Increasing the time period between each
>>>>>>>>>>> import seemed to help a bit; however, the issue ultimately
>>>>>>>>>>> happened anyway.
>>>>>>>>>>
>>>>>>>>>> Is this script available publicly? It would be useful to see
>>>>>>>>>> how it works.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> To get the OMERO server working again after this happens, it is
>>>>>>>>>>> necessary to stop it, remove the .lock files and start the server
>>>>>>>>>>> again. It would be much appreciated if you could point out a
>>>>>>>>>>> possible way to solve this issue.
>>>>>>>>>>
>>>>>>>>>> How did you stop OMERO? Is your file system on NFS or another
>>>>>>>>>> remote
>>>>>>>>>> share?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thank you and with best regards,
>>>>>>>>>>> Andrii
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> ~Josh.
>>


More information about the ome-devel mailing list