[ome-devel] Problem when importing >1000 images
Josh Moore
josh at glencoesoftware.com
Tue May 10 07:40:32 BST 2016
On Mon, May 9, 2016 at 11:34 PM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
> Dear Josh,
Hi Andrii,
> Yes, we plan to do the update to the latest version; we have just been
> waiting for the release of OMERO that will include Bio-Formats 5.2.0,
> since it will introduce fixes that are required for us to use OMERO for
> the EMPIAR images
Sorry if I've forgotten: which fixes are those?
> and since we expect this to be a major update for our systems. Please
> correct me if I am wrong, but the Bio-Formats version in the latest
> OMERO is below 5.2.0?
Correct. OMERO 5.2.3 ships with Bio-Formats 5.1.9. A version of OMERO
with Bio-Formats 5.2.0 (which is still unreleased) won't be available
until much later this year at the earliest.
> Best regards,
> Andrii
Cheers,
~Josh
> On 09/05/2016 20:15, Josh Moore wrote:
>>
>> On Fri, May 6, 2016 at 5:14 PM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
>>>
>>> Dear Josh,
>>
>> Hi Andrii,
>>
>>> The version we are running is 5.1.4-ice35-b55.
>>
>> Have you considered upgrading recently? 5.2.3 just came out, and with
>> it the 5.2 series will soon be going into maintenance mode, while
>> support for 5.1 will be dropped.
>>
>> With the latest version, I've just attempted importing 700-800
>> directories, each with 6 images, using:
>>
>> $ for x in $(seq 1 1000); do \
>>       /opt/ome0/dist/bin/omero import $(printf "%04d" "$x"); \
>>   done
>>
>> So far I've had no exception with 5.2.3. If you'd like me to try with
>> one of your images (assuming they are all similar), feel free to
>> upload it to http://qa.openmicroscopy.org.uk/
>>
>> Cheers,
>> ~Josh
>>
>>
>>
>>
>>> Best regards,
>>> Andrii
>>>
>>>
>>> On 06/05/2016 16:11, Josh Moore wrote:
>>>>
>>>> On Fri, May 6, 2016 at 12:49 PM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
>>>>>
>>>>> Dear Josh,
>>>>
>>>> Hi Andrii,
>>>>
>>>>
>>>>> I have added a logout to the script after each import call. This time
>>>>> more than 2000 entries were imported; however, an error still occurred.
>>>>> Please could you check the attached log? Is this the same issue with NFS
>>>>> or something different? Is it possible that using sessions might help?
>>>>
>>>> It does look like you're still running into the session/service
>>>> exhaustion you were seeing earlier. If using a single session
>>>> doesn't solve the problem, the only other thing I can think to try at
>>>> this point is forcibly closing services. What version of OMERO
>>>> are you using?
>>>>
>>>> Cheers,
>>>> ~Josh.
>>>>
>>>>
>>>>
>>>>
>>>>> Thank you and best regards,
>>>>> Andrii
>>>>>
>>>>> On 02/05/2016 06:44, Josh Moore wrote:
>>>>>>
>>>>>> On Fri, Apr 29, 2016 at 12:41 PM, Andrii Iudin <andrii at ebi.ac.uk>
>>>>>> wrote:
>>>>>>>
>>>>>>> Dear Josh,
>>>>>>
>>>>>> Hi Andrii,
>>>>>>
>>>>>>
>>>>>>> Thank you for providing a possible solution to our problem. We will
>>>>>>> test the session usage and get back with the results. Please could you
>>>>>>> clarify a few things about your suggestions?
>>>>>>>
>>>>>>> Is it possible to add a wait time somewhere in the code to compensate
>>>>>>> for the slower NFS locking?
>>>>>>
>>>>>> Conceivably, but considering the state the server could be in at that
>>>>>> point (shutdown, etc.), it's difficult to know. One option is to put
>>>>>> your /OMERO directory on a non-NFS filesystem and then symlink in
>>>>>> individual directories from NFS. Ultimately, though, this points to an
>>>>>> issue with the remote fileshare that needs to be looked into.
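>>>>>>
>>>>>> Roughly, that layout would look like the following (the local and NFS
>>>>>> paths here are only placeholders for whatever your system actually
>>>>>> uses):
>>>>>>
>>>>>>     # keep the binary repository, its lock files and .omero metadata on local disk
>>>>>>     $ mkdir -p /local/OMERO
>>>>>>     $ ln -s /local/OMERO /OMERO
>>>>>>     # symlink the large data directories in from the NFS share
>>>>>>     $ ln -s /nfs/share/empiar_images /OMERO/empiar_images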
>>>>>>
>>>>>>
>>>>>>> As far as I can see we do not call
>>>>>>> bin/omero login
>>>>>>
>>>>>> `bin/omero import` calls `login` if no login is present.
>>>>>>
>>>>>>
>>>>>>> explicitly at this moment. Is it an integral part of the import? There
>>>>>>> is also a BlitzGateway.connect() call before the script goes into the
>>>>>>> loop over all images.
>>>>>>
>>>>>> Agreed. There are a couple of different logins in play here which
>>>>>> makes it all a bit complicated. One option would be to get everything
>>>>>> into the same process with no subprocess calls to `bin/omero import`.
>>>>>>
>>>>>>
>>>>>>> Does this mean then that we should call logout after each import?
>>>>>>
>>>>>> That's probably the easiest thing to test. Longer-term, it'd be better
>>>>>> to use a session key.
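>>>>>>
>>>>>> A minimal sketch of that test, assuming your driver shells out once per
>>>>>> directory (the directory names are just placeholders):
>>>>>>
>>>>>>     $ for dir in entry_*; do \
>>>>>>           bin/omero import "$dir"; \
>>>>>>           bin/omero logout; \
>>>>>>       done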
>>>>>>
>>>>>>
>>>>>>> Thank you and best regards,
>>>>>>> Andrii
>>>>>>
>>>>>> Cheers,
>>>>>> ~Josh.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On 28/04/2016 10:17, Josh Moore wrote:
>>>>>>>>
>>>>>>>> On Wed, Apr 27, 2016 at 11:40 AM, Andrii Iudin <andrii at ebi.ac.uk>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Dear Josh,
>>>>>>>>
>>>>>>>> Hi Andrii,
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thank you for pointing us to the documentation on remote shares.
>>>>>>>>> Those .lock files usually appear if we stop the server after one of
>>>>>>>>> the "crashes". When stopping and starting the server during its
>>>>>>>>> normal functioning, they do not seem to be created.
>>>>>>>>
>>>>>>>> It sounds like a race condition. When the server is under pressure,
>>>>>>>> there's no time for the slower NFS locking implementation to do what
>>>>>>>> it should. This is what makes the remote share not behave as a POSIX
>>>>>>>> filesystem should. There has been some success with other versions of
>>>>>>>> NFS and with lockd tuning.
>>>>>>>>
>>>>>>>>
>>>>>>>>> The run_command definition is as follows:
>>>>>>>>> def run_command(self, command, logFile=None):
>>>>>>>>
>>>>>>>> Thanks for the definition. I don't see anything off-hand in your
>>>>>>>> code. If there's a keep-alive bug in the import code itself, you
>>>>>>>> might try running a separate process with:
>>>>>>>>
>>>>>>>> bin/omero sessions keepalive
>>>>>>>>
>>>>>>>> You can either do that in a console for testing, or via your Python
>>>>>>>> driver itself. If that fixes the problem, then we can help you
>>>>>>>> integrate that code into your main script without the need for a
>>>>>>>> subprocess. Additionally, the session UUID that is created by that
>>>>>>>> method could be used in all of your import subprocesses, which would
>>>>>>>> (1) protect the password and (2) lower the overhead on the server.
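>>>>>>>>
>>>>>>>> For the console test, something along these lines (server, user and
>>>>>>>> the session UUID are placeholders; `bin/omero login` prints the UUID
>>>>>>>> of the session it creates):
>>>>>>>>
>>>>>>>>     $ bin/omero login -s your.server -u your_user
>>>>>>>>     $ bin/omero sessions keepalive &       # ping the session in the background
>>>>>>>>     $ bin/omero import -k <session-uuid> /path/to/images
>>>>>>>>     $ kill %1 && bin/omero logout          # stop the pinger and close the session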
>>>>>>>>
>>>>>>>> (In fact, now that I think of it, if you don't have a call to
>>>>>>>> `bin/omero logout` anywhere in your code, this may be exactly the
>>>>>>>> problem that you are running into. Each call to `bin/omero login`
>>>>>>>> creates a new session which is kept alive for the default session
>>>>>>>> timeout.)
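>>>>>>>>
>>>>>>>> (A quick way to check whether sessions are piling up:
>>>>>>>>
>>>>>>>>     $ bin/omero sessions list
>>>>>>>>
>>>>>>>> which shows the sessions the CLI knows about for the current user.)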
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> ~Josh.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Andrii
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 26/04/2016 21:00, Josh Moore wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Andrii,
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 26, 2016 at 10:56 AM, Andrii Iudin <andrii at ebi.ac.uk>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Dear Josh,
>>>>>>>>>>>
>>>>>>>>>>> Please find attached the import script. For each EMDB entry it
>>>>>>>>>>> performs an import of six images - three sides and their
>>>>>>>>>>> thumbnails.
>>>>>>>>>>
>>>>>>>>>> Thanks for this. And where's the definition of `run_command`?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> To stop OMERO we use the "omero web stop" and then "omero admin
>>>>>>>>>>> stop" commands. After this it is necessary to remove the
>>>>>>>>>>> var/OMERO.data/.omero/repository/*/.lock files before starting
>>>>>>>>>>> OMERO again. The file system is NFS.
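>>>>>>>>>>>
>>>>>>>>>>> For completeness, the recovery sequence we run is roughly:
>>>>>>>>>>>
>>>>>>>>>>>     bin/omero web stop
>>>>>>>>>>>     bin/omero admin stop
>>>>>>>>>>>     rm -f var/OMERO.data/.omero/repository/*/.lock
>>>>>>>>>>>     bin/omero admin start
>>>>>>>>>>>     bin/omero web start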
>>>>>>>>>>
>>>>>>>>>> I'd assume then that disconnections & the .lock files are unrelated.
>>>>>>>>>> Please see
>>>>>>>>>> https://www.openmicroscopy.org/site/support/omero5.2/sysadmins/unix/server-binary-repository.html#locking-and-remote-shares
>>>>>>>>>> regarding using remote shares.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> ~Josh.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Andrii
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 25/04/2016 16:21, Josh Moore wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Apr 22, 2016 at 12:41 PM, Andrii Iudin
>>>>>>>>>>>> <andrii at ebi.ac.uk>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dear OMERO developers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We are experiencing an issue when importing a large number of
>>>>>>>>>>>>> images in a single consecutive run. This usually happens after
>>>>>>>>>>>>> importing more than a thousand images. Please see below excerpts
>>>>>>>>>>>>> from the logs. Increasing the time period between each import
>>>>>>>>>>>>> seemed to help a bit; however, this issue ultimately happened
>>>>>>>>>>>>> anyway.
>>>>>>>>>>>>
>>>>>>>>>>>> Is this script available publicly? It would be useful to see how
>>>>>>>>>>>> it's working.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> To get the OMERO server working again after this happens, it is
>>>>>>>>>>>>> necessary to stop it, remove the .lock files and start the server
>>>>>>>>>>>>> again. It would be much appreciated if you could point out a
>>>>>>>>>>>>> possible way to solve this issue.
>>>>>>>>>>>>
>>>>>>>>>>>> How did you stop OMERO? Is your file system on NFS or another
>>>>>>>>>>>> remote share?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you and with best regards,
>>>>>>>>>>>>> Andrii
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> ~Josh.