[ome-devel] Problem when importing >1000 images

Josh Moore josh at glencoesoftware.com
Fri May 6 16:11:04 BST 2016


On Fri, May 6, 2016 at 12:49 PM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
> Dear Josh,

Hi Andrii,


> I have added a logout to the script after each import call. This time
> more than 2000 entries have been imported, but an error still occurred. Please
> could you check the attached log? Is this the same issue with NFS or
> something different? Is it possible that using sessions might help?

It does look like you're still running into the session/service
exhaustion that you were seeing earlier.  If using a single session
doesn't solve the problem, the only other thing I can think to try at
this point is a forcible closing of services. What version of OMERO
are you using?
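
A minimal sketch of the single-session approach, assuming the driver script
already holds a session UUID (for instance from the session opened by
`BlitzGateway.connect()`) and that `bin/omero import` accepts `-s HOST -k KEY`
as login arguments. The helper names `build_import_command` and `import_all`
are hypothetical, and the `runner` parameter exists only so the loop can be
exercised without a server:

```python
import subprocess
from typing import Callable, Iterable, List, Sequence


def build_import_command(omero_bin: str, host: str,
                         session_key: str, image_path: str) -> List[str]:
    """Build an import call that joins an existing session.

    Assumption: `-s HOST -k KEY` are accepted as login arguments, so no
    new session is created for this subprocess.
    """
    return [omero_bin, "import", "-s", host, "-k", session_key, image_path]


def import_all(image_paths: Iterable[str], host: str, session_key: str,
               omero_bin: str = "bin/omero",
               runner: Callable[[Sequence[str]], int] = subprocess.call
               ) -> List[str]:
    """Import every path with the same session key; return the failed paths."""
    failed = []
    for path in image_paths:
        cmd = build_import_command(omero_bin, host, session_key, path)
        # A non-zero exit code is treated as a failed import.
        if runner(cmd) != 0:
            failed.append(path)
    return failed
```

Every subprocess then joins the one existing session rather than logging in
again, so the number of live sessions should not grow with the number of
imports.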

Cheers,
~Josh.




> Thank you and best regards,
> Andrii
>
> On 02/05/2016 06:44, Josh Moore wrote:
>>
>> On Fri, Apr 29, 2016 at 12:41 PM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
>>>
>>> Dear Josh,
>>
>> Hi Andrii,
>>
>>
>>> Thank you for providing the possible solution to our problem. We will test
>>> the session usage and get back with the results. Please could you clarify a
>>> few things about your propositions?
>>>
>>> Is it possible to add a wait time somewhere in the code to compensate for
>>> the slower NFS locking?
>>
>> Conceivably, but considering the state the server could possibly be in
>> at that point (shutdown, etc.) it's difficult to know. One option is to
>> put your /OMERO directory on a non-NFS filesystem and then symlink in
>> individual directories from NFS. Ultimately, though, this points to an
>> issue with the remote fileshare that needs to be looked into.
>>
>>
>>> As far as I can see we do not call
>>> bin/omero login
>>
>> `bin/omero import` calls `login` if no login is present.
>>
>>
>>> explicitly at this moment. Is it an integral part of the import? There is
>>> also a BlitzGateway.connect() call before the script goes into the loop
>>> over all images.
>>
>> Agreed. There are a couple of different logins in play here which
>> makes it all a bit complicated. One option would be to get everything
>> into the same process with no subprocess calls to `bin/omero import`.
>>
>>
>>> Does this mean then that we should call logout after each import?
>>
>> That's probably the easiest thing to test. Longer-term, it'd be better
>> to use a session key.
>>
>>
>>> Thank you and best regards,
>>> Andrii
>>
>> Cheers,
>> ~Josh.
>>
>>
>>
>>> On 28/04/2016 10:17, Josh Moore wrote:
>>>>
>>>> On Wed, Apr 27, 2016 at 11:40 AM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
>>>>>
>>>>> Dear Josh,
>>>>
>>>> Hi Andrii,
>>>>
>>>>
>>>>> Thank you for pointing to the documentation on the remote shares. Those
>>>>> .lock files usually appear if we stop the server after one of the
>>>>> "crashes". When stopping and starting the server during normal operation,
>>>>> they do not seem to be created.
>>>>
>>>> It sounds like a race condition. When the server is under pressure,
>>>> etc., then there's no time for the slower NFS locking implementation
>>>> to do what it should. This is what makes the remote share not behave
>>>> as a POSIX filesystem should. There has been some success with other
>>>> versions of NFS and lockd tuning.
>>>>
>>>>
>>>>> The run_command definition is as follows:
>>>>>       def run_command(self, command, logFile=None):
>>>>
>>>> Thanks for the definition. I don't see anything off-hand in your code.
>>>> If there's a keep-alive bug in the import code itself, you might
>>>> try running a separate process with:
>>>>
>>>>       bin/omero sessions keepalive
>>>>
>>>> You can either do that in a console for testing, or via your Python
>>>> driver itself. If that fixes the problem, then we can help you
>>>> integrate that code into your main script without the need for a
>>>> subprocess. Additionally, the session UUID that is created by that
>>>> method could be used in all of your import subprocesses which would 1)
>>>> protect the use of the password and 2) lower the overhead on the
>>>> server.
>>>>
>>>> (In fact, now that I think of it, if you don't have a call to
>>>> `bin/omero logout` anywhere in your code, this may be exactly the
>>>> problem that you are running into. Each call to `bin/omero login`
>>>> creates a new session which is kept alive for the default session
>>>> timeout.)
>>>>
>>>> Cheers,
>>>> ~Josh.
>>>>
>>>>
>>>>
>>>>
>>>>> Best regards,
>>>>> Andrii
>>>>>
>>>>>
>>>>> On 26/04/2016 21:00, Josh Moore wrote:
>>>>>>
>>>>>> Hi Andrii,
>>>>>>
>>>>>> On Tue, Apr 26, 2016 at 10:56 AM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
>>>>>>>
>>>>>>> Dear Josh,
>>>>>>>
>>>>>>> Please find attached the import script. For each EMDB entry it performs
>>>>>>> an import of six images - three sides and their thumbnails.
>>>>>>
>>>>>> Thanks for this. And where's the definition of `run_command`?
>>>>>>
>>>>>>
>>>>>>> To stop OMERO we use the "omero web stop" and then "omero admin stop"
>>>>>>> commands. After this it is necessary to remove
>>>>>>> var/OMERO.data/.omero/repository/*/.lock files before starting OMERO
>>>>>>> again. The file system is on NFS.
>>>>>>
>>>>>> I'd assume then that disconnections & the .lock files are unrelated.
>>>>>> Please see
>>>>>>
>>>>>>
>>>>>> https://www.openmicroscopy.org/site/support/omero5.2/sysadmins/unix/server-binary-repository.html#locking-and-remote-shares
>>>>>> regarding using remote shares.
>>>>>>
>>>>>> Cheers,
>>>>>> ~Josh.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Best regards,
>>>>>>> Andrii
>>>>>>>
>>>>>>>
>>>>>>> On 25/04/2016 16:21, Josh Moore wrote:
>>>>>>>>
>>>>>>>> On Fri, Apr 22, 2016 at 12:41 PM, Andrii Iudin <andrii at ebi.ac.uk> wrote:
>>>>>>>>>
>>>>>>>>> Dear OMERO developers,
>>>>>>>>>
>>>>>>>>> We are experiencing an issue when importing a large number of
>>>>>>>>> images in a single consecutive run. This usually happens after
>>>>>>>>> importing more than a thousand images. Please see below excerpts
>>>>>>>>> from the logs. Increasing the time period between imports seemed
>>>>>>>>> to help a bit; however, the issue ultimately occurred anyway.
>>>>>>>>
>>>>>>>> Is this script available publicly? It would be useful to see how
>>>>>>>> it's working.
>>>>>>>>
>>>>>>>>
>>>>>>>>> To get the OMERO server working after this happens,
>>>>>>>>> it is necessary to stop it, remove the .lock files and start the
>>>>>>>>> server again. It would be much appreciated if you could point out
>>>>>>>>> a possible way to solve this issue.
>>>>>>>>
>>>>>>>> How did you stop OMERO? Is your file system on NFS or another remote
>>>>>>>> share?
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thank you and with best regards,
>>>>>>>>> Andrii
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> ~Josh.

