[ome-users] OMERO crash: Too many open files
Alexander Tournier
alexander.tournier at cancer.org.uk
Fri Jan 27 14:25:39 GMT 2012
Thanks Josh, all that makes sense.
I suspect I didn't actually close the MATLAB session for a while, so the
OMERO session stayed open and didn't release the handles ... so it's
probably OK on that front.
Thanks
On 27/01/12 13:55, Josh Moore wrote:
> On Jan 27, 2012, at 2:42 PM, Alexander Tournier wrote:
>
>> Thanks Mark and Josh,
>>
>> I've looked at my scripts and I used the RawPixelStore service heavily, which appears to open a file handle on the server.
>> - By my calculations I made something like 20000 calls to RawPixelStore, with as many as 2000 opened during a single MATLAB session.
>> - My calls to RawPixelStore are inside MATLAB routines. This would imply that garbage collection doesn't release handles that are created within a MATLAB routine and whose reference is then lost. Is that right? It would explain a Java memory leak I noticed.
> This is correct. The method "RawPixelStorePrx.close" is a remote method, and should definitely be somewhere in the equivalent of a try/finally block. There is no finalizer or similar on the proxy to ensure that this gets called.
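
The try/finally pattern described above can be sketched as follows. This is a minimal illustration, not OMERO code: StubPixelStore is a hypothetical stand-in for a stateful service proxy, and its getPlane method is made to fail on purpose so that the effect of the finally clause is visible. In real code the object would be the proxy obtained from the session.

```python
class StubPixelStore:
    """Hypothetical stand-in for a stateful service proxy (not the OMERO API)."""

    def __init__(self, fail=False):
        self.fail = fail
        self.closed = False

    def getPlane(self, z, c, t):
        # Simulate a remote call that may raise part-way through the work.
        if self.fail:
            raise IOError("simulated failure while reading a plane")
        return b"\x00" * 16

    def close(self):
        # On a real proxy this would be the remote call that lets the
        # server release its file handle.
        self.closed = True


def read_plane(store, z, c, t):
    """Read one plane and always close the service, even if the read fails."""
    try:
        return store.getPlane(z, c, t)
    finally:
        store.close()
```

Because close() sits in the finally clause, it runs whether getPlane returns normally or raises, so the handle is released in both cases.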
>
>> - RawPixelStore seems to be the culprit here, is there any other service I should be particularly aware of? (for peace of mind I've gone through my routines and closed all services at the end).
> Every stateful service (RawPixelStore, RawFileStore, Search, ThumbnailStore) has a close method which should be called. There are several ways to detect them. They are all subinterfaces of StatefulServiceInterfacePrx. They have a "close" method. They are the return value of a method named "create"-something on the ServiceFactoryPrx. And they will be returned to you by the client.getStatefulServices() method, which returns all instances in the current session which have not yet been closed.
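
As a safety net, the last detection method lends itself to a cleanup sweep at the end of a script. The sketch below only illustrates the idea: StubClient and StubService are hypothetical stand-ins, and the only behavior borrowed from the description above is that getStatefulServices() returns the services that have not yet been closed.

```python
class StubService:
    """Hypothetical stateful service with a close method."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


class StubClient:
    """Hypothetical client that remembers the services it handed out."""

    def __init__(self):
        self._services = []

    def create(self, name):
        service = StubService(name)
        self._services.append(service)
        return service

    def getStatefulServices(self):
        # Mirrors the description above: only services not yet closed.
        return [s for s in self._services if not s.closed]


def close_leftover_services(client):
    """Close every service still open and return how many were closed."""
    leftover = client.getStatefulServices()
    for service in leftover:
        service.close()
    return len(leftover)
```

A sweep like this is a fallback rather than a substitute for closing each service where it is used, since handles stay open on the server until the sweep runs.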
>
>> - I wasn't particularly 'keeping them alive' as I had to restart MATLAB often. Would this indicate that these open services remained despite closing MATLAB? Is that a case where the session needs to be specifically closed?
> How are you initializing your service? Are you calling detachOnDestroy? (Probably not.) If not, then the sessions should time out after 10 minutes, in which case the handles are inappropriately surviving. Will Moore mentioned he has seen this behavior before, though my assumption is that it's been fixed for 4.4.
>
> Cheers,
> ~Josh
>
>> Best,
>> Alex
>>
>>
>> On 27/01/12 12:21, Mark Henshall wrote:
>>> doing 'cat /proc/sys/fs/file-max' gives 596753
>>>
>>> doing 'ulimit -Hn' as omero gives 1024
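
Of the two numbers above, the per-process limit (1024) is the one a single server process hits; file-max is the system-wide ceiling. For a quick look at the same per-process limit from inside a process, Python's standard resource module can be used on Unix-like systems. This is a minimal sketch: the /proc/self/fd listing is Linux-specific, so open_fd_count() is written to return None where that directory is unavailable.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count():
    """Count descriptors currently open in this process, or None if unknown."""
    try:
        # Linux exposes one entry per open descriptor under /proc/self/fd.
        return len(os.listdir("/proc/self/fd"))
    except OSError:
        return None


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard, "open now:", open_fd_count())
```

The hard limit reported here corresponds to 'ulimit -Hn'; the soft limit ('ulimit -Sn') is the one actually enforced and can be raised up to the hard limit without extra privileges.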
>>>
>>> Quoting Josh Moore<josh at glencoesoftware.com>:
>>>
>>>> On Jan 26, 2012, at 6:41 PM, Alexander Tournier wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> I've been using OMERO quite heavily and ... it crashed, i.e. I can't
>>>>> log in to OMERO anymore, although others seem to be able to.
>>>>> The most interesting part is the error message below.
>>>>> I was saving quite a lot of data and I realise that I didn't close
>>>>> the rawPixelStore, which might explain the error.
>>>>> The problem I have at the moment is that it doesn't seem to be
>>>>> rectifying itself on its own.
>>>>> Is there a time delay after which the open files are forcefully closed
>>>>> and services resumed, or do we need to reboot OMERO?
>>>> The services won't be closed until the session is closed. If you are
>>>> somehow keeping that alive (e.g. by doing other activities) _and_ not
>>>> closing your services, then you'll run out of file handles. Do you
>>>> know how many are configured for your server?
>>>>
>>>> ~Josh
>>>>
>>>>> Thanks,
>>>>> Alexander
>>>>>
>>>>>
>>>>> Error using omero.client/createSession
>>>>> Java exception occurred:
>>>>> Glacier2.PermissionDeniedException
>>>>> reason = "Internal error. Please contact your administrator:
>>>>> Wrapped Exception:
>>>>> (org.springframework.ldap.CommunicationException):
>>>>> uk-lri-ldco02.crwin.crnet.org:636; nested exception is
>>>>> javax.naming.CommunicationException:
>>>>> uk-lri-ldco02.crwin.crnet.org:636 [Root exception is
>>>>> java.net.SocketException: Too many open files]"
>>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>>>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>>>>> at java.lang.reflect.Constructor.newInstance(Unknown Source)
>>>>> at java.lang.Class.newInstance0(Unknown Source)
>>>>> at java.lang.Class.newInstance(Unknown Source)
>>>>> at IceInternal.BasicStream$DynamicUserExceptionFactory.createAndThrow(BasicStream.java:2243)
>>>>> at IceInternal.BasicStream.throwException(BasicStream.java:1632)
>>>>> at IceInternal.Outgoing.throwUserException(Outgoing.java:442)
>>>>> at Glacier2._RouterDelM.createSession(_RouterDelM.java:42)
>>>>> at Glacier2.RouterPrxHelper.createSession(RouterPrxHelper.java:51)
>>>>> at Glacier2.RouterPrxHelper.createSession(RouterPrxHelper.java:29)
>>>>> at omero.client.createSession(client.java:628)
>>>>> at omero.client.createSession(client.java:567)