[ome-users] FW: Core dump attempting to perform bulk upload to Omero

John Webber (NBI) John.Webber at nbi.ac.uk
Thu Jun 26 17:56:59 BST 2014


Hi Josh,

Thank you for your response today.

The testing that I have done up until now has just been on a single system.  When I first encountered the core dump, I tried different software versions and found that, just by changing the Omero version, the core dumps stop.

Following on from your email below, I have now used another Omero 5 server to test this process.  This separate Omero server was built by a colleague (Matthew Hartley) following a slightly different build.  I have found that my Omero Bulk Upload core dumps (after about 4 uploads) on that server running Omero 5 as well.  This process does not core dump on any of our Omero 4 builds.

I can provide build instructions for the servers that I have built, and I can also provide my bulk upload scripts for you to use to try to replicate the issue.  Would you like me to send all of this to you?

With regard to the second issue, the "NonZeroReturnCode" from the Omero CLI with the error "assert failed", I don't currently have the stdout or stderr output that accompanies these, but I can send it to you tomorrow.  I have just checked the ulimit on the server where I am seeing this the most, and the ulimit is set to "unlimited".
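
For reference, a minimal Python sketch of how the individual limits could be listed for the account that runs the upload. In bash, a bare `ulimit` with no option reports only the file-size limit, so "unlimited" there does not necessarily cover open files or processes. The sketch uses only the standard `resource` module (Unix only); the choice of limits and the helper names `fmt` and `describe_limits` are assumptions for illustration and are not part of any Omero tooling.

```python
import resource

# Limits that seem most relevant to a JVM-based client.
# This selection is an assumption, not a list taken from the Omero docs.
LIMITS = {
    "open files": resource.RLIMIT_NOFILE,
    "core file size": resource.RLIMIT_CORE,
    "max user processes": resource.RLIMIT_NPROC,
    "address space": resource.RLIMIT_AS,
    "stack size": resource.RLIMIT_STACK,
}


def fmt(value):
    """Render a raw limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def describe_limits():
    """Return (name, soft, hard) tuples for each limit in LIMITS."""
    rows = []
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        rows.append((name, fmt(soft), fmt(hard)))
    return rows


if __name__ == "__main__":
    for name, soft, hard in describe_limits():
        print(f"{name:20s} soft={soft:>12s} hard={hard:>12s}")
```

The values are only meaningful if this is run as the same user, and from the same kind of shell or service, as the bulk upload itself.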

Thanks again for your help!
John





-----Original Message-----
From: Josh Moore [mailto:josh at glencoesoftware.com] 
Sent: 25 June 2014 20:36
To: John Webber (NBI)
Cc: Roger Leigh; Roger Leigh; OME Users
Subject: Re: [ome-users] FW: Core dump attempting to perfrom bulk upload to Omero


On Jun 24, 2014, at 4:55 PM, John Webber (NBI) wrote:

> Hi Roger (et al)

Hi John,

> Sorry for the delay in coming back to you on this issue - I have been trying some other testing to see if I can narrow down the issue.
> ...
> With Omero 5, using both Ice 3.4 and Ice 3.5, I see the core dumps, as reported previously.  With the other versions of Omero I do not see the core dump.  

Thanks for the detailed analysis. Unfortunately, everyone on the team is still quite perplexed.


> I have kept ALL other components the same, but only see these core dumps with Omero 5.
> 
> I have attached to this email the following log files, which contain additional information that was produced when the core dumped:

Is it just the one system or do you have other systems with a similar configuration that you could try? Do you have an installation script or a VM-like image that we could use to try to reproduce this?


> As stated, exactly the same test data was used, on the same server, with all other versions remaining the same, but I did not see the core dumps.  The core dumps do not always occur on the same upload, but normally do occur after just a few uploads.
> 
> For the versions of Omero that do not core dump, I am seeing some ad hoc issues as well:  sometimes (again seemingly randomly and not always for the same upload), I see a "NonZeroReturnCode" from the Omero CLI with the error "assert failed".  This appears to happen much more frequently when testing with Ice 3.5.

Is there any stdout or stderr output that accompanies these? Could you be running into a ulimit constraint? Have we already touched on that? What is your ulimit currently set to?

Cheers,
~Josh.


> Can anyone give me any pointers on what is going wrong here?
> 
> Thanks
> John





> -----Original Message-----
> From: Roger Leigh [mailto:r.leigh at dundee.ac.uk]
> Sent: 13 June 2014 16:57
> To: John Webber (NBI); Roger Leigh; 
> ome-users at lists.openmicroscopy.org.uk
> Subject: Re: FW: [ome-users] Core dump attempting to perfrom bulk 
> upload to Omero
> 
> On 13/06/2014 15:25, John Webber (NBI) wrote:
>> Not sure I am getting very far with this error!
>> 
>> I have re-run the process a few times today for the purpose of testing.
>> 
>> I have a number of images that are being uploaded to Omero as part of this "bulk upload process".  This job is failing at DIFFERENT stages each time I run it - sometimes it uploads 1 or 2 images before it fails, sometimes it will upload 5 or 6 before it fails.  It is therefore extremely hard to track down the issue.
>> 
>> When I ran the job, I saw the following error:
>> 
>>      -! 06/13/14 12:33:54.390 warning: Proxy keep alive failed.
>> 
>> Which appears after the session information, i.e.
>> 
>>      Using session c4bef719-ae6f-4d85-b4bb-ffc09ee189b9 (webberj at localhost:4064). Idle timeout: 10.0 min. Current group: system
>>      -! 06/13/14 12:33:54.390 warning: Proxy keep alive failed.
>> 
>> At this point, a core dump is generated.
>> 
>> As per Roger's email below, I have run an strace, but the resulting logfile is too big to send! I've looked through the logfile but am not sure what I am looking for in order to just snip a section out of it!  Is there something specific in the log that I should look for, or can I transfer this file another way?
> 
> Looking through the log, the first SIGSEGV is in thread 29086.
> Immediately before, it's using fd=38 (client/logback-classic.jar) and a bit further back fd=5 (ice-glacier2.jar).  After this point, it repeatedly segfaults, presumably inside its own SEGV handler.
> 
> It also spawns threads 29101 and 29105, both of which also subsequently segfault; not clear why but the jvm may be a horrible mess by this point.
> 
> This /might/ point to an issue in client-logback.jar, but that's not certain.
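
One way to snip a sendable section out of a large strace log is to keep only the lines leading up to the first SIGSEGV. The following is a minimal Python sketch of that idea. It matches any line containing the text `SIGSEGV`, so it does not depend on the exact strace line format; the function name `first_signal_context` and the default of 20 preceding lines are assumptions chosen for illustration.

```python
import sys
from collections import deque


def first_signal_context(lines, signal="SIGSEGV", before=20):
    """Find the first line mentioning `signal`.

    Returns (line_number, context), where context holds up to `before`
    preceding lines followed by the matching line itself.
    Returns (None, []) if the signal never appears.
    """
    window = deque(maxlen=before)  # rolling buffer of the most recent lines
    for number, line in enumerate(lines, start=1):
        if signal in line:
            return number, list(window) + [line]
        window.append(line)
    return None, []


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python snip_strace.py <path-to-strace-log>
    with open(sys.argv[1], errors="replace") as log:
        number, context = first_signal_context(log)
    if number is None:
        print("no SIGSEGV found")
    else:
        print(f"first SIGSEGV at line {number}")
        sys.stdout.writelines(context)
```

Because the log is read line by line with a bounded buffer, the whole file never has to fit in memory, which matters for a log that is too big to send.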
> 
>> The /var/log/messages file just has the following lines:
>> 
>> Jun 13 12:29:42 v0246 abrt[29178]: Saved core dump of pid 28122 (/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.55.x86_64/jre/bin/java) to /var/spool/abrt/ccpp-2014-06-13-12:29:35-28122 (750710784 bytes)
>> Jun 13 12:29:42 v0246 abrtd: Directory 'ccpp-2014-06-13-12:29:35-28122' creation detected
>> Jun 13 12:29:42 v0246 abrt[29178]: /var/spool/abrt is 1527378116 bytes (more than 1279MiB), deleting 'ccpp-2014-06-13-11:58:33-27089'
>> Jun 13 12:29:51 v0246 kernel: end_request: I/O error, dev fd0, sector 0
>> Jun 13 12:29:51 v0246 kernel: end_request: I/O error, dev fd0, sector 0
> 
> The latter looks erroneous (just no floppy disc present; not sure what's trying to access it).  The former may be useful for getting a stacktrace; if it's different from the one you posted, might potentially be useful to get a backtrace of each thread.
> 
> 
> Regards,
> Roger
> 
> --
> Dr Roger Leigh -- Open Microscopy Environment Wellcome Trust Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dow Street,
> Dundee DD1 5EH Scotland UK   Tel: (01382) 386364
> 
> The University of Dundee is a registered Scottish Charity, No: 
> SC015096 
> <files.when.core.dump.tar.gz>
> 



