[ome-users] FW: Core dump attempting to perform bulk upload to Omero

Josh Moore josh at glencoesoftware.com
Wed Jun 25 20:36:11 BST 2014


On Jun 24, 2014, at 4:55 PM, John Webber (NBI) wrote:

> Hi Roger (et al)

Hi John,

> Sorry for the delay in coming back to you on this issue - I have been trying some other testing to see if I can narrow down the issue.
> ...
> With Omero 5, using both Ice 3.4 and Ice 3.5, I see the core dumps, as reported previously.  With the other versions of Omero I do not see the core dump.

Thanks for the detailed analysis. Unfortunately, everyone on the team is still quite perplexed.


> I have kept ALL other components the same, but only see these core dumps with Omero 5.
> 
> I have attached to this email the following log files, which contain additional information that was produced when the core dumped:

Is it just the one system, or do you have other systems with a similar configuration that you could try? Do you have an installation script or a VM-like image that we could use to try to reproduce this?


> As stated, exactly the same test data was used, on the same server, with all other versions remaining the same, but I did not see the core dumps.  The core dumps do not always occur on the same upload, but normally do occur after just a few uploads.
> 
> For the versions of Omero that do not core dump, I am seeing some ad hoc issues as well:  sometimes (again seemingly randomly and not always for the same upload), I see a "NonZeroReturnCode" from the Omero CLI with the error "assert failed".  This appears to happen much more frequently when testing with Ice version 3.5.

Is there any stdout or stderr output that accompanies these? Could you be running into a ulimit constraint? Have we already touched on that? What is your ulimit currently set to?
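
For reference, here is a minimal Python sketch for printing the limits in question. It assumes a Linux host and that it is run as the same user (and from the same kind of shell) that starts the import; the choice of which limits to show is my own guess at what might matter here, not a list taken from the OMERO documentation.

```python
import resource

# Limits that might matter for a JVM-based client doing many uploads.
# This selection is an assumption for illustration only.
LIMITS = {
    "open files (ulimit -n)": resource.RLIMIT_NOFILE,
    "max user processes (ulimit -u)": resource.RLIMIT_NPROC,
    "core file size (ulimit -c)": resource.RLIMIT_CORE,
    "address space (ulimit -v)": resource.RLIMIT_AS,
}


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {description: (soft, hard)} for the calling process."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name:32} soft={fmt(soft)} hard={fmt(hard)}")
```

The soft value is the one actually enforced; the hard value is the ceiling the user may raise it to. Running `ulimit -a` in the shell that launches the import should report the same numbers.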

Cheers,
~Josh.


> Can anyone give me any pointers on what is going wrong here?
> 
> Thanks
> John





> -----Original Message-----
> From: Roger Leigh [mailto:r.leigh at dundee.ac.uk] 
> Sent: 13 June 2014 16:57
> To: John Webber (NBI); Roger Leigh; ome-users at lists.openmicroscopy.org.uk
> Subject: Re: FW: [ome-users] Core dump attempting to perform bulk upload to Omero
> 
> On 13/06/2014 15:25, John Webber (NBI) wrote:
>> Not sure I am getting very far with this error!
>> 
>> I have re-run the process a few times today for the purpose of testing.
>> 
>> I have a number of images that are being uploaded to Omero as part of this "bulk upload process".  This job is failing at DIFFERENT stages each time I run the job - sometimes it uploads 1 or 2 images before it fails, sometimes it will upload 5 or 6 before it fails.  It is therefore extremely hard to track down the issue.
>> 
>> When I ran the job, I saw the following error:
>> 
>>      -! 06/13/14 12:33:54.390 warning: Proxy keep alive failed.
>> 
>> Which appears after the session information, i.e.
>> 
>>      Using session c4bef719-ae6f-4d85-b4bb-ffc09ee189b9 (webberj at localhost:4064). Idle timeout: 10.0 min. Current group: system
>>      -! 06/13/14 12:33:54.390 warning: Proxy keep alive failed.
>> 
>> At this point, a core dump is generated.
>> 
>> As per Roger's email below, I have run an strace, but the resulting logfile is too big to send! I've looked through the logfile but am not sure what I am looking for in order to just snip a section out of it!  Is there something specific in the log that I should look for, or can I transfer this file another way?
> 
> Looking through the log, the first SIGSEGV is in thread 29086.
> Immediately before, it's using fd=38 (client/logback-classic.jar) and a bit further back fd=5 (ice-glacier2.jar).  After this point, it repeatedly segfaults, presumably inside its own SEGV handler.
> 
> It also spawns threads 29101 and 29105, both of which also subsequently segfault; it's not clear why, but the JVM may be a horrible mess by this point.
> 
> This /might/ point to an issue in client/logback-classic.jar, but that's not certain.
> 
>> The /var/log/messages file just has the following lines:
>> 
>> Jun 13 12:29:42 v0246 abrt[29178]: Saved core dump of pid 28122 (/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.55.x86_64/jre/bin/java) to /var/spool/abrt/ccpp-2014-06-13-12:29:35-28122 (750710784 bytes)
>> Jun 13 12:29:42 v0246 abrtd: Directory 'ccpp-2014-06-13-12:29:35-28122' creation detected
>> Jun 13 12:29:42 v0246 abrt[29178]: /var/spool/abrt is 1527378116 bytes (more than 1279MiB), deleting 'ccpp-2014-06-13-11:58:33-27089'
>> Jun 13 12:29:51 v0246 kernel: end_request: I/O error, dev fd0, sector 0
>> Jun 13 12:29:51 v0246 kernel: end_request: I/O error, dev fd0, sector 0
> 
> The latter looks erroneous (just no floppy disc present; not sure what's trying to access it).  The former may be useful for getting a stacktrace; if it's different from the one you posted, it might be useful to get a backtrace of each thread.
> 
> 
> Regards,
> Roger
> 
> --
> Dr Roger Leigh -- Open Microscopy Environment Wellcome Trust Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dow Street,
> Dundee DD1 5EH Scotland UK   Tel: (01382) 386364
> 
> The University of Dundee is a registered Scottish Charity, No: SC015096
> <files.when.core.dump.tar.gz>