[ome-users] FW: Core dump attempting to perform bulk upload to Omero

Fri Jun 13 15:25:21 BST 2014

Not sure I am getting very far with this error!

I have re-run the process a few times today for the purpose of testing.

I have a number of images that are being uploaded to Omero as part of this "bulk upload process".  This job is failing at DIFFERENT stages each time I run it - sometimes it uploads 1 or 2 images before it fails, sometimes it will upload 5 or 6 before it fails.  It is therefore extremely hard to track down the issue.

When I ran the job, I saw the following error:

	-! 06/13/14 12:33:54.390 warning: Proxy keep alive failed.

Which appears after the session information, i.e.

	Using session c4bef719-ae6f-4d85-b4bb-ffc09ee189b9 (webberj at localhost:4064). Idle timeout: 10.0 min. Current group: system
	-! 06/13/14 12:33:54.390 warning: Proxy keep alive failed.

At this point, a core dump is generated.  

As per Roger's email below, I have run an strace, but the resulting logfile is too big to send! I've looked through the logfile but am not sure what I am looking for in order to just snip a section out of it!  Is there something specific in the log that I should look for, or can I transfer this file another way?
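
One way to cut the strace logfile down to something that can be sent is sketched below. This is only a rough sketch, assuming the log was written with "strace -f -o logfile -e trace=file,process" as suggested in Roger's message, so that each line starts with a pid followed by the system call. The function name summarize_strace and the exact patterns are assumptions for illustration; the way strace prints signal lines differs a little between versions, so the patterns may need adjusting.

```python
import re

# File-opening syscalls that name a .jar or .zip archive.
ARCHIVE_OPEN = re.compile(r'\b(?:open|openat)\(.*?"([^"]+\.(?:jar|zip))"')
# strace reports delivered signals as "--- SIGSEGV ... ---" and process
# termination as "+++ killed by SIGABRT ... +++".
CRASH_SIGNAL = re.compile(r'(?:---|\+\+\+) .*\bSIG(?:SEGV|ABRT|BUS)\b')


def summarize_strace(lines):
    """Return (archives, events) from an strace log.

    archives: .jar/.zip paths opened successfully before the first
              SIGABRT line, ordered by their most recent open.
    events:   every line that reports SIGSEGV, SIGABRT or SIGBUS.
    """
    archives, events = [], []
    aborted = False
    for line in lines:
        line = line.rstrip("\n")
        if CRASH_SIGNAL.search(line):
            events.append(line)
            if "SIGABRT" in line:
                aborted = True
            continue
        if aborted or " = -1 " in line:
            continue  # ignore failed opens and anything after the abort
        match = ARCHIVE_OPEN.search(line)
        if match:
            path = match.group(1)
            if path in archives:
                archives.remove(path)
            archives.append(path)
    return archives, events
```

Calling summarize_strace(open("logfile")) and looking at the last few entries of the first list, together with the signal lines in the second, should give a much smaller section to post to the list.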

The /var/log/messages file just has the following lines:

Jun 13 12:29:42 v0246 abrt[29178]: Saved core dump of pid 28122 (/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.55.x86_64/jre/bin/java) to /var/spool/abrt/ccpp-2014-06-13-12:29:35-28122 (750710784 bytes)
Jun 13 12:29:42 v0246 abrtd: Directory 'ccpp-2014-06-13-12:29:35-28122' creation detected
Jun 13 12:29:42 v0246 abrt[29178]: /var/spool/abrt is 1527378116 bytes (more than 1279MiB), deleting 'ccpp-2014-06-13-11:58:33-27089'
Jun 13 12:29:51 v0246 kernel: end_request: I/O error, dev fd0, sector 0
Jun 13 12:29:51 v0246 kernel: end_request: I/O error, dev fd0, sector 0

Any pointers on how else I can trace this?

Thanks
John	

-----Original Message-----
From: ome-users-bounces at lists.openmicroscopy.org.uk [mailto:ome-users-bounces at lists.openmicroscopy.org.uk] On Behalf Of Roger Leigh
Sent: 13 June 2014 11:04
To: ome-users at lists.openmicroscopy.org.uk
Subject: Re: [ome-users] Core dump attempting to perform bulk upload to Omero

On 12/06/14 11:05, John Webber (NBI) wrote:
> Hi Simon,
>
> Sorry for not responding to your message till today - busy week!
>
> As per your email below, I have entered "bt" for a stack trace, but am surprised, not at an unexpectedly long output, but at the unexpectedly short output - see below:
>
> (gdb) bt
> #0  0x0000003d7a432925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x0000003d7a434105 in abort () at abort.c:92
> #2  0x00007f5dd33088c5 in os::abort (dump_core=true)
>      at /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1594
> #3  0x00007f5dd347678f in VMError::report_and_die (this=0x7f5d874f2860)
>      at /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:1078
> #4  0x00007f5dd330da92 in JVM_handle_linux_signal (sig=11, info=0x7f5d874f2a30, ucVoid=0x7f5d874f2900,
>      abort_if_unrecognized=-2024855584)
>      at /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:531
> #5  <signal handler called>
> #6  inflateEnd (strm=0x7f5d9c001cc0) at inflate.c:1162
> #7  0x00007f5dd1ff512e in Java_java_util_zip_Inflater_end (env=0x7f5d880141d8, cls=<value optimized out>,
>      addr=140040025939136) at ../../../src/share/native/java/util/zip/Inflater.c:188

Dear John,

This is only speculative, but it appears that the crash here is while unpacking a zip file, which I would presume to be a .jar file or similar while setting up the JVM or using a classloader.  This is deep in the JVM internals, so it is unlikely to be an issue with any of the Java code itself.  This might be a JVM bug triggered by a corrupt jar.  Identifying which jar might be tricky, but should be possible if you were to repeat this using "strace -f -o logfile -e trace=file,process", for example.

If this is the case, it might be worth re-downloading the zipfile for omero to double-check if the jars are the same.  Of course, it's also possible that one might be corrupt in our zipfile, in which case we'll need to fix that once we know which is at fault.
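
As a rough illustration of that comparison, the sketch below walks a directory tree, records a SHA-256 digest for every .jar and .zip, and runs Python's zipfile CRC check on each one. The function names check_archives and differing are hypothetical, and the two directories (the existing install and a freshly unpacked download) are assumptions about how the check would be set up.

```python
import hashlib
import zipfile
import zlib
from pathlib import Path


def check_archives(root):
    """Map each .jar/.zip under root to (sha256 hex digest, problem or None)."""
    root = Path(root)
    results = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in (".jar", ".zip"):
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        try:
            with zipfile.ZipFile(path) as archive:
                # testzip() returns the first member that fails its
                # CRC/header check, or None if every member reads cleanly.
                bad_member = archive.testzip()
            problem = "bad member: %s" % bad_member if bad_member else None
        except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
            problem = "unreadable archive: %s" % exc
        results[path.relative_to(root).as_posix()] = (digest, problem)
    return results


def differing(installed, fresh):
    """Relative paths present in both results whose digests differ."""
    common = installed.keys() & fresh.keys()
    return sorted(name for name in common if installed[name][0] != fresh[name][0])
```

Any path reported by differing(), or any entry with a non-None problem, would be a candidate for the corrupt jar.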

If this can be reduced to triggering with just one file (or a small number of files), if you could zip them up and upload them using qa.openmicroscopy.org, we can try to reproduce this as well.  It may be that this is caused by using a specific file type.
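
A minimal sketch of how the set of files might be narrowed down is given below. It runs one command per file and records the files for which the command exits with a non-zero status. The name find_failing_inputs and the repeats parameter are hypothetical, and command_prefix stands for whatever invokes a single-image import in your setup; it is deliberately not filled in here. Since the failure happens at a different point on each run, repeating each file a few times may be needed before a pattern shows.

```python
import subprocess


def find_failing_inputs(files, command_prefix, repeats=1):
    """Run command_prefix + [file] for each file, `repeats` times.

    Returns {file: [non-zero return codes]} for files that failed at
    least once.  On POSIX, subprocess reports a child killed by signal N
    as return code -N (for example -6 for SIGABRT).
    """
    failures = {}
    for name in files:
        for _ in range(repeats):
            result = subprocess.run(
                list(command_prefix) + [str(name)],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            if result.returncode != 0:
                failures.setdefault(str(name), []).append(result.returncode)
    return failures
```

If the import is launched through a wrapper script, the wrapper may report its own exit status rather than the signal that killed the JVM, so any non-zero code is treated as a failure.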

Kind regards,
Roger

--
Dr Roger Leigh -- Open Microscopy Environment
Wellcome Trust Centre for Gene Regulation and Expression,
College of Life Sciences, University of Dundee, Dow Street,
Dundee DD1 5EH Scotland UK   Tel: (01382) 386364

The University of Dundee is a registered Scottish Charity, No: SC015096
_______________________________________________
ome-users mailing list
ome-users at lists.openmicroscopy.org.uk
http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-users