<div dir="ltr"><div>If it helps, Jay was previously able to successfully import this dataset on our quad OMERO+, but I don't know if that is the critical difference between my scenario and the one that worked.<br></div><div><br></div><div>Happy to try any suggestions. Also, this data is on S3 if you want to play with it; just let me know and I can grant your account access. I'd recommend playing with it within AWS as it's pretty large!<br class="inbox-inbox-Apple-interchange-newline"></div><div><br></div><div>Cheers,</div><div><br></div><div>Douglas</div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, 10 Jan 2018 at 18:00 Josh Moore &lt;<a href="mailto:josh@glencoesoftware.com">josh@glencoesoftware.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Tue, Jan 9, 2018 at 2:49 PM, Douglas Russell<br>

&lt;<a href="mailto:douglas_russell@hms.harvard.edu" target="_blank">douglas_russell@hms.harvard.edu</a>&gt; wrote:<br>

> FYI: Just the latter three postgres logs relate to the most recent attempt.<br>

><br>

> On Tue, 9 Jan 2018 at 08:35 Douglas Russell<br>

> &lt;<a href="mailto:douglas_russell@hms.harvard.edu" target="_blank">douglas_russell@hms.harvard.edu</a>&gt; wrote:<br>

>><br>

>> And this was all there was in the postgres logs:<br>

>><br>

>> 01:33:37 LOG: unexpected EOF on client connection with an open transaction<br>

>> 02:13:52 LOG: checkpoints are occurring too frequently (21 seconds apart)<br>

>> 02:13:52 HINT: Consider increasing the configuration parameter<br>

>> "checkpoint_segments".<br>

>> 07:52:19 LOG: checkpoints are occurring too frequently (10 seconds apart)<br>

>> 07:52:19 HINT: Consider increasing the configuration parameter<br>

>> "checkpoint_segments".<br>

>> 08:50:05 LOG: unexpected EOF on client connection with an open transaction<br>

>><br>

>> My gut feeling is that the database update fails, causing the whole import<br>

>> to fail, but it's hard to know what is going on.<br>

<br>

Sounds plausible. The exception that the server saw:<br>

<br>

  An I/O error occurred while sending to the backend.<br>

<br>

rang some bells:<br>

<br>

 * <a href="https://trac.openmicroscopy.org/ome/ticket/2977" rel="noreferrer" target="_blank">https://trac.openmicroscopy.org/ome/ticket/2977</a><br>

 * <a href="https://trac.openmicroscopy.org/ome/ticket/5858" rel="noreferrer" target="_blank">https://trac.openmicroscopy.org/ome/ticket/5858</a><br>

<br>

both of which are _query_ issues where an argument (specifically an<br>

:id: array) had been passed in which was larger than an int. In this<br>

case, perhaps something similar is happening during the flush of the<br>

transaction, or more generally, something is just quite big. If it's<br>

the latter case, _perhaps_ there's a configuration option on the PG<br>

side to permit larger transactions. Obviously, that's only a<br>

workaround until the transactions can be broken up appropriately.<br>
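<br>
As a rough illustration of what "breaking up" an oversized ID list could look like on the calling side, here is a minimal Python sketch. Everything in it is an assumption: run_query, batches, query_in_batches and BATCH_SIZE are hypothetical names, not OMERO or PostgreSQL APIs, and it does not touch the server-side import transaction itself.<br>
<pre>
```python
from typing import Any, Callable, Iterator, List, Sequence

# Hypothetical batch size; not a documented OMERO or PostgreSQL limit.
BATCH_SIZE = 1000


def batches(ids: Sequence[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive slices of ids, each holding at most `size` items."""
    if not size >= 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def query_in_batches(
    ids: Sequence[int],
    run_query: Callable[[List[int]], List[Any]],
    size: int = BATCH_SIZE,
) -> List[Any]:
    """Call run_query once per batch and concatenate the results in order.

    run_query is a stand-in for whatever executes a query that takes
    a list of IDs as its parameter.
    """
    results: List[Any] = []
    for batch in batches(ids, size):
        results.extend(run_query(batch))
    return results
```
</pre>
Each call then carries a bounded number of IDs; whether that would help with this particular failure is untested.<br>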

<br>

I'll be in transit tomorrow but can help in the search for such a<br>

property afterwards.<br>

<br>

~Josh<br>

<br>

<br>

<br>

<br>

>> D<br>

>><br>

>> On Tue, 9 Jan 2018 at 08:20 Douglas Russell<br>

>> &lt;<a href="mailto:douglas_russell@hms.harvard.edu" target="_blank">douglas_russell@hms.harvard.edu</a>&gt; wrote:<br>

>>><br>

>>> Hi,<br>

>>><br>

>>> Sorry for the delay in following this up.<br>

>>><br>

>>> These OMERO instances are in Docker, yes, but otherwise I don't think<br>

>>> there is anything remarkable about the configuration. I have allocated<br>

>>> postgres 5 GB of RAM and am not seeing any messages about it running out<br>

>>> of memory. The OMERO server has 20 GB of RAM.<br>

>>><br>

>>> The only errors in the Blitz log are:<br>

>>><br>

>>> /opt/omero/server/OMERO.server/var/log/Blitz-0.log:2018-01-09<br>

>>> 00:15:32,910 ERROR [        ome.services.util.ServiceHandler] (l.Server-7)<br>

>>> Method interface ome.api.ThumbnailStore.createThumbnailsByLongestSideSet<br>

>>> invocation took 26125<br>

>>> /opt/omero/server/OMERO.server/var/log/Blitz-0.log:2018-01-09<br>

>>> 00:15:33,090 ERROR [o.s.t.interceptor.TransactionInterceptor] (2-thread-4)<br>

>>> Application exception overridden by rollback exception<br>

>>> /opt/omero/server/OMERO.server/var/log/Blitz-0.log:2018-01-09<br>

>>> 00:15:33,090 ERROR [        ome.services.util.ServiceHandler] (2-thread-4)<br>

>>> Method interface ome.services.util.Executor$Work.doWork invocation took<br>

>>> 17514887<br>

>>><br>

>>> The only thing I haven't yet tried is moving postgres into the same<br>

>>> container as OMERO. I can try that if it would help, but I highly doubt it<br>

>>> will make any difference as in this setup, there is only one t2.2xlarge<br>

>>> instance running everything. It was using a load balancer (easiest way to<br>

>>> connect things up should they actually be on different hosts), but I tried<br>

>>> it without that where I just give the IP of the postgres docker container to<br>

>>> the OMERO instance configuration and I got the same result, so it's not the<br>

>>> timeout of the load balancer at fault.<br>

>>><br>

>>> Thanks,<br>

>>><br>

>>> Douglas<br>

>>><br>

>>> On Wed, 3 Jan 2018 at 06:56 Mark Carroll &lt;<a href="mailto:m.t.b.carroll@dundee.ac.uk" target="_blank">m.t.b.carroll@dundee.ac.uk</a>&gt;<br>

>>> wrote:<br>

>>>><br>

>>>><br>

>>>> On 12/23/2017 12:32 PM, Douglas Russell wrote:<br>

>>>> > I'd checked the master log files and there was nothing of interest in<br>

>>>> > there. dmesg is more promising though, good idea. It looks like a<br>

>>>> > memory<br>

>>>> > issue. I've increased the amount of memory available to 20 GB from<br>

>>>> > 4 GB<br>

>>>> > and now it does not fail in the same way. Not sure why so much RAM is<br>

>>>> > needed when each image in the screen is only 2.6 MB. Now there is a<br>

>>>> > nice<br>

>>>> > new error.<br>

>>>><br>

>>>> You have me wondering if the server does the whole plate import in only<br>

>>>> one transaction. I'm also wondering whether the memory issues are due to<br>

>>>> PostgreSQL or to Java (e.g., Hibernate) and, assuming Java-side, whether<br>

>>>> the issue is pixel data size (do the TIFF files use compression?) or<br>

>>>> metadata (e.g., tons of ROIs?). Scalability has been an ongoing focus for us: we have<br>

>>>> done much but there is much more yet to be done.<br>

>>>><br>

>>>> > Going by the error that I see when the database tries to rollback, I<br>

>>>> > think it is timeout related.<br>

>>>><br>

>>>> I'm not seeing an obvious timeout issue here but I may well be missing<br>

>>>> something and maybe over the holiday period you have noticed more clues<br>

>>>> yourself too?<br>

>>>><br>

>>>> > The import log: <a href="https://s3.amazonaws.com/dpwr/pat/import_log.txt" rel="noreferrer" target="_blank">https://s3.amazonaws.com/dpwr/pat/import_log.txt</a><br>

>>>> > The server logs (I tried the import twice):<br>

>>>> > <a href="https://s3.amazonaws.com/dpwr/pat/omero_logs.zip" rel="noreferrer" target="_blank">https://s3.amazonaws.com/dpwr/pat/omero_logs.zip</a><br>

>>>> ><br>

>>>> > There are a couple of these in the database logs as you'd expect for<br>

>>>> > the<br>

>>>> > two import attempts, but nothing else of interest.<br>

>>>> ><br>

>>>> > LOG: unexpected EOF on client connection with an open transaction<br>

>>>><br>

>>>> Mmmm, late in the import process the EOFException from<br>

>>>> PGStream.ReceiveChar looks key. I'm trying to think what in PostgreSQL's<br>

>>>> pg_* tables might give some hint as to relevant activity or locks at the<br>

>>>> time (if it's a timeout, maybe a deadlock?). I guess there's nothing<br>

>>>> particularly exciting about how your OMERO server connects to<br>

>>>> PostgreSQL? It's simply across a LAN, perhaps via Docker or somesuch?<br>

>>>><br>

>>>> How large is the plate? Given the 5.4 database changes I am wondering if<br>

>>>> this could possibly be a regression since 5.3.5 and how easy the error<br>

>>>> might be to reproduce in a test environment.<br>

>>>><br>

>>>> Now the holiday season is behind us, at OME we're starting to return to<br>

>>>> the office. Happy New Year! With luck we'll get this issue figured out<br>

>>>> promptly. My apologies if I missed some existing context from the thread<br>

>>>> that I didn't realize already bears on some of my questions.<br>

>>>><br>

>>>> -- Mark<br>

</blockquote></div>