[ome-users] Import error
Douglas Russell
douglas_russell at hms.harvard.edu
Thu Jan 11 17:58:15 GMT 2018
If it helps, Jay was previously able to import this dataset successfully on
our quad OMERO+, but I don't know whether that is the critical difference
between my scenario and the one that worked.

Happy to try any suggestions. Also, this data is on S3 if you want to play
with it; just let me know and I can grant your account access. I'd
recommend playing with it within AWS as it's pretty large!
Cheers,
Douglas
On Wed, 10 Jan 2018 at 18:00 Josh Moore <josh at glencoesoftware.com> wrote:
> On Tue, Jan 9, 2018 at 2:49 PM, Douglas Russell
> <douglas_russell at hms.harvard.edu> wrote:
> > FYI: Just the latter three postgres logs relate to the most recent attempt.
> >
> > On Tue, 9 Jan 2018 at 08:35 Douglas Russell
> > <douglas_russell at hms.harvard.edu> wrote:
> >>
> >> And this was all there was in the postgres logs:
> >>
> >> 01:33:37 LOG: unexpected EOF on client connection with an open transaction
> >> 02:13:52 LOG: checkpoints are occurring too frequently (21 seconds apart)
> >> 02:13:52 HINT: Consider increasing the configuration parameter "checkpoint_segments".
> >> 07:52:19 LOG: checkpoints are occurring too frequently (10 seconds apart)
> >> 07:52:19 HINT: Consider increasing the configuration parameter "checkpoint_segments".
> >> 08:50:05 LOG: unexpected EOF on client connection with an open transaction
> >>
> >> My gut feeling is that the database update fails, causing the whole import
> >> to fail, but it's hard to know what is going on.
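For reference, the checkpoint HINT above maps onto a single postgresql.conf
setting: on PostgreSQL 9.4 and earlier it is checkpoint_segments, and from 9.5
onwards that parameter was replaced by max_wal_size. A minimal sketch of the
kind of change the HINT is suggesting (values are illustrative only, not tuned
for this server):

    # postgresql.conf on PostgreSQL <= 9.4 (default checkpoint_segments is 3)
    checkpoint_segments = 32

    # postgresql.conf on PostgreSQL >= 9.5, where checkpoint_segments is gone
    max_wal_size = 2GB

Either change takes effect after a configuration reload (e.g. pg_ctl reload).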
>
> Sounds plausible. The exception that the server saw:
>
> An I/O error occurred while sending to the backend.
>
> rang some bells:
>
> * https://trac.openmicroscopy.org/ome/ticket/2977
> * https://trac.openmicroscopy.org/ome/ticket/5858
>
> both of which are _query_ issues where an argument (specifically an
> :id: array) had been passed in which was larger than an int. In this
> case, perhaps something similar is happening during the flush of the
> transaction, or more generally, something is just quite big. If it's
> the latter case, _perhaps_ there's a configuration option on the PG
> side to permit larger transactions. Obviously, that's only a
> workaround until the transactions can be broken up appropriately.
>
> I'll be in transit tomorrow but can help in the search for such a
> property afterwards.
>
> ~Josh
>
>
>
>
> >> D
> >>
> >> On Tue, 9 Jan 2018 at 08:20 Douglas Russell
> >> <douglas_russell at hms.harvard.edu> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Sorry for the delay in following this up.
> >>>
> >>> These OMERO instances are in Docker, yes, but otherwise I don't think
> >>> there is anything remarkable about the configuration. I have allocated
> >>> postgres 5GBs of RAM and am not seeing any messages about that running
> >>> out of memory. The OMERO server has 20GBs of RAM.
> >>>
> >>> The only errors in the Blitz log are:
> >>>
> >>> /opt/omero/server/OMERO.server/var/log/Blitz-0.log:2018-01-09 00:15:32,910 ERROR [ ome.services.util.ServiceHandler] (l.Server-7) Method interface ome.api.ThumbnailStore.createThumbnailsByLongestSideSet invocation took 26125
> >>> /opt/omero/server/OMERO.server/var/log/Blitz-0.log:2018-01-09 00:15:33,090 ERROR [o.s.t.interceptor.TransactionInterceptor] (2-thread-4) Application exception overridden by rollback exception
> >>> /opt/omero/server/OMERO.server/var/log/Blitz-0.log:2018-01-09 00:15:33,090 ERROR [ ome.services.util.ServiceHandler] (2-thread-4) Method interface ome.services.util.Executor$Work.doWork invocation took 17514887
> >>>
> >>> The only thing I haven't yet tried is moving postgres into the same
> >>> container as OMERO. I can try that if it would help, but I highly doubt it
> >>> will make any difference, as in this setup there is only one t2.2xlarge
> >>> instance running everything. It was using a load balancer (the easiest way
> >>> to connect things up should they actually be on different hosts), but I
> >>> also tried it without that, giving the IP of the postgres docker container
> >>> directly to the OMERO instance configuration, and got the same result, so
> >>> the load balancer's timeout is not at fault.
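For concreteness, giving the IP of the postgres container to the OMERO
configuration just means setting the usual omero.db.* properties; a rough
sketch, with a made-up container IP and credentials:

    bin/omero config set omero.db.host 172.17.0.3
    bin/omero config set omero.db.name omero
    bin/omero config set omero.db.user omero
    bin/omero config set omero.db.pass change-me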
> >>>
> >>> Thanks,
> >>>
> >>> Douglas
> >>>
> >>> On Wed, 3 Jan 2018 at 06:56 Mark Carroll <m.t.b.carroll at dundee.ac.uk>
> >>> wrote:
> >>>>
> >>>>
> >>>> On 12/23/2017 12:32 PM, Douglas Russell wrote:
> >>>> > I'd checked the master log files and there was nothing of interest in
> >>>> > there. dmesg is more promising though, good idea. It looks like a memory
> >>>> > issue. I've increased the amount of memory available to 20GBs from 4GBs
> >>>> > and now it does not fail in the same way. Not sure why so much RAM is
> >>>> > needed when each image in the screen is only 2.6MBs. Now there is a nice
> >>>> > new error.
> >>>>
> >>>> You have me wondering if the server does the whole plate import in only
> >>>> one transaction, and whether memory issues could be due to PostgreSQL or
> >>>> instead Java (e.g., Hibernate) and, assuming the Java side, whether the
> >>>> issue is pixel data size (do the TIFF files use compression?) or metadata
> >>>> (e.g., tons of ROIs?). Scalability has been an ongoing focus for us: we
> >>>> have done much but there is much more yet to be done.
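On the compression question, one quick way to check is with libtiff's
command-line tools, if they are installed (the filename here is just a
placeholder):

    tiffinfo some_well_field.tif | grep -i compression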
> >>>>
> >>>> > Going by the error that I see when the database tries to roll back, I
> >>>> > think it is timeout-related.
> >>>>
> >>>> I'm not seeing an obvious timeout issue here but I may well be missing
> >>>> something and maybe over the holiday period you have noticed more clues
> >>>> yourself too?
> >>>>
> >>>> > The import log: https://s3.amazonaws.com/dpwr/pat/import_log.txt
> >>>> > The server logs (I tried the import twice):
> >>>> > https://s3.amazonaws.com/dpwr/pat/omero_logs.zip
> >>>> >
> >>>> > There are a couple of these in the database logs as you'd expect for
> >>>> > the two import attempts, but nothing else of interest.
> >>>> >
> >>>> > LOG: unexpected EOF on client connection with an open transaction
> >>>>
> >>>> Mmmm, late in the import process the EOFException from
> >>>> PGStream.ReceiveChar looks key. I'm trying to think what in PostgreSQL's
> >>>> pg_* tables might give some hint as to relevant activity or locks at the
> >>>> time (if it's a timeout, maybe a deadlock?). I guess there's nothing
> >>>> particularly exciting about how your OMERO server connects to
> >>>> PostgreSQL? It's simply across a LAN, perhaps via Docker or somesuch?
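In case it helps with that, a rough sketch of the kind of queries that could
be run against those pg_* tables while an import is stuck (the database name
'omero' is an assumption; adjust to match the actual setup):

    -- what each backend is doing, and how long its transaction has been open
    SELECT pid, state, xact_start, query
      FROM pg_stat_activity
     WHERE datname = 'omero';

    -- any locks that have been requested but not granted
    SELECT pid, locktype, relation::regclass, mode, granted
      FROM pg_locks
     WHERE NOT granted;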
> >>>>
> >>>> How large is the plate? Given the 5.4 database changes I am wondering if
> >>>> this could possibly be a regression since 5.3.5 and how easy the error
> >>>> might be to reproduce in a test environment.
> >>>>
> >>>> Now that the holiday season is behind us, we at OME are starting to
> >>>> return to the office. Happy New Year! With luck we'll get this issue
> >>>> figured out promptly. My apologies if I missed some existing context
> >>>> from the thread that already bears on some of my questions.
> >>>>
> >>>> -- Mark
>