[ome-users] Stall issues/download issues still, even with gevent...

Simon Li spli at dundee.ac.uk
Sun Jan 31 11:28:22 GMT 2016


Hi Jake

We're still investigating the problem; our silence since your last email is because we're still unsure of the best way to fix it. We can reproduce it on our systems, and we've tested multiple alternative configurations, none of which solved the problem. We'll continue investigating this week, and as soon as we have a solution we'll let you know.

Simon

On 31 Jan 2016 01:31, "Jake Carroll" <jake.carroll at uq.edu.au> wrote:
Hi again

Unfortunately, I'm still seeing large downloads fail via the web interface.

I'm using a startup string such as this:

omero web start --workers 128 --wsgi-args '--worker-class gevent --error-logfile=/home/omero/OMERO.server/var/log/g_error.log'

And it doesn't seem to matter what integer I pass to --workers; we still see stalls and failures on downloads over the web interface.

I'm trying to download a 9.5GB ims format file.

The g_error.log looks interesting?

root@omero-prod-gen2:~# tail -f ~omero/OMERO.server/var/log/g_error.log
2016-01-31 09:23:53 [4781] [INFO] Booting worker with pid: 4781
2016-01-31 09:23:53 [4794] [INFO] Booting worker with pid: 4794
2016-01-31 09:23:53 [4798] [INFO] Booting worker with pid: 4798
2016-01-31 09:23:53 [4814] [INFO] Booting worker with pid: 4814
2016-01-31 09:23:53 [4808] [INFO] Booting worker with pid: 4808
2016-01-31 09:23:53 [4823] [INFO] Booting worker with pid: 4823
2016-01-31 09:23:53 [4827] [INFO] Booting worker with pid: 4827
2016-01-31 09:23:53 [4838] [INFO] Booting worker with pid: 4838
2016-01-31 09:23:53 [4858] [INFO] Booting worker with pid: 4858
2016-01-31 09:23:53 [4874] [INFO] Booting worker with pid: 4874
2016-01-31 09:26:00 [3852] [CRITICAL] WORKER TIMEOUT (pid:4608)
2016-01-31 09:26:00 [3852] [CRITICAL] WORKER TIMEOUT (pid:4608)
2016-01-31 09:26:01 [5314] [INFO] Booting worker with pid: 5314
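
To see how often workers are being killed, a log like this can be summarized with a small script. The following is a minimal sketch, assuming the line layout shown in the excerpt above; `summarize_timeouts` and `SAMPLE` are hypothetical names for illustration and are not part of OMERO or gunicorn. Each timeout appears twice in the excerpt, so the sketch collapses repeats that share a timestamp and pid.

```python
import re

# Assumed layout, taken from the excerpt above:
# 2016-01-31 09:26:00 [3852] [CRITICAL] WORKER TIMEOUT (pid:4608)
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<logger_pid>\d+)\] \[(?P<level>[A-Z]+)\] (?P<msg>.*)$"
)
TIMEOUT_RE = re.compile(r"WORKER TIMEOUT \(pid:(?P<pid>\d+)\)")


def summarize_timeouts(lines):
    """Return (timeouts, boots).

    timeouts: list of distinct (timestamp, worker_pid) pairs, in log order.
    boots: number of "Booting worker" lines seen.
    """
    seen = set()
    timeouts = []
    boots = 0
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that does not look like a log line
        msg = match.group("msg")
        timeout = TIMEOUT_RE.search(msg)
        if timeout:
            key = (match.group("ts"), int(timeout.group("pid")))
            if key not in seen:  # collapse the doubled log entries
                seen.add(key)
                timeouts.append(key)
        elif msg.startswith("Booting worker"):
            boots += 1
    return timeouts, boots


# A few lines copied from the excerpt above.
SAMPLE = """\
2016-01-31 09:23:53 [4874] [INFO] Booting worker with pid: 4874
2016-01-31 09:26:00 [3852] [CRITICAL] WORKER TIMEOUT (pid:4608)
2016-01-31 09:26:00 [3852] [CRITICAL] WORKER TIMEOUT (pid:4608)
2016-01-31 09:26:01 [5314] [INFO] Booting worker with pid: 5314
"""

if __name__ == "__main__":
    events, boot_count = summarize_timeouts(SAMPLE.splitlines())
    print(events, boot_count)
```

To run it against the real log, pass `open(path)` instead of the sample lines; the path is whatever was given to `--error-logfile`.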

I managed to download (randomly?) more than I ever have before, with 1.7GB of the file downloaded in this configuration - but it is still failing/stalling.

What could I be missing?

I even tried with 256 workers:

omero@omero-prod-gen2:~$ omero web start --workers 256 --wsgi-args '--worker-class gevent --error-logfile=/home/omero/OMERO.server/var/log/g_error.log'

...but the workers still seem to time out at *some* random point early on:

2016-01-31 09:29:24 [7360] [INFO] Booting worker with pid: 7360
2016-01-31 09:29:24 [7371] [INFO] Booting worker with pid: 7371
2016-01-31 09:30:14 [5433] [CRITICAL] WORKER TIMEOUT (pid:7045) <-- happened almost immediately after booting the workers.
2016-01-31 09:30:14 [5433] [CRITICAL] WORKER TIMEOUT (pid:7045)
2016-01-31 09:30:15 [8273] [INFO] Booting worker with pid: 8273

*SO THEN* I tried booting the worker processes with a very long time out:

omero web start --workers 256 --wsgi-args '-t 360 --worker-class gevent --error-logfile=/home/omero/OMERO.server/var/log/g_error.log'

And after a much, much longer download, 4.2GB of my 9.5GB ims file, it started to show signs of trouble again:


2016-01-31 09:49:32 [8394] [CRITICAL] WORKER TIMEOUT (pid:10451)
2016-01-31 09:49:32 [8394] [CRITICAL] WORKER TIMEOUT (pid:10451)
2016-01-31 09:49:33 [11503] [INFO] Booting worker with pid: 11503

And then it failed again, unfortunately.

So I made the timeout an enormous number:

omero web start --workers 256 --wsgi-args '-t 1440 --worker-class gevent --error-logfile=/home/omero/OMERO.server/var/log/g_error.log'

...and I can finally download my 9.5GB file over the OMERO web interface, without timeout failures.
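
For reference, the gunicorn options in the command above can also be written as a gunicorn configuration file, which is plain Python. This is a sketch only: the setting names (`workers`, `worker_class`, `timeout`, `errorlog`) are gunicorn's own, but the file name is hypothetical, and whether `omero web start` can be pointed at such a file is an assumption that nothing in this thread confirms.

```python
# gunicorn_conf.py (hypothetical file name)
# Mirrors the options passed via --workers and --wsgi-args above.

workers = 256            # same as --workers 256
worker_class = "gevent"  # same as --worker-class gevent
timeout = 1440           # same as -t 1440; in seconds, gunicorn's default is 30
errorlog = "/home/omero/OMERO.server/var/log/g_error.log"  # same as --error-logfile
```

gunicorn kills and restarts any worker that has been silent for longer than `timeout` seconds, which is what the WORKER TIMEOUT lines in the log record.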

Something doesn't feel quite right, does it?

-jc







_______________________________________________
ome-users mailing list
ome-users at lists.openmicroscopy.org.uk<mailto:ome-users at lists.openmicroscopy.org.uk>
http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-users

The University of Dundee is a registered Scottish Charity, No: SC015096
