[ome-users] Stall issues/download issues still, even with gevent...

Jake Carroll jake.carroll at uq.edu.au
Sun Jan 31 01:31:31 GMT 2016


Hi again

Unfortunately, I'm still seeing large downloads fail via the web interface.

I'm using a startup string such as this:

omero web start --workers 128 --wsgi-args '--worker-class gevent --error-logfile=/home/omero/OMERO.server/var/log/g_error.log'

It doesn't seem to matter what value I pass to --workers; we still see downloads stall and fail over the web interface.

I'm trying to download a 9.5GB ims format file.

The g_error.log looks interesting?

root@omero-prod-gen2:~# tail -f ~omero/OMERO.server/var/log/g_error.log
2016-01-31 09:23:53 [4781] [INFO] Booting worker with pid: 4781
2016-01-31 09:23:53 [4794] [INFO] Booting worker with pid: 4794
2016-01-31 09:23:53 [4798] [INFO] Booting worker with pid: 4798
2016-01-31 09:23:53 [4814] [INFO] Booting worker with pid: 4814
2016-01-31 09:23:53 [4808] [INFO] Booting worker with pid: 4808
2016-01-31 09:23:53 [4823] [INFO] Booting worker with pid: 4823
2016-01-31 09:23:53 [4827] [INFO] Booting worker with pid: 4827
2016-01-31 09:23:53 [4838] [INFO] Booting worker with pid: 4838
2016-01-31 09:23:53 [4858] [INFO] Booting worker with pid: 4858
2016-01-31 09:23:53 [4874] [INFO] Booting worker with pid: 4874
2016-01-31 09:26:00 [3852] [CRITICAL] WORKER TIMEOUT (pid:4608)
2016-01-31 09:26:00 [3852] [CRITICAL] WORKER TIMEOUT (pid:4608)
2016-01-31 09:26:01 [5314] [INFO] Booting worker with pid: 5314
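
In case it helps anyone reading along, here is a minimal Python sketch for pulling the timeout events out of a log like the one above. The regular expression is written only against the line format shown in this excerpt, and the function name timed_out_workers is made up for illustration. Each timeout appears twice in the excerpt, so the sketch counts a (timestamp, pid) pair once.

```python
import re
from collections import Counter

# Matches error-log lines of the form shown in the excerpt, e.g.
# 2016-01-31 09:26:00 [3852] [CRITICAL] WORKER TIMEOUT (pid:4608)
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<master>\d+)\] \[CRITICAL\] WORKER TIMEOUT \(pid:(?P<worker>\d+)\)"
)


def timed_out_workers(lines):
    """Return a Counter mapping worker pid -> number of distinct timeout events.

    Hypothetical helper: the excerpt shows each timeout line twice, so
    events are deduplicated on (timestamp, worker pid).
    """
    seen = set()
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            seen.add((match.group("ts"), int(match.group("worker"))))
    return Counter(pid for _, pid in seen)


# Lines copied from the log excerpt above.
sample = [
    "2016-01-31 09:23:53 [4874] [INFO] Booting worker with pid: 4874",
    "2016-01-31 09:26:00 [3852] [CRITICAL] WORKER TIMEOUT (pid:4608)",
    "2016-01-31 09:26:00 [3852] [CRITICAL] WORKER TIMEOUT (pid:4608)",
    "2016-01-31 09:26:01 [5314] [INFO] Booting worker with pid: 5314",
]

print(timed_out_workers(sample))
```

Feeding it the lines of g_error.log instead of the sample list should show whether it is one worker timing out once per download or many workers timing out repeatedly.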

I managed to download (randomly?) more than I ever have before in this configuration, 1.7GB of the file, but it still stalled and failed.

What could I be missing?

I even tried with 256 workers:

omero@omero-prod-gen2:~$ omero web start --workers 256 --wsgi-args '--worker-class gevent --error-logfile=/home/omero/OMERO.server/var/log/g_error.log'

...but the workers still seem to time out at *some* random point early on:

2016-01-31 09:29:24 [7360] [INFO] Booting worker with pid: 7360
2016-01-31 09:29:24 [7371] [INFO] Booting worker with pid: 7371
2016-01-31 09:30:14 [5433] [CRITICAL] WORKER TIMEOUT (pid:7045) <-- happened almost immediately after booting the workers.
2016-01-31 09:30:14 [5433] [CRITICAL] WORKER TIMEOUT (pid:7045)
2016-01-31 09:30:15 [8273] [INFO] Booting worker with pid: 8273

*SO THEN* I tried booting the worker processes with a very long timeout:

omero web start --workers 256 --wsgi-args '-t 360 --worker-class gevent --error-logfile=/home/omero/OMERO.server/var/log/g_error.log'

And, after a much longer download (4.2GB of my 9.5GB ims file), it started to show signs of trouble again:


2016-01-31 09:49:32 [8394] [CRITICAL] WORKER TIMEOUT (pid:10451)
2016-01-31 09:49:32 [8394] [CRITICAL] WORKER TIMEOUT (pid:10451)
2016-01-31 09:49:33 [11503] [INFO] Booting worker with pid: 11503

And then it failed again, unfortunately.

So I made the timeout an enormous number:

omero web start --workers 256 --wsgi-args '-t 1440 --worker-class gevent --error-logfile=/home/omero/OMERO.server/var/log/g_error.log'

...and I can finally pull down my 9.5GB file over the OMERO web interface, without timeout failures.
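
As a rough back-of-the-envelope check on those numbers, the Python sketch below assumes (and it is only an assumption) that the -t value effectively capped the whole transfer, that the download ran at a constant rate, and that "GB" means 10^9 bytes. Under those assumptions, 4.2GB in 360 seconds implies a rate, and that rate gives the time the full 9.5GB file would need. The function names are made up for illustration.

```python
GB = 10**9  # assumption: decimal gigabytes


def implied_rate(bytes_done, timeout_s):
    """Bytes per second, if the transfer ran for exactly timeout_s seconds."""
    return bytes_done / timeout_s


def seconds_needed(total_bytes, rate):
    """Time to move total_bytes at a constant rate in bytes per second."""
    return total_bytes / rate


# Figures from the runs above: 4.2GB arrived before the -t 360 run failed.
rate = implied_rate(4.2 * GB, 360)       # roughly 11.7 MB/s
needed = seconds_needed(9.5 * GB, rate)  # roughly 814 s

print(f"implied rate: {rate / 10**6:.1f} MB/s")
print(f"time for 9.5GB at that rate: {needed:.0f} s")
print(f"fits in -t 360?  {needed <= 360}")
print(f"fits in -t 1440? {needed <= 1440}")
```

If those assumptions hold, the estimate lands between the two timeouts, which would match a failure at -t 360 and a success at -t 1440. It says nothing about why a download would be tied to the worker timeout in the first place.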

Something doesn't feel quite right, does it?

-jc








