[ome-devel] Omero server performance issue

Tue Aug 11 21:03:20 BST 2009

Hi Luca,

I just wanted to add a clarification to this: though Chris is right
that using a single unloaded enumeration is probably the best practice
here, what you were doing won't create multiple copies of the same
enumeration. It can't, since the "value" column in the db is UNIQUE.
Instead, each loaded (i.e. not unloaded) enumeration is checked
against the current values, which means an extra SELECT per enum
instance rather than an INSERT.
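
To make the SELECT-versus-INSERT point concrete, here is a toy model
using Python's built-in sqlite3 module. It is only a sketch: the table
name, the schema and the resolve_loaded_enum helper are made up for
illustration and are not OMERO's actual schema or server code. It shows
a UNIQUE "value" column rejecting a duplicate row, and 1000 loaded
enums costing 1000 lookups while the table still holds a single row.

```python
import sqlite3

# Toy stand-in for an enumeration table; NOT OMERO's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dimensionorder ("
    "id INTEGER PRIMARY KEY, value TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO dimensionorder (value) VALUES ('XYZCT')")

selects = 0  # counts lookups issued on behalf of loaded enums


def resolve_loaded_enum(value):
    """Hypothetical helper: map a loaded enum onto the existing row.

    Costs one SELECT per call, which is the overhead described above.
    """
    global selects
    selects += 1
    row = conn.execute(
        "SELECT id FROM dimensionorder WHERE value = ?", (value,)
    ).fetchone()
    return row[0] if row else None


# 1000 images, each carrying a freshly constructed (loaded) enum.
ids = {resolve_loaded_enum("XYZCT") for _ in range(1000)}
row_count = conn.execute("SELECT COUNT(*) FROM dimensionorder").fetchone()[0]

# A blind INSERT of the same value is rejected by the UNIQUE constraint.
try:
    conn.execute("INSERT INTO dimensionorder (value) VALUES ('XYZCT')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(len(ids), row_count, selects, duplicate_rejected)
```

An unloaded enumeration already carries its id, so in this picture it
would skip the lookup entirely.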

Cheers,
~Josh.

Chris Allan writes:
 > Hi Luca,
 > 
 > One big thing is to be very careful with enumerations:
 > 
 >     do = om.model.DimensionOrderI()
 >     do.setValue(omt.rstring('XYZCT'))
 >     pt = om.model.PixelsTypeI()
 >     pt.setValue(omt.rstring('uint16'))
 > 
 > This is creating brand new enumerations for every image inserted, which
 > will slow any enumeration queries to a crawl and is causing 3000 extra
 > INSERTs, significant graph inspection overhead, etc. You want to
 > retrieve the existing enumerations through IQuery and then use an
 > unloaded version of the object to help you out, for example
 > (pseudo-code):
 > 
 > ...
 > dimension_orders = iquery.findAll("DimensionOrder", None)
 > xyzct = [a for a in dimension_orders if a.value.val == 'XYZCT'][0]
 > xyzct.unload()
 > ...
 > for image in range(0, 100):
 > ...
 >     p.setDimensionOrder(xyzct)
 > 
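
As a slightly fuller sketch of the same look-up-once pattern: the
helper below fetches the enumerations, picks the one with the wanted
value, unloads it and hands it back for reuse. The Fake* classes and
the unloaded_enum name are hypothetical stand-ins so the example runs
without a server. In a real script iquery would presumably be the
query service obtained from your session, and the code assumes that
findAll takes a class name plus a filter (None here) and that enum
objects expose getValue().getValue() and unload().

```python
class FakeRString:
    """Hypothetical stand-in for an rstring wrapper."""

    def __init__(self, val):
        self._val = val

    def getValue(self):
        return self._val


class FakeEnum:
    """Hypothetical stand-in for an enumeration such as DimensionOrderI."""

    def __init__(self, id_, value):
        self.id = id_
        self._value = FakeRString(value)
        self.loaded = True

    def getValue(self):
        return self._value

    def unload(self):
        # Drop every field except the id, as an unloaded object would.
        self._value = None
        self.loaded = False


class FakeQuery:
    """Hypothetical stand-in for the query service; counts round trips."""

    def __init__(self, table):
        self.table = table
        self.calls = 0

    def findAll(self, klass, filt):
        self.calls += 1
        return [FakeEnum(i, v) for i, v in self.table[klass]]


def unloaded_enum(iquery, klass, value):
    """Fetch the enums of one type once; return the match, unloaded."""
    for enum in iquery.findAll(klass, None):
        if enum.getValue().getValue() == value:
            enum.unload()
            return enum
    raise LookupError("no %s with value %r" % (klass, value))


iquery = FakeQuery({"DimensionOrder": [(1, "XYTZC"), (2, "XYZCT")]})
xyzct = unloaded_enum(iquery, "DimensionOrder", "XYZCT")

# Reuse the single unloaded instance for every image in the batch.
pixels_orders = [xyzct for _ in range(100)]
print(xyzct.id, xyzct.loaded, iquery.calls)
```

The lookup happens once, outside the per-image loop, and every image
then shares the same unloaded reference.
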
 > The above applies to FormatI, DimensionOrderI and PixelsTypeI. In fact,
 > you've sort of corrupted your database for the particular user
 > you've logged in as by adding thousands of bogus enumerations.
 > 
 > Give that a try first after deleting your bogus enums and see where you
 > get to.
 > 
 > -Chris
 > 
 > On Wed, 2009-08-05 at 16:06 +0200, Luca Lianas wrote:
 > > Sorry, I forgot the attachment...
 > > 
 > > 
 > > 2009/8/5 Luca Lianas <luca.lianas at crs4.it>
 > >         I belong to the biomedical research group at CRS4, a research
 > >         centre in Italy; we are currently using OMERO in several
 > >         projects and have been running some performance tests over
 > >         the past few days.
 > >         We noticed that the server performs poorly when loading a
 > >         large amount of data (I tried to load the metadata
 > >         for 50,000 4-channel images).
 > >         I did a smaller test loading 1000 images using a Python script
 > >         and it took 1 minute and 42 seconds to load the data (as said
 > >         before, I only wrote the metadata of the images into the
 > >         database; the actual pixels are stored in an HDFS file system).
 > >         I used the compiled version of OMERO downloaded from the
 > >         website, with the default configuration. OMERO runs on a Linux
 > >         server (Fedora core 11) with a dual Opteron processor (model
 > >         248) and 4GB of RAM.
 > >         I'm wondering what the problem is and whether there are any
 > >         hints for improving performance on the server. Any help is
 > >         appreciated.
 > >         
 > >         Please see the attached script I'm using (maybe the script
 > >         itself is my problem).
 > >         
 > >         Thanks for your attention
 > >         
 > >         Luca