[ome-users] OMERO and storing large amounts of data

Sebastien Besson (Staff) s.besson at dundee.ac.uk
Mon Apr 22 15:10:24 BST 2019


Hi Pasi

On 17 Apr 2019, at 14:02, Pasi Kankaanpää <pkankaan at abo.fi> wrote:

Hi everyone,

We at Turku BioImaging and the Finnish Euro-BioImaging Node are currently making plans for an OMERO server setup, and some questions have come up in this context. The OMERO team recommended posting these questions here, in case they would also benefit others. Any experiences, thoughts and comments would be greatly appreciated.

Thanks for raising these questions publicly. Most of these answers are informed by our own use case with IDR. Other community members with large-scale deployments might want to share their experience as well.

1) How well would you estimate OMERO handles large data, such as data from a light sheet microscope, in these two scenarios: a) a large amount of data consisting of numerous small files, and b) a large amount of data consisting of just one or a few very large files?

IDR has multiple representative examples of both situations:

1- numerous small files are frequent in the High-Content Screening domain, where individual plates typically span 10-100K files [1]
2- individual files of 100GB-1TB are becoming increasingly common, e.g. from light-sheet experiments [2]

Re “How well…?”, we won’t pretend that routinely loading multi-TB datasets with many thousands of files, or very large TB-scale files, requires no care and consideration. For example, copying the data is often no longer an option, so we use “in-place” import for these datasets, or we delay thumbnail calculation until the server is less busy. Both fall under what we refer to as “advanced import mechanisms” [3]. Hopefully, links [1] and [2] demonstrate that both scenarios can be handled with appropriate use of OMERO’s capabilities.

[1] https://idr.openmicroscopy.org/webclient/?show=image-1898184
[2] https://idr.openmicroscopy.org/webclient/?show=image-4007801
[3] https://docs.openmicroscopy.org/omero/5.4.10/sysadmins/import-scenarios.html
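
As a rough illustration of the in-place import and deferred thumbnails mentioned above, here is a minimal Python sketch that only assembles an "omero import" command line and does not run it. The helper name build_import_command and the example path are hypothetical. The options --transfer=ln_s and --skip thumbnails are the ones I would expect from the import documentation, but please check "omero import --help" on your own server version before relying on them.

```python
from typing import List


def build_import_command(
    path: str,
    in_place: bool = True,
    skip_thumbnails: bool = True,
) -> List[str]:
    """Assemble an `omero import` argument list (hypothetical helper).

    Nothing is executed here; the list can be inspected or passed to
    subprocess.run() by the caller.
    """
    cmd = ["omero", "import"]
    if in_place:
        # Assumed flag: symlink the data into the managed repository
        # instead of copying it.
        cmd.append("--transfer=ln_s")
    if skip_thumbnails:
        # Assumed flag: defer thumbnail generation to a later time.
        cmd += ["--skip", "thumbnails"]
    cmd.append(path)
    return cmd


# Hypothetical path to a large light-sheet acquisition.
print(" ".join(build_import_command("/data/lightsheet/run42")))
```

In-place import generally has to be run by a user who can read the data from the server's own filesystem, so this kind of wrapper would typically live on the server or on a machine sharing its storage.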

2) If we think of a 5-10 year time span and a total cumulative data volume of 10 petabytes over that period, will OMERO handle this without problems?

Large deployments like IDR are routinely serving 100TB-1PB datasets these days. Considering current data growth, I would say the data volume will naturally fall into your range over the next decade. Probably even more than disk size, the choice of hardware and the access speed to the storage and the database are the important architectural decisions.
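
To put the 10 PB figure in perspective for storage and network planning, here is a back-of-the-envelope Python sketch of the average ingest rate it implies. It assumes decimal units (1 PB = 10^15 bytes) and perfectly uniform growth over the period, which is unrealistic: real acquisition comes in bursts and tends to accelerate, so peak rates will be higher. The function name sustained_ingest is made up for this example.

```python
PB = 10**15  # decimal petabyte, in bytes (assumption: SI units)
SECONDS_PER_DAY = 86_400


def sustained_ingest(total_bytes: float, years: float) -> dict:
    """Average ingest rate if `total_bytes` arrives evenly over `years`."""
    days = years * 365
    bytes_per_day = total_bytes / days
    return {
        "TB_per_day": bytes_per_day / 10**12,
        "MB_per_s": bytes_per_day / SECONDS_PER_DAY / 10**6,
    }


for years in (5, 10):
    rate = sustained_ingest(10 * PB, years)
    print(
        f"{years} years: {rate['TB_per_day']:.1f} TB/day, "
        f"{rate['MB_per_s']:.1f} MB/s sustained"
    )
```

Under these assumptions the average works out to roughly 2.7-5.5 TB per day, or about 32-63 MB/s sustained around the clock, before counting any reads, thumbnails or backups.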

3) If we think of a scenario where 10-20% of the total storage capacity would be faster-access working storage and the rest more permanent archiving storage (which can have slower access times), how would you envision OMERO working with this scenario, or what would you see as the pros and cons of these alternatives:

a) OMERO would run only in working storage and not archiving (how would the transfer between them take place?)

b) OMERO would run only in archiving and not working storage (how would the transfer between them take place?)

c) OMERO would run both in working storage and archiving (how would the transfer between them take place?)

d) 100% of the capacity would be with high access speed, operated by OMERO, so no separate working and archiving storages

The OMERO server configuration includes some mechanisms for extending the repository and accommodating different underlying storage volumes [4]. Some people have also looked at integrating production archiving solutions like Arkivum with OMERO - see for instance [5].

Vanilla OMERO does not include support for hierarchical storage, although we have been asked about this many times. You might consider having OMERO trigger data migration between different tiers, or using some other application to do so. As each archiving implementation ends up having custom features, we’d encourage you to look at the various integration points and consider what works best for your needs.

[4] https://docs.openmicroscopy.org/omero/5.4.10/sysadmins/repository-move.html#extending-the-managed-repository
[5] https://downloads.openmicroscopy.org/presentations/2017/Users-Meeting/Lightning-Talks/Alex%20Herbert%20-%20Archiving%20images%20from%20OMERO%20to%20Arkivum%202017.pdf
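
Since tiering would have to be driven from outside OMERO, one possible starting point is a plain filesystem scan that lists files untouched for some time as candidates for the slower tier. The Python sketch below is only that: it is not an OMERO API, the name find_archive_candidates and the example path are hypothetical, and it uses modification time as a stand-in for "last used" because access times are often disabled on large volumes. It deliberately moves nothing; how files are relocated, and how OMERO keeps resolving them afterwards, depends on the archiving solution.

```python
import os
import stat
import time
from typing import Iterator, Optional, Tuple


def find_archive_candidates(
    root: str,
    min_age_days: float,
    now: Optional[float] = None,
) -> Iterator[Tuple[str, int]]:
    """Yield (path, size_in_bytes) for regular files under `root` whose
    modification time is older than `min_age_days`. Read-only scan."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86_400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                yield path, st.st_size


if __name__ == "__main__":
    # Hypothetical location of the working-tier storage.
    total = 0
    for path, size in find_archive_candidates("/OMERO/ManagedRepository", 365):
        total += size
        print(path)
    print(f"{total / 10**12:.2f} TB eligible for the archive tier")
```

Because symlinks are skipped, data imported in-place via symlinks would need the scan to be pointed at the original storage location instead.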

Thanks a lot in advance for any comments,

Pasi

Best,
Sebastien & the OME/IDR team


--
Pasi Kankaanpää, PhD
Administrative Director, Turku BioImaging
Project Manager, Euro-BioImaging
Åbo Akademi University and University of Turku
Turku, Finland

pkankaan at abo.fi

_______________________________________________
ome-users mailing list
ome-users at lists.openmicroscopy.org.uk
https://lists.openmicroscopy.org.uk/mailman/listinfo/ome-users


The University of Dundee is a registered Scottish Charity, No: SC015096