[ome-users] OMERO and storing large amounts of data
Pasi Kankaanpää
pkankaan at abo.fi
Mon Apr 22 18:27:33 BST 2019
Thanks a lot Sebastien for the comments!
Cheers,
Pasi
On 22.04.2019 17:10, Sebastien Besson (Staff) wrote:
> Hi Pasi
>
>> On 17 Apr 2019, at 14:02, Pasi Kankaanpää <pkankaan at abo.fi> wrote:
>>
>> Hi everyone,
>>
>> We at Turku BioImaging and the Finnish Euro-BioImaging Node are
>> currently making plans for an OMERO server setup, and some questions
>> have come up in this context. The OMERO team recommended posting
>> these questions here, in case they would also benefit others. Any
>> experiences, thoughts and comments would be greatly appreciated.
>
> Thanks for raising these questions publicly. Most of these answers are
> informed by our own use case with IDR. Other community members with
> large-scale deployments might want to share their experience as well.
>> 1) How well would you estimate OMERO handles large files, such as
>> data from a light sheet microscope, in these two scenarios a) a large
>> amount of data consisting of numerous small files and b) a large
>> amount of data consisting of just one or few very large files.
>
> IDR has multiple representative examples of both situations:
> 1- the case of numerous small files is frequent in the High-Content
> Screening domain, where individual plates typically span 10-100K
> files [1]
> 2- the case of 100GB-1TB sized individual files is becoming
> increasingly common, e.g. from light-sheet experiments [2]
>
> Re “How well…?”, we won’t pretend that routinely loading multi-TB
> datasets with many 1000s of files or very large TB-scale files does
> not require some care and consideration. For example, copying the data
> is often no longer an option, so we use “in-place” import for these
> datasets, or we delay thumbnail calculation until the server is
> less busy. Both fall under what we refer to as “advanced import
> mechanisms” [3]. Hopefully, the two previous links demonstrate that
> both scenarios can be handled with appropriate use of OMERO’s
> capabilities.
>
> [1] https://idr.openmicroscopy.org/webclient/?show=image-1898184
> [2] https://idr.openmicroscopy.org/webclient/?show=image-4007801
> [3] https://docs.openmicroscopy.org/omero/5.4.10/sysadmins/import-scenarios.html
>> 2) If we think of a 5-10 year spectrum and a total cumulative data
>> amount during this time of 10 petabytes, will OMERO handle this
>> without problems?
>
> Large deployments like IDR are routinely serving 100TB-1PB datasets
> these days. Considering the data growth, I would say the data volume
> will naturally fall into your range over the next decade. Probably
> even more than disk size, the choice of hardware and the access speed
> to the storage/database are important architectural decisions.
>> 3) If we think of a scenario where of the total storage capacity
>> 10-20% would be faster access working storage and the rest more
>> permanent archiving storage (that can have slower access times), how
>> would you envision OMERO working with this scenario, or what would
>> you see as pros and cons of these alternatives:
>>
>> a) OMERO would run only in working storage and not archiving (how
>> would the transfer between them take place?)
>>
>> b) OMERO would run only in archiving and not working storage (how
>> would the transfer between them take place?)
>>
>> c) OMERO would run both in working storage and archiving (how would
>> the transfer between them take place?)
>>
>> d) 100% of the capacity would be with high access speed, operated by
>> OMERO, so no separate working and archiving storages
>
> The OMERO server configuration includes some mechanisms for extending
> the repository to accommodate different underlying storage
> volumes [4]. Some people have also looked at integrating production
> archiving solutions like Arkivum with OMERO; see for instance [5].
>
> Vanilla OMERO does not include support for hierarchical storage,
> although we have been asked about this many times. You might consider
> having OMERO trigger data migration between different tiers, or using
> some other application to do so. As each archiving implementation ends
> up having custom features, we’d encourage you to look at the various
> integration points and consider what works best for your needs.
>
> [4] https://docs.openmicroscopy.org/omero/5.4.10/sysadmins/repository-move.html#extending-the-managed-repository
> [5] https://downloads.openmicroscopy.org/presentations/2017/Users-Meeting/Lightning-Talks/Alex%20Herbert%20-%20Archiving%20images%20from%20OMERO%20to%20Arkivum%202017.pdf
>
>> Thanks a lot in advance for any comments,
>>
>> Pasi
>
> Best,
> Sebastien & the OME/IDR team
>
>>
>> --
>> Pasi Kankaanpää, PhD
>> Administrative Director, Turku BioImaging
>> Project Manager, Euro-BioImaging
>> Åbo Akademi University and University of Turku
>> Turku, Finland
>>
>> pkankaan at abo.fi