[ome-users] OMERO and storing large amounts of data

Mon Apr 22 18:27:33 BST 2019

Thanks a lot Sebastien for the comments!

Cheers,

Pasi

On 22.04.2019 17:10, Sebastien Besson (Staff) wrote:
> Hi Pasi
>
>> On 17 Apr 2019, at 14:02, Pasi Kankaanpää <pkankaan at abo.fi> wrote:
>>
>> Hi everyone,
>>
>> We at Turku BioImaging and the Finnish Euro-BioImaging Node are 
>> currently making plans for an OMERO server setup, and some questions 
>> have come up in this context. The OMERO team recommended posting 
>> these questions here, in case they would also benefit others. Any 
>> experiences, thoughts and comments would be greatly appreciated.
>
> Thanks for raising these questions publicly. Most of these answers are 
> informed by our own use case with IDR. Other community members with 
> large-scale deployments might want to share their experience as well.
>> 1) How well would you estimate OMERO handles large files, such as 
>> data from a light sheet microscope, in these two scenarios a) a large 
>> amount of data consisting of numerous small files and b) a large 
>> amount of data consisting of just one or few very large files.
>
> IDR has multiple representative examples of both situations:
> 1- numerous small files are frequent in the High-Content Screening 
> domain, where individual plates typically span 10-100K files [1]
> 2- individual files of 100GB-1TB are becoming increasingly common, 
> e.g. from light-sheet experiments [2]
>
> Re “How well…?”, we won’t pretend that routinely loading multi-TB 
> datasets with many 1000s of files, or very large TB-scale files, does 
> not require some care and consideration. For example, copying the data 
> is often no longer an option, so we use “in-place” import for these 
> datasets, or we delay thumbnail calculation until the server is less 
> busy. Both fall under what we refer to as “advanced import 
> mechanisms” [3]. Hopefully, the two previous links demonstrate that 
> both scenarios can be handled with appropriate use of OMERO’s 
> capabilities.
>
> [1] https://idr.openmicroscopy.org/webclient/?show=image-1898184
> [2] https://idr.openmicroscopy.org/webclient/?show=image-4007801
> [3] 
> https://docs.openmicroscopy.org/omero/5.4.10/sysadmins/import-scenarios.html
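
As a rough illustration of the in-place import and deferred thumbnails mentioned in the reply, here is a minimal Python sketch that only assembles an `omero import` command line. It assumes the `--transfer=ln_s` and `--skip thumbnails` options described in the import documentation at [3]; the helper name and the example path are hypothetical, and the flags should be checked against the docs for the OMERO version in use.

```python
import shlex


def build_import_command(path, in_place=True, skip_thumbnails=True):
    """Assemble (but do not run) an `omero import` command line.

    Hypothetical helper for illustration only. The flags are assumed
    from the OMERO import documentation; verify them for your version.
    """
    cmd = ["omero", "import"]
    if in_place:
        # Assumed flag: link to the original files instead of copying them.
        cmd.append("--transfer=ln_s")
    if skip_thumbnails:
        # Assumed flag: leave thumbnail generation for a quieter time.
        cmd.extend(["--skip", "thumbnails"])
    cmd.append(path)
    return cmd


# Hypothetical path to a light-sheet acquisition on the server's storage.
cmd = build_import_command("/data/lightsheet/run_001")
print(shlex.join(cmd))
```

In-place import generally needs to be run where the OMERO server can read the files directly, so a command like this would typically be executed on the server side rather than from a user's workstation.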
>> 2) If we think of a 5-10 year spectrum and a total cumulative data 
>> amount during this time of 10 petabytes, will OMERO handle this 
>> without problems?
>
> Large deployments like IDR are routinely serving 100TB-1PB datasets 
> these days. Considering the data growth, I would say the data volume 
> will naturally fall into your range over the next decade. Probably 
> even more important than disk size are the choice of hardware and the 
> access speed to the storage/database; these are key architectural 
> decisions.
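
To put the 10 petabytes over 5-10 years from the question into perspective, here is a back-of-the-envelope Python sketch of the average ingest rate. It assumes decimal units (1 PB = 10^15 bytes) and perfectly even growth, neither of which is stated in the thread; real acquisition is bursty, so peak rates will be higher.

```python
PB = 10**15  # decimal petabyte (assumption; binary units give slightly larger figures)
TB = 10**12


def average_ingest(total_bytes, years):
    """Return (TB per day, MB per second) if growth were spread evenly."""
    days = years * 365
    tb_per_day = total_bytes / TB / days
    mb_per_second = total_bytes / 10**6 / (days * 86400)
    return tb_per_day, mb_per_second


for years in (5, 10):
    per_day, per_sec = average_ingest(10 * PB, years)
    print(f"{years} years: {per_day:.2f} TB/day, {per_sec:.1f} MB/s sustained")
```

Averages of this size are modest for modern storage, which fits the point in the reply that hardware choice and access speed to the storage and database matter more than raw capacity.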
>> 3) If we think of a scenario where of the total storage capacity 
>> 10-20% would be faster access working storage and the rest more 
>> permanent archiving storage (that can have slower access times), how 
>> would you envision OMERO working with this scenario, or what would 
>> you see as pros and cons of these alternatives:
>>
>> a) OMERO would run only in working storage and not archiving (how 
>> would the transfer between them take place?)
>>
>> b) OMERO would run only in archiving and not working storage (how 
>> would the transfer between them take place?)
>>
>> c) OMERO would run both in working storage and archiving (how would 
>> the transfer between them take place?)
>>
>> d) 100% of the capacity would be with high access speed, operated by 
>> OMERO, so no separate working and archiving storages
>
> The OMERO server configuration includes some mechanisms for extending 
> the repository and accommodating different underlying storage 
> volumes [4]. Some people have also looked at integrating production 
> archiving solutions like Arkivum with OMERO; see for instance [5].
>
> Vanilla OMERO does not include support for hierarchical storage, 
> although we have been asked about this many times. You might consider 
> having OMERO trigger data migration between different tiers, or using 
> some other application to do so. As each archiving implementation 
> ends up having custom features, we’d encourage you to look at the 
> various integration points and consider what works best for your needs.
>
> [4] 
> https://docs.openmicroscopy.org/omero/5.4.10/sysadmins/repository-move.html#extending-the-managed-repository
> [5] 
> https://downloads.openmicroscopy.org/presentations/2017/Users-Meeting/Lightning-Talks/Alex%20Herbert%20-%20Archiving%20images%20from%20OMERO%20to%20Arkivum%202017.pdf
>
>> Thanks a lot in advance for any comments,
>>
>> Pasi
>
> Best,
> Sebastien & the OME/IDR team
>
>>
>> --
>> Pasi Kankaanpää, PhD
>> Administrative Director, Turku BioImaging
>> Project Manager, Euro-BioImaging
>> Åbo Akademi University and University of Turku
>> Turku, Finland
>>
>> pkankaan at abo.fi