[ome-devel] In-place Imports

Wed Dec 18 16:28:17 GMT 2013

Hi all,

Firstly, really excited about being able to deploy OMERO5 in production!
Nice one.

As I understand it, once we have OMERO5, we should be able to start working
towards having in-place imports of data. This is pretty important for a
number of reasons, not least of which is that all our users currently keep
copies of everything on a network drive as well as in OMERO, which is total
duplication. It would also drastically reduce import times and network
traffic. I'm interested in understanding how this is going to work.

I assume that this will require a special import procedure in which the
OMERO server is given a path to a file on the local machine instead of data
for upload.

Obviously there would be no upload step per se, although there might be some
symlinking or similar; the server would then move straight on to doing
the metadata import.

I have some questions.

*Linkage*
How would the linkage within OMERO work? An obvious solution is to symlink
the file from the managed repository. I don't think that will work very
well because the user will inevitably want to get the path to their data
back out of OMERO in the future. If they are given an address in the
managed repository then this will mean little to them compared to the
original path of their image.
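
To make the symlink idea concrete, here is a minimal sketch using only the
Python standard library. The function names and the repository directory are
hypothetical and this is not how OMERO is known to do it. It shows that the
link lives at a managed-repository path, so the original location would have
to be read back from the link target.

```python
import os


def link_into_repository(original_path, repository_dir, link_name):
    """Create a symlink to original_path inside repository_dir.

    repository_dir stands in for a managed repository directory; it is an
    assumption made for illustration only.
    """
    os.makedirs(repository_dir, exist_ok=True)
    link_path = os.path.join(repository_dir, link_name)
    # Store an absolute target so the link does not depend on the
    # current working directory.
    os.symlink(os.path.abspath(original_path), link_path)
    return link_path


def original_location(link_path):
    """Return the path the symlink points at (the user's original file)."""
    return os.readlink(link_path)
```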

The idea would be that they get this path and can then point some
proprietary image analysis tool at that location on their network drive and
open it. There would probably have to be config on the server to map the
local paths that OMERO uses into a sensible path for the user. E.g.
*/mnt/fileserver1/dpwrussell/data/omero/foo/bar/myimage.dv* might be what
the OMERO server sees as the image path, but the user would want
*data/omero/foo/bar/myimage.dv*.
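
A minimal sketch of such a mapping, assuming a simple table of server-side
prefixes to user-facing prefixes. The table and the function name are
hypothetical and are not existing OMERO configuration.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping from the prefix the OMERO server sees to the prefix
# the user would recognise. These values are illustrative only.
PATH_PREFIX_MAP = {
    "/mnt/fileserver1/dpwrussell": "",
}


def to_user_path(server_path: str, prefix_map=PATH_PREFIX_MAP) -> Optional[str]:
    """Translate a server-side path into a user-facing one.

    Returns None when no configured prefix matches.
    """
    path = PurePosixPath(server_path)
    # Try the longest prefixes first so the most specific mapping wins.
    for server_prefix in sorted(prefix_map, key=len, reverse=True):
        try:
            relative = path.relative_to(server_prefix)
        except ValueError:
            continue  # this prefix does not apply to the path
        return str(PurePosixPath(prefix_map[server_prefix]) / relative)
    return None


print(to_user_path("/mnt/fileserver1/dpwrussell/data/omero/foo/bar/myimage.dv"))
# prints data/omero/foo/bar/myimage.dv
```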

We would not want to make all the managed repository available to all users
because this would violate the privacy of all. I guess an alternative would
be to map sections of the managed repository to network user accounts, but
this would lose the niceness of being given a path they recognise from
their actual data hierarchy.

*Permissions*
Would OMERO deal with making the permissions (or maybe ownership) changes
to the files to reduce the risk of deletion/overwriting, or should this be
done by whatever person/process is marking data for in-place import? I
think it would probably make more sense for the person/process to do it,
because they may have to make permissions/ownership changes anyway just so
that whatever user OMERO is running as can see the data at all.
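
As an illustration of what that person/process might run before marking a
file for in-place import, here is a minimal sketch for a POSIX filesystem.
The function names are hypothetical and nothing here is an OMERO API.

```python
import os
import stat


def make_read_only(path: str) -> int:
    """Clear every write bit on path and return the new permission bits."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    read_only = current & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
    os.chmod(path, read_only)
    return read_only


def readable_by_current_user(path: str) -> bool:
    """Check whether the user running this process can read path.

    Run as the account the server uses to confirm it can see the data.
    """
    return os.access(path, os.R_OK)
```

Note that on POSIX systems clearing the write bits on a file only guards
against overwriting. Deleting or renaming the file is governed by the
permissions on its parent directory, so that would need handling as well.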

*Deletion/Overwrite (Worst Case Scenario)*
What would happen if a user were to delete/overwrite some data somehow?
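
One way such a change could at least be detected is sketched below. It
assumes a checksum of each file was recorded at import time; whether OMERO
records one for in-place imports is not something this message establishes,
and the function names are hypothetical.

```python
import hashlib
import os


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file(path, recorded_sha1):
    """Compare a file against the checksum assumed to be stored at import.

    Returns "missing" if the file is gone, "modified" if its content no
    longer matches, and "ok" otherwise.
    """
    if not os.path.exists(path):
        return "missing"
    return "ok" if sha1_of(path) == recorded_sha1 else "modified"
```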

*Multiple Data Stores*
Given that we already have a repository of data, will we be able to add
multiple OMERO data directories? One would be all the data that's already
been uploaded and subsequent upload imports. The other would be (in the
solution I am envisaging) a mounted filesystem on another server which has
a directory per user (it's basically our current fileserver). This is where
in-place imports would be done from, so the data doesn't have to move at all.

*Moving Files*
How about a mechanism for moving a file? I can imagine users having this
requirement. Obviously they'd have to go through some special process to
allow it to happen. What about moving a file from already uploaded data to
the user's repository? I know for sure they are going to want this if the
above scenario becomes a reality.

*Deduplication*
Finally, on a side-note, are there any plans to have a deduplication
process? Any file that was previously uploaded with the archive option
could presumably now have the pixels removed and perhaps be moved to the
managed repository at the same time?

Thanks all,

Douglas