[ome-users] List expected files in omero data Files/ and omero scripts
Josh Moore
josh at glencoesoftware.com
Mon Sep 19 11:48:55 BST 2016
Hi,
On Wed, Sep 14, 2016 at 5:08 PM, Carnë Draug <carandraug+dev at gmail.com> wrote:
> On 14 September 2016 at 10:44, "Colin Blackburn"
...snip...
> But this still won't work. If a python script is an attachment to an image
> it will go into the originalfile table, have a 'text/x-python' mimetype
> and an originalfile.path value, be stored in Files/, but would still be
> filtered out by that query. How can I distinguish between an old script
> whose repo value has been cleared after an omero upgrade (and therefore
> won't be in Files/) and a script file that was attached to an image?
I'm unsure of the use case "python script attached to an image" but
would be happy to hear more if you think this is worthwhile.
That being said, I think your concern can be simplified down to, "how
can I distinguish between an old script whose repo value has been
cleared and a user-uploaded script?" -- You can't. When official
scripts in the ScriptRepo are overwritten, the old rows have their repo
removed, i.e. the old versions no longer exist as files.
However, deleting them from the DB is not desired since certain
results will be linked to their execution. The point you raise is that
what we *should have done* is copy the script into OriginalFile, but
unfortunately, we did not.
When it comes to scripts, there will apparently need to be manual
intervention. Comparing who created the scripts (old official scripts
will be created as root) may help you limit the number of false
positives.
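
As a rough illustration of that owner comparison, here is a minimal
sketch. It uses an in-memory SQLite table as a stand-in for the real
PostgreSQL database, and it assumes that originalfile has an owner_id
column and that root is experimenter id 0. Both are assumptions to
check against your own schema, and the sample rows and the helper
names are made up for the example.

```python
import sqlite3

SCRIPT_MIMETYPES = ("text/x-python", "text/x-jython", "text/x-matlab")

# Assumption: root is experimenter id 0 and originalfile has an
# owner_id column. Verify both against the real database.
ROOT_OWNER_ID = 0


def split_cleared_scripts(conn):
    """Split script rows with a cleared repo into two lists of ids:
    root-owned rows (likely old official scripts) and all others
    (candidates for manual inspection)."""
    placeholders = ",".join("?" for _ in SCRIPT_MIMETYPES)
    rows = conn.execute(
        "SELECT id, owner_id FROM originalfile "
        "WHERE (repo IS NULL OR repo = '') "
        f"AND mimetype IN ({placeholders}) ORDER BY id",
        SCRIPT_MIMETYPES,
    ).fetchall()
    likely_official = [i for i, owner in rows if owner == ROOT_OWNER_ID]
    manual_check = [i for i, owner in rows if owner != ROOT_OWNER_ID]
    return likely_official, manual_check


def demo_connection():
    """Build a tiny, made-up originalfile table for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE originalfile (id INTEGER PRIMARY KEY, name TEXT, "
        "path TEXT, repo TEXT, mimetype TEXT, owner_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO originalfile VALUES (?, ?, ?, ?, ?, ?)",
        [
            # Old official script: repo cleared, owned by root.
            (1, "Make_Movie.py", "/omero/export_scripts/", None,
             "text/x-python", 0),
            # Current official script: still in ScriptRepo.
            (2, "Make_Movie.py", "/omero/export_scripts/", "ScriptRepo",
             "text/x-python", 0),
            # Script attached by an ordinary user.
            (3, "analysis.py", "/home/user/", None, "text/x-python", 7),
            # Non-script upload, ignored by the query.
            (4, "image.tif", "", None, "image/tiff", 7),
        ],
    )
    return conn
```

With the sample rows, split_cleared_scripts(demo_connection()) puts
row 1 in the first list and row 3 in the second. Root can also upload
or attach scripts, so the first list only narrows the search and still
needs a manual look.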
Cheers,
~Josh.
On Wed, Sep 14, 2016 at 5:08 PM, Carnë Draug <carandraug+dev at gmail.com> wrote:
> On 14 September 2016 at 10:44, "Colin Blackburn"
> <C.Blackburn at dundee.ac.uk> wrote:
>> Hi Carnë,
>>
>> On 13/09/2016 18:05, "ome-users on behalf of Carnë Draug"
>> <ome-users-bounces at lists.openmicroscopy.org.uk on behalf of
>> carandraug+dev at gmail.com> wrote:
>>
>>>Hi
>>>
>>>I have been trying to get a list of all files that ought to be in the
>>>Files/ directory. This is a 5.2.5 installation of omero but has been
>>>through several upgrades so it still has files from omero 4 releases.
>>>
>>>At the moment, I'm using this query:
>>>
>>>    SELECT id
>>>      FROM originalfile
>>>     WHERE (repo IS NULL OR repo = '') AND mimetype != 'Repository';
>>>
>>>but I found out that this includes some omero scripts (some of the rows
>>>from that query have a mimetype of 'text/x-python' and their filenames
>>>are the omero scripts including some of old scripts that no longer exist
>>>such as Movie_Figure.py and Make_Movie.py).
>>>
>>>I thought scripts were kept where they are and not copied. Indeed, new
>>>scripts have a repo value of 'ScriptRepo'. Is this because omero scripts
>>>used to be copied to Files/ and is my query correct?
>>
>> Official scripts are at paths relative to the server installation, rather
>> than the data directory, so they do not have any entries under Files/.
>> This applies to those scripts in the structured folders under
>> 'lib/scripts/omero/' on server startup and those uploaded officially (via
>> 'bin/omero script upload FooBar.py --official', say). In the latter case
>> the path on the OriginalFile table is likely to be '/'.
>>
>> When an official script is replaced, or modified in-place and the server
>> restarted, the old row is left in the database with the repo column
>> nulled. The mimetype is left as 'text/x-python' (or 'text/x-jython' or
>> 'text/x-matlab' if relevant). A new row is then created with the repo set
>> to 'ScriptRepo' and a new hash. (And with the addition of look-up tables
>> in 5.3 you will probably see files with 'ScriptRepo' and 'text/x-lut' also
>> appear, though this may be subject to change before release.)
>>
>> When a user script is uploaded, i.e. not an official one, this *is* stored
>> under Files/. Such scripts will not have the repo or path columns set in
>> their OriginalFile row.
>
> If I understood correctly, you are saying that the way to differentiate
> between a script that was uploaded (not official) and old official repos
> is an empty path. So instead of:
>
>    SELECT id
>      FROM originalfile
>     WHERE (repo IS NULL OR repo = '') AND mimetype != 'Repository';
>
> I should be doing:
>
>    SELECT id
>      FROM originalfile
>     WHERE (repo IS NULL OR repo = '') AND mimetype != 'Repository'
>       AND ((mimetype != 'text/x-python' AND mimetype != 'text/x-jython'
>             AND mimetype != 'text/x-matlab') OR path IS NULL OR path = '');
>
> to account for the special cases of old official scripts.
>
> But this still won't work. If a python script is an attachment to an image
> it will go into the originalfile table, have a 'text/x-python' mimetype
> and an originalfile.path value, be stored in Files/, but would still be
> filtered out by that query. How can I distinguish between an old script
> whose repo value has been cleared after an omero upgrade (and therefore
> won't be in Files/) and a script file that was attached to an image?
>
>> Other OriginalFile entries that will have actual file entries under Files/
>> are uploaded files (using 'bin/omero upload' or the graphical clients) and
>> pre-OMERO5 archived files. In addition the server may create some other
>> files under Files/ (to capture stderr, for instance) that will then have
>> entries in OriginalFiles.
>>
>> I may have missed some potential candidates for Files/ but I'm sure Josh
>> will be along soon to add those if that is the case!
>
> But given an arbitrary omero database, how can I distinguish between rows
> in the originalfile table that should be in Files and rows that should be
> somewhere else? I'm facing the issue now with these old scripts but my
> problem really is getting a list of files that omero expects to be in Files/.
>
> I understand that many of the files end up in the originalfile table but
> that table doesn't have that information. It seems like the typical use
> case is that files come from another table and sometimes end up in originalfile.
> Should I be getting all the id from other tables instead?
>
>>>And is my query correct to get a list of all files expected to be under
>>>Files/ ? Is there something else to look for? What should the query be
>>>for an arbitrary omero database (the long-term plan is to write a tool to
>>>validate any omero database against an omero data directory -- the
>>>opposite of omero cleanse).
>>
>> Have you looked at investigating the problem the other way around, more
>> like cleanse? Collecting the candidate Ids from the file names under
>> Files/ and then looking at the rows in the table that don't match these
>> files?
>>
>
> No, because we are trying to work around an issue with that approach.
> Given an omero database, validate the omero data directory. If a file
> is missing there, for example, because of something like omero 2016-SV1 [1],
> how can we identify it?
>
> Carnë
>
> [1] http://www.openmicroscopy.org/site/products/omero/secvuln/2016-SV1-cleanse
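
For the database-to-directory direction described above, a minimal
sketch might look like the following. It takes a list of ids that some
query says should be under Files/ and reports those with no file on
disk. The directory layout in expected_path (ids above 999 nested in
Dir-NNN folders built from the id's thousands groups) is an assumption
about how the server lays out Files/ and should be checked against the
server source before relying on it. Both function names are
hypothetical.

```python
import os


def expected_path(data_dir, file_id):
    """Return where a file with this id is assumed to live.

    Assumed layout (verify against the server source): ids up to 999
    sit directly in Files/; larger ids are nested in Dir-NNN folders,
    one per group of three digits above the last three, e.g.
    1234567 -> Files/Dir-001/Dir-234/1234567.
    """
    suffix = ""
    remaining = file_id
    while remaining > 999:
        remaining //= 1000
        # Prepend the folder for this thousands group.
        suffix = os.path.join("Dir-%03d" % (remaining % 1000), suffix)
    return os.path.join(data_dir, "Files", suffix + str(file_id))


def missing_files(data_dir, expected_ids):
    """Return the sorted ids whose expected file is not on disk."""
    return sorted(
        file_id
        for file_id in expected_ids
        if not os.path.isfile(expected_path(data_dir, file_id))
    )
```

The list passed as expected_ids would come from whichever originalfile
query is settled on. The sketch only checks that a file exists, not
its size or hash.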