[ome-devel] TIFF pyramid support in Bio-Formats - reference files for review

Mon Mar 26 23:41:26 BST 2018

Hi Roger,

I'm having a bit of fun creating type "C" pyramidal OME-TIFFs. The 
makepyramid-ptif script works for single-channel files, but I expanded it 
a bit to allow creation of multi-channel ones as well. Enclosed is 
makepyramid-multi. That didn't make the script any prettier but, as you 
said, this is only a hack to create some test data and better understand 
how best to structure the TIFF and associated OME-XML. Using the multi 
script, I generated a 2-channel image with pyramids inside and 
attached bogus "label" and "macro" images (I had to hand-edit the XML to 
get them to meet OME-XML requirements). I uploaded that file to the QA 
system for your enjoyment.
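
For reference, the way makepyramid-multi pairs channels with IFDs can be sketched in isolation. This is only an illustration: build_planelist is a hypothetical helper name, the file names are made up, and it assumes, as the script does, that channel files carry "_C" in their name and are passed in channel order. Each pair is a FirstC value followed by the file's position on the command line, which should match the main-chain IFD index once the sub-resolutions have been moved into SubIFDs.

```shell
# Hypothetical helper: print "FirstC IFD" pairs for the arguments whose
# names contain "_C" (the channel-naming assumption made by the script).
build_planelist() {
    local planelist='' cnt=0 chan_cnt=0 arg
    for arg in "$@"; do
        case "$arg" in
            *_C*)
                # Channel file: record its channel index and file position.
                planelist="$planelist $chan_cnt $cnt"
                chan_cnt=$((chan_cnt + 1))
                ;;
        esac
        cnt=$((cnt + 1))
    done
    # Unquoted on purpose: word splitting drops the leading space.
    echo $planelist
}

# Example file names are invented for illustration.
build_planelist img_C0.ptif img_C1.ptif label.tif macro.tif
```

With the four example names this prints "0 0 1 1": the label and macro files advance the position counter but add no channel.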

>> - the Leica-Fluorescence-1.scn (and thus all its result files) has the
>> channels stored in a way that does not exactly conform to how one would
>> normally store the channels in the OME-standard way. Am I correct in
>> thinking that those channels would normally be in separate top-level
>> IFDs each and have their sub-resolutions arranged in SubIFDs under each
>> of those top-levels?
>
> Both are permitted, even combined, though this way is certainly less
> typical for fluorescence.  For RGB data it's expected though. You're
> absolutely correct about each top-level IFD having separate 
> sub-resolutions.

One thing I noticed about the Leica-Fluorescence-1 type "C" OME-TIFF 
result file is that, while it passes xmlvalid, showinf doesn't really 
like it. The file I uploaded appears to pass all tests, and I even 
uploaded it to OMERO, which of course doesn't even see the sub-resolution 
SubIFDs but has no issues with it.

If the team indeed decides to go with the "C" approach, this indicates 
that it should ??always?? be possible to create the pyramid OME-TIFF 
file in such a way that any old-style OME-TIFF reader that is compatible 
with the current standard can read the new-style files 
and simply ignore the sub-resolutions.
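
To make the backward-compatibility point concrete: a reader that only follows the main IFD chain never visits IFDs that are reachable solely through a SubIFDs tag. Below is a minimal sketch of such a walk. It assumes a classic (non-BigTIFF) little-endian file and a little-endian host, does no loop detection, and main_chain_count is a hypothetical name, not part of Bio-Formats or libtiff.

```shell
# Count the IFDs reachable through the main chain of a classic,
# little-endian TIFF. Assumes a little-endian host so that od decodes
# multi-byte integers in file order.
main_chain_count() {
    local file=$1 off n count=0
    # The offset of the first IFD is stored at byte 4 of the header.
    off=$(od -An -tu4 -j 4 -N 4 "$file" | tr -d ' ')
    while [ "$off" -ne 0 ]; do
        count=$((count + 1))
        # Each IFD starts with a 2-byte entry count...
        n=$(od -An -tu2 -j "$off" -N 2 "$file" | tr -d ' ')
        # ...followed by n 12-byte entries and a 4-byte next-IFD offset.
        off=$(od -An -tu4 -j $((off + 2 + n * 12)) -N 4 "$file" | tr -d ' ')
    done
    echo "$count"
}

# Usage: main_chain_count image.ome.tiff
```

On a type "C" file this count should equal the number of full-resolution planes plus any ancillary images, since the sub-resolutions are no longer linked into the chain.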
>
>> - also, all these examples carry some of those ancillary images such as
>> slide labels, overview images, etc. While the scripts handled those
>> correctly as far as I can see, a reader would have to be pretty smart to
>> figure out which of the contents of such an output file is the actual
>> image and which are those ancillary images.
>
> Absolutely.  We will need to do something about these as a followup
> task, to add the metadata to identify these as labels, overviews, etc.
>
In the file I uploaded, the mocked-up label and macro/overview images are 
stored as top-level IFDs and coded in the OME-XML as separate series. 
Would this be an appropriate way to store such ancillary images? If so, 
then we could codify where in the file they should be (at the end?), 
standardize what name they should have to identify them, and anything else?
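
As an illustration of what name-based identification could look like on the reader side, here is a rough sketch that lists the Name attribute of each Image element in an OME-XML string. No naming convention has been agreed, so the names in the usage example are placeholders, and image_names is a hypothetical helper. It relies on grep -o and simple pattern matching rather than a real XML parser.

```shell
# List the Name attribute of each <Image> element in an OME-XML string.
# Pattern matching only; a real reader would use an XML parser.
image_names() {
    printf '%s\n' "$1" \
        | grep -o '<Image [^>]*>' \
        | sed -n 's/.* Name="\([^"]*\)".*/\1/p'
}

# Placeholder document with one main image and two ancillary series.
xml='<OME><Image ID="Image:0" Name="main"><Pixels/></Image><Image ID="Image:1" Name="label"/><Image ID="Image:2" Name="macro"/></OME>'
image_names "$xml"
```

A reader could then treat series whose name matches an agreed keyword as ancillary and everything else as image data.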

Cheers,
- Damir
>
> Kind regards,
> Roger

-- 
Damir Sudar - Affiliate Scientist
Lawrence Berkeley Natl Laboratory / MBIB
One Cyclotron Road, MS 977, Berkeley, CA 94720, USA
T: 510/486-5346 - F: 510/486-5586 - E: DSudar at lbl.gov
http://biosciences.lbl.gov/profiles/damir-sudar-2/

Visiting Scientist, Oregon Health & Science University

-------------- next part --------------
#!/bin/bash

set -e
set -x

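# Check if file is a little-endian BigTIFF (magic bytes "II" then 0x002b).
# Big-endian ("MM") files are not recognised, and the od -x comparison
# assumes a little-endian host.
# $1 file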
bigtiff() {
    if [ "0000000 4949 002b" != "$(dd if="$1" iflag=count_bytes count=4 | od -x | head -n1)" ]; then
        return 1
    fi
    return 0
}

# Get fieldoffset for TIFF
# $1 file
fieldoffset() {
    if bigtiff "$1"; then
        echo 8
    else
        echo 2
    fi
}

# Get fieldsize for TIFF
# $1 file
fieldsize() {
    if bigtiff "$1"; then
        echo 20
    else
        echo 12
    fi
}

# Get IFD offsets
# $1=file
diroffsets() {
    tiffinfo "$1" | grep "TIFF Directory at offset" | sed -e 's;.*(\(.*\))$;\1;'
}

# Get offset for IFD
# $1=IFD number
# $2=file
diroffset() {
    diroffsets "$2" | head -n$(($1 + 1)) | tail -n1
}

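# Get number of tags in directory
# Reads only the low 2 bytes of the count, which is sufficient for
# little-endian files (BigTIFF stores an 8-byte count).
# $1=IFD offset
# $2=file
ntags() {
    echo "od -j $1 -N 2 -d \"$2\"" >&2
    od -j "$1" -N 2 -d "$2" | head -n1 | sed -e 's;^[0-9]* *\(.*\);\1;'
}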

# Offset of next pointer in IFD
# $1=IFD offset
# $2=file
nextoffset() {
    echo "$(($1 + $(fieldoffset "$2") + ($(ntags $1 "$2") * $(fieldsize "$2"))))"
}

# Write little endian offset to binary file (uint64 for BigTIFF, uint32 for classic TIFF)
# $1=value
# $2=destination file
# $3=offset in file
update_uint64_le() {
    if bigtiff "$2"; then
        printf "$(printf %.16x $1 | sed -e 's;\(..\)\(..\)\(..\)\(..\)\(..\)\(..\)\(..\)\(..\);\8\7\6\5\4\3\2\1;' | sed -e 's;\(..\);\\x\1;g')" | dd of="$2" conv=notrunc,nocreat oflag=seek_bytes seek=$3
    else
        printf "$(printf %.8x $1 | sed -e 's;\(..\)\(..\)\(..\)\(..\);\4\3\2\1;' | sed -e 's;\(..\);\\x\1;g')" | dd of="$2" conv=notrunc,nocreat oflag=seek_bytes seek=$3
    fi
}

makeuuid() {
    echo "urn:uuid:$(uuidgen)"
}

flist=''
planelist=''
cnt=0
chan_cnt=0
for arg in "$@"; do
    if [ "$flist" == "" ]; then
        firstname=$arg
    fi
    # TODO: check whether valid ptif file and extract C, Z, T, coding per Bio-Formats pattern strategy
    # and re-order as necessary. For now, it assumes an ordered collection of single-channel tiffs or ptifs
    is_chan=$(echo "$arg" | grep "_C" | wc -l)
    if [ "$is_chan" == "1" ]; then
        planelist=$(echo "$planelist" "$chan_cnt" "$cnt")
        let chan_cnt+=1
    fi
    let cnt+=1
    flist=$(echo $flist "$arg")
    # set the first IFD in each ptif/tif to be the full-res image
    tiffset -d 0 -s 254 0 "$arg"
done

base="$(basename "$firstname")"
dest="${base%.ptif}.ome.tiff"
tiffcp $flist "$dest"

# unnecessary indent
    nifds=$(tiffinfo "$dest" | grep "TIFF Directory at offset" | wc -l)
    echo "IFD count: $nifds"

    ifds=$(seq 0 $(($nifds - 1)))
    mainifds=''
    subifds=''
    for ifd in $ifds; do
        is_main=$(tiffinfo -"$ifd" "$dest" | grep "Subfile Type: (0 = 0x0)" | wc -l)
        if [ "$is_main" == "1" ]; then
            mainifds=$(echo "$mainifds" "$ifd")
        else
            subifds=$(echo "$subifds" "$ifd")
        fi
    done
    nmainifds=$(echo "$mainifds" | wc -w)
    nsubifds=$(echo "$subifds" | wc -w)
    echo "$nifds - IFDs: $ifds"
    echo "$nmainifds - Main IFDs: $mainifds"
    echo "$nsubifds - SUBIFDs: $subifds"

    # Main header
    tiffset -d 0 -s 270 "OME Pyramid TIFF test (from $base)" "$dest"
    tiffset -d 0 -s 305 "A gnarly shell script (makepyramid-multi)" "$dest"
    tiffset -d 0 -s 315 "Roger Leigh <rleigh at dundee.ac.uk>" "$dest"

    # NewSubFileType
    for ifd in $mainifds; do
        # superfluous but keeping it for now
        tiffset -d $ifd -s 254 0 "$dest"
    done
    for ifd in $subifds; do
        tiffset -d $ifd -s 254 1 "$dest"
    done

    # SubIFDs
    n=1
    for mainifd in $mainifds; do
        let n+=1
        nextmainifd=$(echo $mainifds | cut -d" " -f$n)
        let nn=$mainifd+2
        if [ -z "$nextmainifd" ] || [ "$nn" -gt "$nextmainifd" ]; then
            continue
        fi
        subifdslist=$(echo $ifds | cut -d" " -f$(($mainifd + 2))-$nextmainifd)
        subifds=$(echo $subifdslist | tr " " "\n")
        nsubifds=$(echo "$subifds" | wc -l)
        subifds_start=$(echo "$subifds" | head -n1)
        subifds_end=$(echo "$subifds" | tail -n1)
        subifds_diroffs=$(echo $(tiffinfo "$dest" | grep "TIFF Directory at offset" | sed -e 's;.*(\(.*\))$;\1;' | head -n$(($subifds_end+1)) | tail -n$nsubifds))
        echo "SubIFDs for series $mainifd: $subifds_diroffs"
        tiffset -d $mainifd -s 330 $nsubifds $subifds_diroffs "$dest"
        subifds_diroffs=$(echo $(tiffinfo "$dest" | grep "TIFF Directory at offset" | sed -e 's;.*(\(.*\))$;\1;' | head -n$(($subifds_end+1)) | tail -n$nsubifds))
        echo "Updated SubIFDs for series $mainifd: $subifds_diroffs"
    done

    echo "New directories:"
    diroffsets "$dest"

    # Relink IFDs to elide SubIFDs.  Run backward to keep the file readable between iterations.
    nextdiroff=0
    nextmainifd=''
    for mainifd in $(echo "$mainifds" | tr " " "\n" | tac); do
        let nn=$mainifd+2
        if [ -z "$nextmainifd" ] || [ "$nn" -gt "$nextmainifd" ]; then
            nextmainifd=$mainifd
        else
            subifdslist=$(echo $ifds | cut -d" " -f$(($mainifd + 2))-$nextmainifd)
            nextmainifd=$mainifd
            subifds=$(echo $subifdslist | tr " " "\n")
            nsubifds=$(echo "$subifds" | wc -l)
            subifds_start=$(echo "$subifds" | head -n1)
            subifds_end=$(echo "$subifds" | tail -n1)
            subifds_diroffs=$(echo $(tiffinfo "$dest" | grep "TIFF Directory at offset" | sed -e 's;.*(\(.*\))$;\1;' | head -n$(($subifds_end+1)) | tail -n$nsubifds))

            for offset in $subifds_diroffs; do
                noffset="$(nextoffset $offset "$dest")"
                update_uint64_le 0 "$dest" $noffset
            done
        fi

        maindir="$(diroffset $mainifd "$dest")"
        noffset="$(nextoffset $maindir "$dest")"

        update_uint64_le $nextdiroff "$dest" $noffset

        nextdiroff="$(diroffset $mainifd "$dest")"
    done

    tiffinfo "$dest"

    # Create OME-XML metadata for the file
    bfomexml="$(showinf -nopix -noflat -omexml -omexml-only "$dest")"

    # Add TiffData elements.
    uuid="$(makeuuid)"
    ome_attr="Creator=\"makepyramid-multi\" UUID=\"${uuid}\""

    tiffdata_fmt="<TiffData FirstC=\"%d\" FirstT=\"0\" FirstZ=\"0\" IFD=\"%d\" PlaneCount=\"1\"><UUID FileName=\"$(basename "${dest}")\">${uuid}</UUID></TiffData>"

    tiffdata_blk="$(printf "$tiffdata_fmt" $planelist)"

    omexml_fmt="$(echo "$bfomexml" | sed -e "s;\(<OME.*\)\(\">\);\1\" ${ome_attr}>;" -e "s;\(SizeC=\"\)[0-9]*\";\1${chan_cnt}\";" -e "s;\(SizeT=\)\"[0-9]*\";\1\"1\";" -e "s;<MetadataOnly\/>;${tiffdata_blk};")"

    omexml="$(printf "$omexml_fmt")"

    tiffset -d 0 -s 270 "$omexml" "$dest"
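
The sed-based byte swap in update_uint64_le is compact but hard to read. The same little-endian encoding can be sketched with shift-and-mask arithmetic. This is an illustration only: le_escapes is a hypothetical name and is not used by the script above.

```shell
# Print VALUE as WIDTH little-endian bytes in printf escape form,
# e.g. width 4 for a classic TIFF offset, 8 for a BigTIFF offset.
le_escapes() {
    local value=$1 width=$2 out='' i=0
    while [ "$i" -lt "$width" ]; do
        # Take byte i (least significant first) and format it as \xNN.
        out="$out$(printf '\\x%02x' $(( (value >> (8 * i)) & 255 )))"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

le_escapes 258 4
```

For the value 258 (0x0102) and a width of 4 this prints \x02\x01\x00\x00. In bash, passing that string to printf as the format emits the raw bytes, which is what the script pipes into dd.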