[ome-devel] Bio-Formats build system and repository structure [was: Re: Error building Bio-Formats develop]

Curtis Rueden ctrueden at wisc.edu
Fri Jun 14 22:52:08 BST 2013


Hi Melissa & everyone,

Thank you for replying to my suggestion. I appreciate the discussion.

> Note that we've previously (and relatively recently) put quite a bit
> of effort into getting the OME-XML code and specification into
> bioformats.git.

Yes, you might recall that I am the one who did most of that work. ;-) [1]

And I would be willing to do it again, if it meant a cleaner build system.
(Though splitting subtrees is much less complicated.)

> If we were to follow this proposal, and unless I misunderstand, then
> making one simple change in the OME-XML schema (e.g. changing a single
> field type) would require:

Well, I think specification and ome-xml should live in the same repository.

So it would require:
- filing a PR into the specification repo (ome-xml.git in my proposal)
- cutting a new ome-xml release artifact
- filing a PR into bioformats.git to update the version of ome-xml

So, two PRs instead of one. (As Mark said, scifio.git doesn't depend on
ome-xml.) But the schema and ome-xml code don't change nearly as often as
Bio-Formats does, of course.
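
For illustration, that third step could amount to a one-line version bump
in the consuming build. Here is a minimal sketch as a Maven POM fragment;
the property name, groupId and version numbers are hypothetical
placeholders, not actual release coordinates:

```xml
<!-- Fragment of a hypothetical Bio-Formats pom.xml; all coordinates are placeholders. -->
<properties>
  <!-- The PR into bioformats.git would change only this line, e.g. from 1.0.0. -->
  <ome-xml.version>1.1.0</ome-xml.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.example</groupId>
    <artifactId>ome-xml</artifactId>
    <version>${ome-xml.version}</version>
  </dependency>
</dependencies>
```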

Still, I can appreciate how two PRs might be undesirable (believe me; you
know how much I hate the develop/dev_4_4 split :-). To avoid that, you
could keep specification & ome-xml & bio-formats in the same git repository
as they are now, while still using separate versioning per component. This
would provide one target for PRs, while making it much simpler for most
people to build Bio-Formats thanks to remote release dependency resolution
of Maven/Ivy/etc.
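
For the Ant-based build, remote resolution might look roughly like the
following ivy.xml. This is only a sketch: the organisation, module names
and revision are hypothetical placeholders.

```xml
<!-- Hypothetical ivy.xml for an Ant-based build; names and revision are placeholders. -->
<ivy-module version="2.0">
  <info organisation="org.example" module="bio-formats"/>
  <dependencies>
    <!-- Resolved from a remote repository instead of a JAR committed to Git. -->
    <dependency org="org.example" name="ome-xml" rev="1.1.0"/>
  </dependencies>
</ivy-module>
```

An Ant target would then call Ivy's retrieve task to download the released
JAR, so most people would never need to regenerate the OME-XML code
locally.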

> Our policy so far has been to release everything at once; I think it
> would make more sense to agree first whether that should be changed,
> and then consider solutions.  I personally do favor having everything
> be released at once, as it makes keeping track of version numbers
> (mentally and when supporting users) much easier.

Indeed, my proposal hinges on the idea that everything should actually not
be released at once -- that many of these are separate projects, which are
A) useful in their own right, outside an OMERO-specific context; and B)
developed at a much different pace than one another, resulting in
gratuitous/unnecessary/confusing releases when all continue to be versioned
together. [2]

> I understand the desire for smarter autogeneration, but I think that
> would be much better accomplished within our existing build systems,
> rather than fragmenting the codebase.

I do not see it as one codebase, but rather several projects which all
currently happen to be lumped into one repository with a single release
schedule imposed upon them. Right now, all of these project releases are
dictated by the OMERO release schedule [3]. The OME team is ramping up for
the OMERO 5 release, and presumably a Bio-Formats 5 release will go along
with that. But what is really so radically new in Bio-Formats 5? Nothing, I
would argue. Now, e.g., an extensible SCIFIO-based Bio-Formats would
justify a 5.0.0 release. But instead, I am sad to see Bio-Formats versions
that have little semantic meaning [4] with respect to Bio-Formats itself,
all because of the simultaneous top-down release schedule driven by the
OMERO project.

Instead, my proposal emphasizes the individual projects as useful to the
community and the world in their own right. For example, MDB Tools Java is
not available anywhere else to my knowledge [5], since we rescued it from
an obscure forum post. Wouldn't it be a wonderful service to the community
to post it to its own Git repository on GitHub, so that it is easy to find,
and the greater community (beyond just those interested in OME) might
contribute back?

I understand I am proposing a substantial change. It is certainly about
more than just improving the code generation process. That's why I changed
the subject. And I understand the reluctance to change anything that might
disrupt current development processes. But in this case, I think the change
would be well worth it to make the OME projects into even better members of
the global FOSS community.

Regards,
Curtis

[1] http://trac.openmicroscopy.org.uk/ome/ticket/10435#comment:8

[2] One quick example of how the gratuitous releases cause problems:
whenever I update Bio-Formats using the ImageJ updater, it notices that
several of the JARs actually have no functional changes, and does not alter
the version number of the JAR files. Yes, we could potentially fix this in
the updater, but wouldn't it make more sense to simply not make vacuous
releases in the first place?

[3] Note that we are not even that consistent, since there are several
other projects we develop like native-lib-loader which are *not*
synchronized with the OMERO/Bio-Formats version. This works just fine,
since we treat those projects as external dependencies. The same thing
would work fine for the components/forks, components/stubs and ome-xml.

[4] ImageJ2 and SCIFIO are now using SemVer, which conveys useful
information in the version number: http://semver.org/

[5] Well, potentially this: http://jackcess.sourceforge.net/


On Tue, Jun 11, 2013 at 12:52 PM, Melissa Linkert
<melissa at glencoesoftware.com> wrote:

> Hi Curtis,
>
> > That said, I think Bio-Formats would greatly benefit from substantial
> > modularization of components. We are realizing this with SCIFIO, and I
> > think it applies to the OME-XML component as well.
> >
> > Below, I will lay out what I think is a better structure for the build
> > system, which would result in more advantages and less pain than with the
> > current structure.
> ...
> > MetadataStore, MetadataRetrieve, etc., would move to the ome-xml
> > component, keeping all generated code together.
> >
> > One Git repository for each of:
> >
> > - SCIFIO (https://github.com/scifio/scifio)
> > - OME-XML (https://github.com/openmicroscopy/ome-xml)
> > - Bio-Formats (https://github.com/openmicroscopy/bioformats)
> > - Fork: Apache POI (https://github.com/openmicroscopy/ome-poi)
> >  -- change package prefix to avoid third party code collisions
> > - Fork: MDB Tools Java (https://github.com/openmicroscopy/ome-mdb-tools)
> >  -- change package prefix to avoid third party code collisions
> > - Fork: JAI Image I/O (https://github.com/scifio/scifio-jai-image-io)
> >  -- change package prefix to avoid third party code collisions
> > - Stub: LWF (https://github.com/scifio/lwf-stubs)
>
> What you and Mark do for your work on SCIFIO is up to you, but I would
> be extremely hesitant to do something like this for Bio-Formats itself.
> Spreading one codebase across 7 different repositories is at best
> invasive, and would have a substantial impact upon anyone who routinely
> works on Bio-Formats.
>
> > In other words, OME-XML gets its own Git repository, which includes all
> > the generated code. Each fork and stub also has its own repository in
> > the relevant namespace.
>
> Note that we've previously (and relatively recently) put quite a bit of
> effort into getting the OME-XML code and specification into bioformats.git.
>
> > Dependencies between repositories would be done by release version
> > coupling. For Maven projects (i.e., SCIFIO), simply making releases and
> > using release dependencies would be sufficient to facilitate repeatable
> > builds. For Ant-based projects (i.e., stuff in openmicroscopy namespace),
> > release JARs would continue to be committed to the repository as they are
> > now, or they could be resolved remotely via Ivy or similar.
>
> I don't agree that doing that makes things easier.  If we were to follow
> this proposal, and unless I misunderstand, then making one simple change in
> the OME-XML schema (e.g. changing a single field type) would require:
>
>   - a pull request into whichever repository houses the specification
>     (currently bioformats.git)
>   - creation of "release" artifacts from whichever repository houses the
>     specification
>   - a pull request into ome-xml.git (to update the autogenerated code)
>   - creation of release JARs from ome-xml.git
>   - a pull request into scifio.git (to update SCIFIO readers)
>   - creation of release JARs from scifio.git
>   - a pull request into bioformats.git (to update Bio-Formats readers)
>
> ...instead of what we have now, which is a single pull request into
> bioformats.git.
>
> > This would make building Bio-Formats much simpler and faster. As Roger
> > pointed out, we do not really need to code generate the OME-XML stuff on
> > every build, but rather only when the schema changes. Of course, the
> > OME-XML component contains other code which would be subject to change
> > between schema releases, but that's fine.
>
> I understand the desire for smarter autogeneration, but I think that
> would be much better accomplished within our existing build systems,
> rather than fragmenting the codebase.
>
> > This more modular structure would also facilitate these components being
> > developed on separate release cycles. The forks and stubs rarely change
> > and do not need to be released with every OME release. And the OME-XML
> > project could be released alongside schema changes (i.e., twice a year)
> > rather than with every OME release.
>
> Our policy so far has been to release everything at once; I think it
> would make more sense to agree first whether that should be changed, and
> then consider solutions.  I personally do favor having everything be
> released at once, as it makes keeping track of version numbers (mentally
> and when supporting users) much easier.
>
> Again, what you do with respect to http://github.com/scifio/scifio is up
> to you.  Doing this for Bio-Formats itself would have a non-trivial impact
> on every single OME team member and a large portion of the developer
> community, and as such I think it would be better to consider other
> options for making autogeneration easier.
>
> Regards,
> -Melissa
>
> On Mon, Jun 10, 2013 at 10:50:25AM -0500, Curtis Rueden wrote:
> > Hi Roger & everyone,
> >
> > Sorry for the delay in reply. After spending the last couple of weeks on
> > ImageJ build issues related to native code components (specifically, the
> > ImageJ launcher in C), I have some new perspective on the new code
> > generation of the Bio-Formats build system.
> >
> > First of all, I want to say thanks to Roger for solving the build for
> > both Ant and Maven. I know maintaining the dual build systems can be
> > substantial extra work. But I think the Maven system has many advantages,
> > so I am happy it is being maintained.
> >
> > That said, I think Bio-Formats would greatly benefit from substantial
> > modularization of components. We are realizing this with SCIFIO, and I
> > think it applies to the OME-XML component as well.
> >
> > Below, I will lay out what I think is a better structure for the build
> > system, which would result in more advantages and less pain than with the
> > current structure.
> >
> > > One thing which might be an issue is that while xsd-fu generates the
> > > ome-xml model code, which could potentially be downloaded, it also
> > > generates all the MetadataStore, MetadataRetrieve and all the other
> > > Metadata-related classes in scifio, including OMEXMLMetadataImpl.
> > > Given that these are paired with the generated model code, generating
> > > one and downloading the other may result in breakage on model changes,
> > > or changes in xsd-fu or the templates which change the generated code.
> >
> > MetadataStore, MetadataRetrieve, etc., would move to the ome-xml
> > component, keeping all generated code together.
> >
> > One Git repository for each of:
> >
> > - SCIFIO (https://github.com/scifio/scifio)
> > - OME-XML (https://github.com/openmicroscopy/ome-xml)
> > - Bio-Formats (https://github.com/openmicroscopy/bioformats)
> > - Fork: Apache POI (https://github.com/openmicroscopy/ome-poi)
> >  -- change package prefix to avoid third party code collisions
> > - Fork: MDB Tools Java (https://github.com/openmicroscopy/ome-mdb-tools)
> >  -- change package prefix to avoid third party code collisions
> > - Fork: JAI Image I/O (https://github.com/scifio/scifio-jai-image-io)
> >  -- change package prefix to avoid third party code collisions
> > - Stub: LWF (https://github.com/scifio/lwf-stubs)
> >
> > In other words, OME-XML gets its own Git repository, which includes all
> > the generated code. Each fork and stub also has its own repository in
> > the relevant namespace.
> >
> > Dependencies between repositories would be done by release version
> > coupling. For Maven projects (i.e., SCIFIO), simply making releases and
> > using release dependencies would be sufficient to facilitate repeatable
> > builds. For Ant-based projects (i.e., stuff in openmicroscopy namespace),
> > release JARs would continue to be committed to the repository as they are
> > now, or they could be resolved remotely via Ivy or similar.
> >
> > This would make building Bio-Formats much simpler and faster. As Roger
> > pointed out, we do not really need to code generate the OME-XML stuff on
> > every build, but rather only when the schema changes. Of course, the
> > OME-XML component contains other code which would be subject to change
> > between schema releases, but that's fine.
> >
> > This more modular structure would also facilitate these components being
> > developed on separate release cycles. The forks and stubs rarely change
> > and do not need to be released with every OME release. And the OME-XML
> > project could be released alongside schema changes (i.e., twice a year)
> > rather than with every OME release.
> >
> > Comments welcome.
> >
> > Regards,
> > Curtis
> >
> >
> > On Thu, May 2, 2013 at 12:25 PM, Roger Leigh <r.leigh at dundee.ac.uk> wrote:
> >
> > > On 02/05/2013 16:52, Curtis Rueden wrote:
> > >
> > >>  > If so, the build is completely identical--the sources which get
> > >>  > generated on the fly from the templates by xsd-fu are identical bar
> > >>  > a few lines of comments in the top boilerplate.
> > >>
> > >> OK, good to know.
> > >>
> > >> One more question/concern: presumably, the Bio-Formats build no longer
> > >> functions on Windows, due to the Python + Genshi dependency. With the
> > >> Ant build, this might be non-trivial to solve. But solving the issue
> > >> with Maven is very straightforward: include the "ome-xml" module in
> > >> the reactor only within a profile. Then, when that profile is not enabled,
> > >> Maven will resolve the ome-xml dependency from the remote repository
> > >> rather than regenerating and rebuilding the code. This would eliminate
> > >> the need to install Genshi, and make it easier to build on Windows
> > >> again. What do you think?
> > >>
> > >
> > > I'm afraid I'm no authority on Maven, so I'm not sure.  Maybe Melissa
> > > or Josh have a better take on this than me.  I assume that this will
> > > work correctly on Windows if python is installed?
> > >
> > > One thing which might be an issue is that while xsd-fu generates the
> > > ome-xml model code, which could potentially be downloaded, it also
> > > generates all the MetadataStore, MetadataRetrieve and all the other
> > > Metadata-related classes in scifio, including OMEXMLMetadataImpl.
> > > Given that these are paired with the generated model code, generating
> > > one and downloading the other may result in breakage on model changes,
> > > or changes in xsd-fu or the templates which change the generated code.
> > >
> > > While it's not all enabled yet, I'd like to have the model selectable
> > > as an Ant property (it's xsdfu.schemaver), so that it's possible to
> > > change to a different model when building.  There are currently some
> > > hardcoded "2012-06" versions which need to be switched to use the
> > > property value.
> > >
> > >
> > > Regards,
> > > Roger
> > >
> > > --
> > > Dr Roger Leigh -- Open Microscopy Environment
> > > Wellcome Trust Centre for Gene Regulation and Expression,
> > > College of Life Sciences, University of Dundee, Dow Street,
> > > Dundee DD1 5EH Scotland UK   Tel: (01382) 386364
> > >
> > >
> > > The University of Dundee is a registered Scottish Charity, No: SC015096
> > >
> > > _______________________________________________
> > > ome-devel mailing list
> > > ome-devel at lists.openmicroscopy.org.uk
> > > http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
> > >
>
>
>