[ome-devel] Bio-Formats build system and repository structure [was: Re: Error building Bio-Formats develop]

Thu Jul 4 18:06:45 BST 2013

Hi Curtis,

> I do not see it as one codebase, but rather several projects which all
> currently happen to be lumped into one repository with a single release
> schedule imposed upon them. Right now, all of these project releases are
> dictated by the OMERO release schedule [3]. The OME team is ramping up for
> the OMERO 5 release, and presumably a Bio-Formats 5 release will go along
> with that. But what is really so radically new in Bio-Formats 5? Nothing, I
> would argue. Now, e.g., an extensible SCIFIO-based Bio-Formats would
> justify a 5.0.0 release. But instead, I am sad to see Bio-Formats versions
> that have little semantic meaning [4] with respect to Bio-Formats itself,
> all because of the simultaneous top-down release schedule driven by the
> OMERO project.
> 
> Instead, my proposal emphasizes the individual projects as useful to the
> community and the world in their own right. For example, MDB Tools Java is
> not available anywhere else to my knowledge [5], since we rescued it from
> an obscure forum post. Wouldn't it be a wonderful service to the community
> to post it to its own Git repository on GitHub, so that it is easy to find,
> and the greater community (beyond just those interested in OME) might
> contribute back?
> 
> I understand I am proposing a substantial change. It is certainly about
> more than just improving the code generation process. That's why I changed
> the subject. And I understand the reluctance to change anything that might
> disrupt current development processes. But in this case, I think the change
> would be well worth it to make the OME projects into even better members of
> the global FOSS community.

I can see the value in doing this for some of the components (such as
components/forks/*), but doing this for every single component would have
a negative impact on the workflow of the OME team without a clear benefit.

I have opened a story to investigate general build system and versioning
changes and potentially adding e.g. mdbtools.git to github.com/openmicroscopy:

http://trac.openmicroscopy.org.uk/ome/ticket/11228

All of the tasks under that story are fairly significant undertakings
though, and as such I have not yet assigned a milestone.

Regards,
-Melissa

On Fri, Jun 14, 2013 at 04:52:08PM -0500, Curtis Rueden wrote:
> Hi Melissa & everyone,
> 
> Thank you for replying to my suggestion. I appreciate the discussion.
> 
> > Note that we've previously (and relatively recently) put quite a bit
> > of effort into getting the OME-XML code and specification into
> > bioformats.git.
> 
> Yes, you might recall that I am the one who did most of that work. ;-) [1]
> 
> And I would be willing to do it again, if it meant a cleaner build system.
> (Though splitting subtrees is much less complicated.)
> 
> > If we were to follow this proposal, and unless I misunderstand, then
> > making one simple change in the OME-XML schema (e.g. changing a single
> > field type) would require:
> 
> Well, I think specification and ome-xml should live in the same repository.
> 
> So it would require:
> - filing a PR into the specification repo (ome-xml.git in my proposal)
> - cutting a new ome-xml release artifact
> - filing a PR into bioformats.git to update the version of ome-xml
> 
> So, two PRs instead of one. (As Mark said, scifio.git doesn't depend on
> ome-xml.) But the schema and ome-xml code don't change nearly as often as
> Bio-Formats does, of course.
> 
> Still, I can appreciate how two PRs might be undesirable (believe me; you
> know how much I hate the develop/dev_4_4 split :-). To avoid that, you
> could keep specification & ome-xml & bio-formats in the same git repository
> as they are now, while still using separate versioning per component. This
> would provide one target for PRs, while making it much simpler for most
> people to build Bio-Formats thanks to remote release dependency resolution
> of Maven/Ivy/etc.
> 
> > Our policy so far has been to release everything at once; I think it
> > would make more sense to agree first whether that should be changed,
> > and then consider solutions.  I personally do favor having everything
> > be released at once, as it makes keeping track of version numbers
> > (mentally and when supporting users) much easier.
> 
> Indeed, my proposal hinges on the idea that everything should actually not
> be released at once -- that many of these are separate projects, which are
> A) useful in their own right, outside an OMERO-specific context; and B)
> developed at a much different pace than one another, resulting in
> gratuitous/unnecessary/confusing releases when all continue to be versioned
> together. [2]
> 
> > I understand the desire for smarter autogeneration, but I think that
> > would be much better accomplished within our existing build systems,
> > rather than fragmenting the codebase.
> 
> I do not see it as one codebase, but rather several projects which all
> currently happen to be lumped into one repository with a single release
> schedule imposed upon them. Right now, all of these project releases are
> dictated by the OMERO release schedule [3]. The OME team is ramping up for
> the OMERO 5 release, and presumably a Bio-Formats 5 release will go along
> with that. But what is really so radically new in Bio-Formats 5? Nothing, I
> would argue. Now, e.g., an extensible SCIFIO-based Bio-Formats would
> justify a 5.0.0 release. But instead, I am sad to see Bio-Formats versions
> that have little semantic meaning [4] with respect to Bio-Formats itself,
> all because of the simultaneous top-down release schedule driven by the
> OMERO project.
> 
> Instead, my proposal emphasizes the individual projects as useful to the
> community and the world in their own right. For example, MDB Tools Java is
> not available anywhere else to my knowledge [5], since we rescued it from
> an obscure forum post. Wouldn't it be a wonderful service to the community
> to post it to its own Git repository on GitHub, so that it is easy to find,
> and the greater community (beyond just those interested in OME) might
> contribute back?
> 
> I understand I am proposing a substantial change. It is certainly about
> more than just improving the code generation process. That's why I changed
> the subject. And I understand the reluctance to change anything that might
> disrupt current development processes. But in this case, I think the change
> would be well worth it to make the OME projects into even better members of
> the global FOSS community.
> 
> Regards,
> Curtis
> 
> [1] http://trac.openmicroscopy.org.uk/ome/ticket/10435#comment:8
> 
> [2] One quick example of how the gratuitous releases cause problems:
> whenever I update Bio-Formats using the ImageJ updater, it notices that
> several of the JARs actually have no functional changes, and does not alter
> the version number of the JAR files. Yes, we could potentially fix this in
> the updater, but wouldn't it make more sense to simply not make vacuous
> releases in the first place?
> 
> [3] Note that we are not even that consistent, since there are several
> other projects we develop like native-lib-loader which are *not*
> synchronized with the OMERO/Bio-Formats version. This works just fine,
> since we treat those projects as external dependencies. The same thing
> would work fine for the components/forks, components/stubs and ome-xml.
> 
> [4] ImageJ2 and SCIFIO are now using SemVer, which conveys useful
> information in the version number: http://semver.org/
> 
> [5] Well, potentially this: http://jackcess.sourceforge.net/
> 
> 
> On Tue, Jun 11, 2013 at 12:52 PM, Melissa Linkert <melissa at glencoesoftware.com> wrote:
> 
> > Hi Curtis,
> >
> > > That said, I think Bio-Formats would greatly benefit from substantial
> > > modularization of components. We are realizing this with SCIFIO, and I
> > > think it applies to the OME-XML component as well.
> > >
> > > Below, I will lay out what I think is a better structure for the build
> > > system, which would result in more advantages and less pain than with the
> > > current structure.
> > ...
> > > MetadataStore, MetadataRetrieve, etc., would move to the ome-xml
> > > component, keeping all generated code together.
> > >
> > > One Git repository for each of:
> > >
> > > - SCIFIO (https://github.com/scifio/scifio)
> > > - OME-XML (https://github.com/openmicroscopy/ome-xml)
> > > - Bio-Formats (https://github.com/openmicroscopy/bioformats)
> > > - Fork: Apache POI (https://github.com/openmicroscopy/ome-poi)
> > >  -- change package prefix to avoid third party code collisions
> > > - Fork: MDB Tools Java (https://github.com/openmicroscopy/ome-mdb-tools)
> > >  -- change package prefix to avoid third party code collisions
> > > - Fork: JAI Image I/O (https://github.com/scifio/scifio-jai-image-io)
> > >  -- change package prefix to avoid third party code collisions
> > > - Stub: LWF (https://github.com/scifio/lwf-stubs)
> >
> > What you and Mark do for your work on SCIFIO is up to you, but I would
> > be extremely hesitant to do something like this for Bio-Formats itself.
> > Spreading one codebase across 7 different repositories is at best
> > invasive, and would have a substantial impact upon anyone who routinely
> > works on Bio-Formats.
> >
> > > In other words, OME-XML gets its own Git repository, which includes all
> > > the generated code. Each fork and stub also has its own repository in
> > > the relevant namespace.
> >
> > Note that we've previously (and relatively recently) put quite a bit of
> > effort into getting the OME-XML code and specification into bioformats.git.
> >
> > > Dependencies between repositories would be done by release version
> > > coupling. For Maven projects (i.e., SCIFIO), simply making releases and
> > > using release dependencies would be sufficient to facilitate repeatable
> > > builds. For Ant-based projects (i.e., stuff in openmicroscopy namespace),
> > > release JARs would continue to be committed to the repository as they are
> > > now, or they could be resolved remotely via Ivy or similar.
> >
> > I don't agree that doing that makes things easier.  If we were to follow
> > this proposal, and unless I misunderstand, then making one simple change in
> > the OME-XML schema (e.g. changing a single field type) would require:
> >
> >   - a pull request into whichever repository houses the specification
> >     (currently bioformats.git)
> >   - creation of "release" artifacts from whichever repository houses the
> >     specification
> >   - a pull request into ome-xml.git (to update the autogenerated code)
> >   - creation of release JARs from ome-xml.git
> >   - a pull request into scifio.git (to update SCIFIO readers)
> >   - creation of release JARs from scifio.git
> >   - a pull request into bioformats.git (to update Bio-Formats readers)
> >
> > ...instead of what we have now, which is a single pull request into
> > bioformats.git.
> >
> > > This would make building Bio-Formats much simpler and faster. As Roger
> > > pointed out, we do not really need to code generate the OME-XML stuff on
> > > every build, but rather only when the schema changes. Of course, the
> > > OME-XML component contains other code which would be subject to change
> > > between schema releases, but that's fine.
> >
> > I understand the desire for smarter autogeneration, but I think that
> > would be much better accomplished within our existing build systems,
> > rather than fragmenting the codebase.
> >
> > > This more modular structure would also facilitate these components being
> > > developed on separate release cycles. The forks and stubs rarely change
> > > and do not need to be released with every OME release. And the OME-XML
> > > project could be released alongside schema changes (i.e., twice a year)
> > > rather than with every OME release.
> >
> > Our policy so far has been to release everything at once; I think it
> > would make more sense to agree first whether that should be changed, and
> > then consider solutions.  I personally do favor having everything be
> > released at once, as it makes keeping track of version numbers (mentally
> > and when supporting users) much easier.
> >
> > Again, what you do with respect to http://github.com/scifio/scifio is up
> > to you.  Doing this for Bio-Formats itself would have a non-trivial impact
> > on every single OME team member and a large portion of the developer
> > community, and as such I think it would be better to consider other
> > options for making autogeneration easier.
> >
> > Regards,
> > -Melissa
> >
> > On Mon, Jun 10, 2013 at 10:50:25AM -0500, Curtis Rueden wrote:
> > > Hi Roger & everyone,
> > >
> > > Sorry for the delay in reply. After spending the last couple of weeks on
> > > ImageJ build issues related to native code components (specifically, the
> > > ImageJ launcher in C), I have some new perspective on the new code
> > > generation of the Bio-Formats build system.
> > >
> > > First of all, I want to say thanks to Roger for solving the build for
> > > both Ant and Maven. I know maintaining the dual build systems can be
> > > substantial extra work. But I think the Maven system has many advantages,
> > > so I am happy it is being maintained.
> > >
> > > That said, I think Bio-Formats would greatly benefit from substantial
> > > modularization of components. We are realizing this with SCIFIO, and I
> > > think it applies to the OME-XML component as well.
> > >
> > > Below, I will lay out what I think is a better structure for the build
> > > system, which would result in more advantages and less pain than with the
> > > current structure.
> > >
> > > > One thing which might be an issue is that while xsd-fu generates the
> > > > ome-xml model code, which could potentially be downloaded, it also
> > > > generates all the MetadataStore, MetadataRetrieve and all the other
> > > > Metadata-related classes in scifio, including OMEXMLMetadataImpl.
> > > > Given that these are paired with the generated model code, generating
> > > > one and downloading the other may result in breakage on model changes,
> > > > or changes in xsd-fu or the templates which change the generated code.
> > >
> > > MetadataStore, MetadataRetrieve, etc., would move to the ome-xml
> > > component, keeping all generated code together.
> > >
> > > One Git repository for each of:
> > >
> > > - SCIFIO (https://github.com/scifio/scifio)
> > > - OME-XML (https://github.com/openmicroscopy/ome-xml)
> > > - Bio-Formats (https://github.com/openmicroscopy/bioformats)
> > > - Fork: Apache POI (https://github.com/openmicroscopy/ome-poi)
> > >  -- change package prefix to avoid third party code collisions
> > > - Fork: MDB Tools Java (https://github.com/openmicroscopy/ome-mdb-tools)
> > >  -- change package prefix to avoid third party code collisions
> > > - Fork: JAI Image I/O (https://github.com/scifio/scifio-jai-image-io)
> > >  -- change package prefix to avoid third party code collisions
> > > - Stub: LWF (https://github.com/scifio/lwf-stubs)
> > >
> > > In other words, OME-XML gets its own Git repository, which includes all
> > > the generated code. Each fork and stub also has its own repository in
> > > the relevant namespace.
> > >
> > > Dependencies between repositories would be done by release version
> > > coupling. For Maven projects (i.e., SCIFIO), simply making releases and
> > > using release dependencies would be sufficient to facilitate repeatable
> > > builds. For Ant-based projects (i.e., stuff in openmicroscopy namespace),
> > > release JARs would continue to be committed to the repository as they are
> > > now, or they could be resolved remotely via Ivy or similar.
> > >
> > > This would make building Bio-Formats much simpler and faster. As Roger
> > > pointed out, we do not really need to code generate the OME-XML stuff on
> > > every build, but rather only when the schema changes. Of course, the
> > > OME-XML component contains other code which would be subject to change
> > > between schema releases, but that's fine.
> > >
> > > This more modular structure would also facilitate these components being
> > > developed on separate release cycles. The forks and stubs rarely change
> > > and do not need to be released with every OME release. And the OME-XML
> > > project could be released alongside schema changes (i.e., twice a year)
> > > rather than with every OME release.
> > >
> > > Comments welcome.
> > >
> > > Regards,
> > > Curtis
> > >
> > >
> > > On Thu, May 2, 2013 at 12:25 PM, Roger Leigh <r.leigh at dundee.ac.uk> wrote:
> > >
> > > > On 02/05/2013 16:52, Curtis Rueden wrote:
> > > >
> > > >>  > If so, the build is completely identical--the sources which get
> > > >>  > generated on the fly from the templates by xsd-fu are identical
> > > >>  > bar a few lines of comments in the top boilerplate.
> > > >>
> > > >> OK, good to know.
> > > >>
> > > >> One more question/concern: presumably, the Bio-Formats build no longer
> > > >> functions on Windows, due to the Python + Genshi dependency. With the
> > > >> Ant build, this might be non-trivial to solve. But solving the issue
> > > >> with Maven is very straightforward: include the "ome-xml" module in
> > > >> the reactor only within a profile. Then, when that profile is not
> > > >> enabled, Maven will resolve the ome-xml dependency from the remote
> > > >> repository rather than regenerating and rebuilding the code. This would
> > > >> eliminate the need to install Genshi, and make it easier to build on
> > > >> Windows again. What do you think?
> > > >>
> > > >
> > > > I'm afraid I'm no authority on Maven, so I'm not sure.  Maybe Melissa
> > > > or Josh have a better take on this than me.  I assume that this will
> > > > work correctly on Windows if python is installed?
> > > >
> > > > One thing which might be an issue is that while xsd-fu generates the
> > > > ome-xml model code, which could potentially be downloaded, it also
> > > > generates all the MetadataStore, MetadataRetrieve and all the other
> > > > Metadata-related classes in scifio, including OMEXMLMetadataImpl.
> > > > Given that these are paired with the generated model code, generating
> > > > one and downloading the other may result in breakage on model changes,
> > > > or changes in xsd-fu or the templates which change the generated code.
> > > >
> > > > While it's not all enabled yet, I'd like to have the model selectable
> > > > as an Ant property (it's xsdfu.schemaver), so that it's possible to
> > > > change to a different model when building.  There are currently some
> > > > hardcoded "2012-06" versions which need to be switched to use the
> > > > property value.
> > > >
> > > >
> > > > Regards,
> > > > Roger
> > > >
> > > > --
> > > > Dr Roger Leigh -- Open Microscopy Environment
> > > > Wellcome Trust Centre for Gene Regulation and Expression,
> > > > College of Life Sciences, University of Dundee, Dow Street,
> > > > Dundee DD1 5EH Scotland UK   Tel: (01382) 386364
> > > >
> > > >
> > > > The University of Dundee is a registered Scottish Charity, No: SC015096
> > > >
> > > > _______________________________________________
> > > > ome-devel mailing list
> > > > ome-devel at lists.openmicroscopy.org.uk
> > > > http://lists.openmicroscopy.org.uk/mailman/listinfo/ome-devel
> > > >
> >