Friday 4 September 2009

Software Development with Certainty

My most recent project was to turn an impressive prototype developed by one person into a library, and applications based upon it, that can be simultaneously developed and used by a team.

The development process gains a new dimension once you have published your first version or have your first user. The task is made trickier by the absence of the original author.
In other words: situation normal, don't start from here.

The Erewhon applications, the Gaboto library and its main dependencies, ng4j and Jena, are all changing rapidly. New functionality is being added, and code and dependencies are being refactored. The challenge is to allow this change without breaking installed systems, or at least without breaking them unknowingly. We ensure this by using tests to establish a contract between the code and the design. The tests guarantee that the system actually does what it claims; or, more properly, the tests are exactly what the system claims to do.

The minimum requirement to ensure that a project remains alive is that it keeps up with the current versions of all the libraries it depends upon. There are two approaches to the management of dependencies: Saltation and Continuous Integration.
Saltation is when you leave updating the dependencies until a single blitz in which you update all of them and re-test. The problem with Saltation is that, to the outside world, nothing appears to be going on and the project seems to be approaching stagnation. When you do address the issue there is a lot of work to do, there is no obvious connection between a failure and the change that caused it, and the developers of the library who caused the problem have moved on and will not immediately know what they did to break your build.

By contrast, using a Continuous Integration methodology, one can expect to identify the particular commit that broke the build!
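A minimal sketch of why this holds, assuming a simple history in which each commit either has a build result or was never built. The commit ids are hypothetical. The suspects are every commit after the last passing build, up to and including the first failing one. When every commit is built, that is exactly one commit. When builds are rare, it is everything in between. This needs Java 16 or later, for records.

```java
import java.util.ArrayList;
import java.util.List;

class Culprit {
    // One entry per commit, oldest first. buildPassed is null when no build
    // was run for that commit (the Saltation case). Commit ids are hypothetical.
    record Commit(String id, Boolean buildPassed) {}

    // Returns the commits that may have broken the build: every commit after
    // the last passing build, up to and including the first failing build.
    static List<String> suspects(List<Commit> history) {
        List<String> pending = new ArrayList<>();
        for (Commit c : history) {
            pending.add(c.id());
            if (c.buildPassed() == null) {
                continue;               // not built, so its state is unknown
            }
            if (c.buildPassed()) {
                pending.clear();        // everything up to here is known good
            } else {
                return pending;         // the culprit is one of these
            }
        }
        return List.of();               // no failing build in the history
    }

    public static void main(String[] args) {
        // Continuous Integration: every commit is built.
        List<Commit> ci = List.of(
                new Commit("a1", true), new Commit("b2", true), new Commit("c3", false));
        System.out.println(suspects(ci));          // [c3]

        // Saltation: only the first and the last commit are built.
        List<Commit> saltation = List.of(
                new Commit("a1", true), new Commit("b2", null),
                new Commit("c3", null), new Commit("d4", false));
        System.out.println(suspects(saltation));   // [b2, c3, d4]
    }
}
```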

Continuous Integration relies upon repeating a repeatable build process. For this we turned to Maven, as it also addresses the other problem that we and our dependencies have: Dependency Management.

Dependency Management is a much reinvented wheel, addressing a problem that has recurred time and time again. In the world of Windows programming the problem is known as DLL Hell. In the Linux world there are two predominant package dependency management systems: Deb from Debian and RPM from Red Hat. Quite astonishingly, in the Java world the problem was reinvented, as jar hell, through the practice of not versioning jar files. This anti-pattern was thought to be extinct by about 2001; however, it has clung on in the ecosystem surrounding Jena.
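Maven avoids jar hell by identifying every artefact by its coordinates: groupId, artifactId and version. The version then appears in both the repository path and the file name. The sketch below reproduces that layout for release versions only. The coordinates are hypothetical. Deployed SNAPSHOTs may be stored under timestamped file names in remote repositories, and this sketch ignores that. It needs Java 16 or later.

```java
class MavenLayout {
    // Maven coordinates. The values used in main are hypothetical examples.
    record Coordinates(String groupId, String artifactId, String version, String packaging) {

        // File name: the version is part of the name, so two versions of the
        // same library cannot be confused with one another.
        String fileName() {
            return artifactId + "-" + version + "." + packaging;
        }

        // Path of a release artefact relative to the repository root:
        // groupId (dots become directories) / artifactId / version / fileName
        String repositoryPath() {
            return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/" + fileName();
        }
    }

    public static void main(String[] args) {
        Coordinates v01 = new Coordinates("org.example.gaboto", "gaboto", "0.1", "jar");
        Coordinates v02 = new Coordinates("org.example.gaboto", "gaboto", "0.2", "jar");
        System.out.println(v01.repositoryPath()); // org/example/gaboto/gaboto/0.1/gaboto-0.1.jar
        System.out.println(v02.repositoryPath()); // org/example/gaboto/gaboto/0.2/gaboto-0.2.jar
        // An unversioned "gaboto.jar" would be the same file name for both.
    }
}
```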

In addition to a repeatable build process with dependency management, Maven offers code quality tools such as static analysers and style checkers, and it runs the tests, which enables code coverage metrics.

Unlike Ant, which is a Turing-complete scripting language with XML syntax, Maven is a project build system guided by an explicit definition of best practice and a widely used set of conventions which all Java programmers can now be expected to know.

Maven project definitions do not quite take us to Continuous Integration nirvana: we have a repeatable build, but now we need to repeat it and to define when it should be repeated.

To repeat the build we use Hudson, which can schedule, monitor and record builds defined as Ant scripts, bash scripts and commands, as well as Maven builds. Based upon our experience with Continuum and CruiseControl, Hudson is currently the best of breed among Java CI servers: it just works.

The build should be repeated whenever any code within the project is changed or whenever any code in any dependency is changed. This is achieved by publishing SNAPSHOT builds after every commit.
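Deciding what to rebuild amounts to a walk over the dependency graph. A change to one project should trigger a build of that project and of everything that depends on it, directly or transitively. The sketch below is not Hudson's implementation. It only computes which builds a change should trigger. The dependency edges are an assumption for illustration: Erewhon depends on Gaboto, Gaboto on ng4j and Jena, and ng4j on Jena.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RebuildPlanner {
    // Given "project -> projects it depends upon", returns the changed project
    // plus every project that depends on it directly or transitively.
    static Set<String> toRebuild(Map<String, List<String>> dependsOn, String changed) {
        // Invert the graph: project -> projects that depend upon it.
        Map<String, List<String>> dependents = new HashMap<>();
        dependsOn.forEach((project, deps) -> {
            for (String dep : deps) {
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(project);
            }
        });
        // Breadth-first walk outwards from the changed project.
        Set<String> rebuild = new TreeSet<>();   // sorted for stable output
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changed);
        while (!queue.isEmpty()) {
            String project = queue.poll();
            if (rebuild.add(project)) {          // false if already visited
                queue.addAll(dependents.getOrDefault(project, List.of()));
            }
        }
        return rebuild;
    }

    public static void main(String[] args) {
        // Assumed dependency edges, for illustration only.
        Map<String, List<String>> dependsOn = Map.of(
                "erewhon", List.of("gaboto"),
                "gaboto", List.of("ng4j", "jena"),
                "ng4j", List.of("jena"),
                "jena", List.of());
        System.out.println(toRebuild(dependsOn, "jena"));    // [erewhon, gaboto, jena, ng4j]
        System.out.println(toRebuild(dependsOn, "gaboto"));  // [erewhon, gaboto]
    }
}
```

Publishing a SNAPSHOT after every commit is what lets the downstream builds see the change at all.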

A Maven project defines a single artefact, usually a jar but possibly a war, ear or website.
The artefacts are published to a repository which has the same structure as the main Maven repository.
Projects are versioned in the usual way, however between explicit versions a SNAPSHOT version is published, usually nightly.

  1. 0.1
  2. 0.2-SNAPSHOT
  3. 0.2
  4. 0.3-SNAPSHOT
  5. 0.3
  6. 0.4-SNAPSHOT
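The sequence above alternates between two steps. A release drops the SNAPSHOT suffix, and the next development version increments the last number and adds the suffix back. A minimal sketch follows, assuming purely numeric, dot-separated versions. Real Maven versions can carry other qualifiers, which this ignores.

```java
import java.util.ArrayList;
import java.util.List;

class VersionSequence {
    static final String SNAPSHOT = "-SNAPSHOT";

    // 0.2-SNAPSHOT -> 0.2: a release simply drops the SNAPSHOT suffix.
    static String release(String snapshot) {
        if (!snapshot.endsWith(SNAPSHOT)) {
            throw new IllegalArgumentException("not a SNAPSHOT version: " + snapshot);
        }
        return snapshot.substring(0, snapshot.length() - SNAPSHOT.length());
    }

    // 0.1 -> 0.2-SNAPSHOT: increment the last numeric component.
    // Assumes purely numeric, dot-separated versions.
    static String nextSnapshot(String release) {
        int dot = release.lastIndexOf('.');
        int last = Integer.parseInt(release.substring(dot + 1));
        return release.substring(0, dot + 1) + (last + 1) + SNAPSHOT;
    }

    public static void main(String[] args) {
        List<String> versions = new ArrayList<>(List.of("0.1"));
        while (versions.size() < 6) {
            String current = versions.get(versions.size() - 1);
            versions.add(current.endsWith(SNAPSHOT) ? release(current) : nextSnapshot(current));
        }
        System.out.println(versions); // [0.1, 0.2-SNAPSHOT, 0.2, 0.3-SNAPSHOT, 0.3, 0.4-SNAPSHOT]
    }
}
```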


A non-SNAPSHOT build should not depend upon any SNAPSHOT artefacts.
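The rule is easy to check mechanically. In a real Maven build a check such as the Maven Enforcer plugin's `requireReleaseDeps` rule serves this purpose. The sketch below only shows the logic. Dependencies are written as hypothetical `artifactId:version` strings, and the version numbers are illustrative. It needs Java 16 or later, for `Stream.toList()`.

```java
import java.util.List;

class SnapshotRule {
    static boolean isSnapshot(String version) {
        return version.endsWith("-SNAPSHOT");
    }

    // Returns the dependencies that break the rule. Dependencies are written
    // as hypothetical "artifactId:version" strings.
    static List<String> violations(String projectVersion, List<String> dependencies) {
        if (isSnapshot(projectVersion)) {
            return List.of();   // a SNAPSHOT build may depend on SNAPSHOTs
        }
        return dependencies.stream()
                .filter(dep -> isSnapshot(dep.substring(dep.lastIndexOf(':') + 1)))
                .toList();
    }

    public static void main(String[] args) {
        List<String> deps = List.of("jena:2.6.0", "ng4j:0.9-SNAPSHOT");
        System.out.println(violations("0.2", deps));          // [ng4j:0.9-SNAPSHOT]
        System.out.println(violations("0.3-SNAPSHOT", deps)); // []
    }
}
```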

The process of publishing a new version is now straightforward and should involve only incrementing the version, removing the SNAPSHOT suffix from all dependencies, tagging the source control system and deploying the artefact. Immediately after publication the SNAPSHOT suffix is added back to this project and all its dependencies.
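The version bookkeeping in this process can be sketched over a hypothetical project model: a version plus a map of dependency versions. Tagging and deploying are left out. The sketch rests on one assumption about what "added back" means for dependencies. Each dependency that was a SNAPSHOT is moved on to that dependency's next SNAPSHOT, and released third-party versions are left alone. All version numbers are illustrative. It needs Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ReleaseSketch {
    static final String SNAPSHOT = "-SNAPSHOT";

    // Hypothetical model: the project's version plus artifactId -> version
    // for each of its dependencies.
    record Project(String version, Map<String, String> dependencies) {}

    static String strip(String v) {
        return v.endsWith(SNAPSHOT) ? v.substring(0, v.length() - SNAPSHOT.length()) : v;
    }

    // Increments the last numeric component: 0.2 -> 0.3-SNAPSHOT.
    static String nextSnapshot(String release) {
        int dot = release.lastIndexOf('.');
        return release.substring(0, dot + 1)
                + (Integer.parseInt(release.substring(dot + 1)) + 1) + SNAPSHOT;
    }

    // Before tagging: drop the SNAPSHOT suffix from the project and every dependency.
    static Project prepareRelease(Project p) {
        Map<String, String> deps = new LinkedHashMap<>();
        p.dependencies().forEach((artifact, v) -> deps.put(artifact, strip(v)));
        return new Project(strip(p.version()), deps);
    }

    // After tagging and deploying (not modelled here): move the project on to
    // its next SNAPSHOT and, by assumption, move each dependency that was a
    // SNAPSHOT on to that dependency's next SNAPSHOT. Released versions stay.
    static Project afterRelease(Project before) {
        Map<String, String> deps = new LinkedHashMap<>();
        before.dependencies().forEach((artifact, v) ->
                deps.put(artifact, v.endsWith(SNAPSHOT) ? nextSnapshot(strip(v)) : v));
        return new Project(nextSnapshot(strip(before.version())), deps);
    }

    public static void main(String[] args) {
        Map<String, String> deps = new LinkedHashMap<>();
        deps.put("ng4j", "0.9-SNAPSHOT");   // hypothetical versions
        deps.put("jena", "2.6.0");
        Project before = new Project("0.2-SNAPSHOT", deps);
        System.out.println(prepareRelease(before));
        // Project[version=0.2, dependencies={ng4j=0.9, jena=2.6.0}]
        System.out.println(afterRelease(before));
        // Project[version=0.3-SNAPSHOT, dependencies={ng4j=0.10-SNAPSHOT, jena=2.6.0}]
    }
}
```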

In this state of bliss the creation of a new version is very little work and any breach of the contract between tests and code is caught as it is committed.
