Bigger Wrench: February 2009

Friday, February 27, 2009

The build lifecycle

One of the things that is critical to any build system ( and this is something that the Maven guys nailed) is that it have a clearly defined lifecycle.

For myself over the last 20 years I have developed a build lifecycle that appears to answer my needs for every build.

Clean - Remove all build artifacts from the file system.
Init - Initialize the build system. All properties should be set here.
Prep - Prepare the file system for the build. Create folders as necessary.
GetDependencies - Resolve all dependencies. Locate and pull down the necessary dependencies and make them available to the build
Gen - Generate or Preprocess source code and resources of any type.
Build - Build anything that can be compiled or assembled.
UnitTest - Execute any tests that can be performed without deployment
Pkg - Package anything that can be packaged
Verify - Establish the internal integrity of the package
Deploy - Deploy any packages that need to be deployed.
SmokeTest - Execute any smoke tests against the deployed application(s)
Stage - Publish the build artifacts for local (i.e. machine local) usage.
Share - Publish the build artifacts for team wide usage.
Release - Publish the build artifacts for general (i.e. network) usage.
IntegrationTest - Execute any smoke tests

Most of that is not new. You have probably seen build systems that use subsets or supersets of these. The key is in making it very easy to plug in to a lifecycle event so that you can perform additional actions as necessary. Doing that in Ant takes some very careful thought but produces some very clean results.

It does require some conventions on the project and source code structure side of things. Another thing that the Maven guys nailed. :-)

The directory structure conventions I am using with my build system are the following:

I have a top level [project] Folder with several folders underneath.

[project]/root - This is the top level location from which you do builds that execute against everything. Commands like "CleanAll", "BuildAll", etc...
[project]/common-build - This contains the common build system I use across all projects. It is a separate project in my revision control system and is typically pulled down into a location under the project using Subversion's externals command or using Git's submodules.
[project]/components - This folder contains all of the sub projects that make up the components of this project.
[project]/apps - This folder contains all of the subprojects that make up the applications of this project.
[project]/installers - This folder contains all of the subprojects used to generate the installers of this project.

Under each components or apps or installers folder is a named subproject with a standard structure.

For example: The project "calendar-server" with the "calendar-utils" component would look like this:

calendar-server/components/calendar-utils

The component has a series of files. The build.xml file is of course the ant file for the component. The ivy.xml is a very simple file that describes what artifacts the component is dependent on. And the .project and .classpath files are those files needed by Eclipse.

calendar-utils/build.xml
calendar-utils/ivy.xml
calendar-utils/.project
calendar-utils/.classpath

And the calendar-utils component would have a checked-in folder structure that looks like this:

calendar-utils/src/main/java
calendar-utils/src/main/conf
calendar-utils/src/test/java
calendar-utils/src/test/conf

The source trees are separated for several reasons. Often the Java test code is compiled , managed, preprocessed, executed or referenced separately from the main code. The conf folders are also separate for some of the same reasons as well as allowing separate configurations to be set up for test and actual deployment.

There are also a series of common folders that are typically generated during the build. the first set of those is for generated code and those follow the same conventions as the checked source for much the same reasons.

calendar-utils/gen-src/main/java
calendar-utils/gen-src/main/conf
calendar-utils/gen-src/test/java
calendar-utils/gen-src/test/conf

The other set of folders is for the targets or results of the build. They all lie under the target folder. The classes and test classes are kept separate for the same reasons the sources are kept separate. The distrib folder contains all of the final output artifacts of the build ( jars, documents, zip files and executables etc.). The reports folder contains reports on the tests, profiling, etc.

calendar-utils/target/classes
calendar-utils/target/test-classes
calendar-utils/target/distrib
calendar-utils/target/reports

So to clean up this directory structure for a new pristine build only requires us to delete the gen-src and target folders.

In the next post I will talk about the design of the Ant build files found in the components and the common-build folder.

Build Systems : Ant versus Maven

Ever since I discovered Make (25+ years ago) I have been searching for a good build system. I have used everything from Configure and Make (talk about icing on a mud pie) to JAM and now Ant and Maven.

I keep going back and trying Maven again when they do a new release. It is such a good idea that I keep going back in the hopes that the implementation and documentation will finally live up to that promise. And I think a good number of people stay with Maven because it is such a good idea that they persevere and endure the slings and arrows of outrageous documentation and implementation. Alas, each time I come away frustrated.

The Maven repository concept is pure genius. And in fact, the implementation works well enough that I use it in connection with Apache's Ivyto do dependency management.

What is Ivy? A set of dependency management tasks used by Ant to pull down and access the appropriate jar or other dependencies needed by your project. I won't go into a tutorial about Ivy since there are more than a few out there. But the project itself is available at http://ant.apache.org/ivy/. it does suffer from some of the same documentation issues that Maven does but between the online forums and other peoples blog posts you can usually figure something out pretty quickly.

In the following few posts I will be discussing how I use Ivy in connection with Ant to produce a fairly clean build system with minimal bootstrap requirements.

Thursday, February 26, 2009

Revision Control: GIT

I love Subversion for revision control. It has a lot of power. And every once in awhile I need more power.

For those moments, there is GIT. It is blazing fast and industrial-strength. Unfortunately, reading the user documentation has my head ready to explode. There is this vague haunting of concepts just out of reach with explanations that almost, but don't quite give you the clue I need.

For those people who want a clear explanation of what it does, how it does it, and the concepts behind it so that you can manipulate it well; here is the book you need: "Pragmatic Version Control using Git" by Travis Swicegood.

Go to the Pragmatic Programming website and download the PDF for $22. It is clear, it is straightforward, it tells you what's going on in the background as well as having simple straightforward examples.

Discovering unused code in Java

When I set out to track down unused code in Java I came across a large number of static analysis tools that all seemed to do the job fairly well.

I used several of them including a fairly good eclipse plug-in called UCDetector. It was not fast, but it was very thorough.

By using those tools we were able to remove the obviously unused code. That resulted in a nontrivial shrinkage of about 30%. Unfortunately, due to the fact that much of the code gets called via Java's reflection API, There is a large amount of code that is not so obviously unused.

Since we have a UI test suite I thought that we would run the test suite against the front end and then log or track the methods that are actually called in the back end. Than we could eliminate the methods we found that were unused.

I first tried using the JDI interface of the JVM and simply remotely log each entrance into a method ( I didn't care about the exit). Unfortunately that slowed the Backend server system to a crawl. It would have taken weeks to get the data we needed.

I've tried both AspectJ and JBOSS AOP to produce a logging overlay and ran into significant problems deploying those in the older JBOSS 4.0.5GA environment. This was not significantly improved by the relatively nonstandard nature of our deployables.

Finally, we struck a gold mine. By using the YourKit profiler, which had minimal performance overhead, we were able to get the list of method calls that had been made. What made it especially easy was the profiler had a feature that would allow me to generate a dump of the call tree once an hour.

Here is the address of the profiler people: http://www.yourkit.com/

I just want to note that we were able to do all this using the evaluation version and that it easily passed a five minute test (. In other words, we were able to get it up and doing real work in five minutes). We have already ordered a copy.

Of course, taking 48 hours of those dumps and manually exporting them to CSV was a royal pain. To the YourKit guys: that is a hint.

What kind of jungle am I in?

Currently I am working for a company that is doing a rapid re-factoring of an existing Java code base. The application uses JBOSS for a J2EEE EJB back end serviced by a Tomcat web application as the front end. The wonderful thing is that the code does work.It is, unfortunately, an incredible bear to maintain.

This is, of course, no different than the life of many other developers. But since I am not interested in either myself or my fellow developers suffering, we are slowly but surely re-factoring the code base so that it is easier to maintain and we are faster at turning around new features.

One of the key elements is that the original 3 developers appear to have been stuck in a room and let loose for nine months to a year. As a result, we have a code base that has a phenomenal amount of unused code. One engineer seem to write a lot of code based on the idea that "it would be neat if the code did...". Another engineer wrote a lot of overly clever code rather than trying to brute force method to see if it would suffice. And the last engineer seemed to suffer from NIH: "Not Invented the Here" syndrome and rewrote portions of the Java standard library as well as the Hibernate toolkit.

So my task has been to track down unused code.

My next post will talk about the tools we used to do that.

Bigger Wrench