Java 7 Code Coverage with Gradle and Jacoco

Thanks

Steven Dicks’ post on Jacoco and Gradle is a great start to integrating Jacoco and Gradle; this post is a small iteration on top of that work.

Java 7 Code Coverage

The state of code coverage took a serious turn for the worse when Java 7 came out. The byte-code changes in Java 7 effectively made Emma and Cobertura defunct; they will not work with Java 7 constructs. Fortunately there is a new player in town called JaCoCo (for Java Code Coverage). JaCoCo is the successor to Emma, built on the knowledge the Eclipse and Emma teams have gained over the years about how best to do code coverage, and it works with Java 7 out of the box.

The advantage of using established tools is that they are generally well supported across your toolchain. JaCoCo is fairly new, so support in Gradle isn’t so smooth yet. Fortunately Steven’s post got me started down the right path. The one thing I wanted to improve right away was to use transitive dependency declarations rather than keeping local jar files in my source repository. JaCoCo is now available in the Maven repos, so we can do that. One thing to note is that the default artifacts published to the Maven repo are Eclipse plugins, so we need to reference the “runtime” classifier in our dependency declaration.

The Gradle Script

configurations {
    codeCoverage
    codeCoverageAnt
}
dependencies {
    codeCoverage 'org.jacoco:org.jacoco.agent:0.5.10.201208310627:runtime@jar'
    codeCoverageAnt 'org.jacoco:org.jacoco.ant:0.5.10.201208310627'
}
test {
    systemProperties = System.properties
    jvmArgs "-javaagent:${configurations.codeCoverage.asPath}=destfile=${project.buildDir.path}/coverage-results/jacoco.exec,sessionid=HSServ,append=false",
            '-Djacoco=true',
            '-Xms128m',
            '-Xmx512m',
            '-XX:MaxPermSize=128m'
}
task generateCoverageReport << {
    ant {
        taskdef(name:'jacocoreport', classname: 'org.jacoco.ant.ReportTask', classpath: configurations.codeCoverageAnt.asPath)
 
        mkdir dir: "build/reports/coverage"
 
        jacocoreport {
            executiondata {
                fileset(dir: "build/coverage-results") {
                    include name: 'jacoco.exec'
                }
            }
            structure(name: project.name) {
                classfiles {
                    fileset(dir: "build/classes/main") {
                        exclude name: 'org/ifxforum/**/*'
                        exclude name: 'org/gca/euronet/generated**/*'
                    }
                }
                sourcefiles(encoding: 'CP1252') {
                    fileset dir: "src/main/java"
                }
            }
 
            xml  destfile: "build/reports/coverage/jacoco.xml"
            html destdir:  "build/reports/coverage"
        }
    }
}

A Few Details

The magic is in the jvmArgs of the test block. JaCoCo is run as a Java agent, which uses the runtime instrumentation feature added in Java 6 to inspect the running code. Extra arguments can be passed to JaCoCo there, including things like excludes to omit specific classes from coverage. The available parameters are the same as those of the Maven JaCoCo plugin.
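
For example, the generated packages could be excluded from instrumentation entirely with the agent’s excludes option (patterns are VM-style class names separated by colons; these are meant to mirror the report excludes in the script above):

jvmArgs "-javaagent:${configurations.codeCoverage.asPath}=destfile=${project.buildDir.path}/coverage-results/jacoco.exec,excludes=org.ifxforum.*:org.gca.euronet.*,append=false"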

The generateCoverageReport task converts the jacoco.exec binary into html files for human consumption. If you’re just integrating with a CI tool, like Jenkins, then you probably don’t need this, but it’s handy for local use and to dig into the details of what’s covered.

Loose Ends

One problem that I ran into was referencing project paths like project.buildDir from within an Ant task. Hopefully someone will come along and let me know how that’s done.

Java Return from Finally

try…catch…finally is the common idiom in Java for exception handling and cleanup. What people may not know is that returning from within a finally block has the unintended consequence of stopping an exception from propagating up the call stack. It “overrides” the throwing of the exception, so the caller never gets to handle it.

public class Main {
    public static void main(String[] args) throws Throwable {
        System.out.println("Starting");
        method();
        System.out.println("No way to know that an exception was thrown");
    }
 
    public static void method() throws Throwable {
        try {
            System.out.println("In method about to throw an exception.");
            throw new RuntimeException();
        } catch (Throwable ex) {
            System.out.println("Caught exception, maybe log it, and then rethrow it.");
            throw ex;
        } finally {
            System.out.println("return in finally prevents an exception from being passed up the call stack.");
            return; // remove the return to see the real behavior
        }
    }
}

I recently came across code like this. This is Real Bad. Returning from within finally prevents the propagation of checked exceptions, which is bad enough, but worse, it also prevents the propagation of runtime exceptions, which are generally programmer errors. This one small mistake can hide programmer errors so that you’ll never see them and never know why things aren’t working as expected. Interestingly, the Java compiler understands this as well: if you return from within a finally block where an exception would otherwise be thrown, the compiler does not force you to declare that exception in the method’s throws declaration.

Long story short: don’t return from within finally. Ever.

MongoDB and Java: Find an item by Id

MongoDB is one of a number of new databases that have cropped up lately, eschewing SQL. These NoSQL databases provide non-relational models that are suitable for solving different kinds of problems. This camp includes document-oriented, tabular, and key/value models, among others. These non-relational databases are supposed to excel at scalability through parallelization and replication, but sometimes (although not always) at the expense of some of the transactional guarantees of SQL databases.

Why would you care about any of this? Document oriented databases allow for each document to store arbitrary pieces of data. This could allow for much easier customization of data storage such as when you want to store custom fields. Many of these databases also make horizontal scaling quite simple as well as providing high performance for write heavy applications.

With this in mind I figured I should look and see what’s there. So I started looking at MongoDB.

Start by creating an object to add to the database

With MongoDB, a collection is conceptually similar to a table in a SQL database. It holds a collection of related documents. A DBObject represents a document that you want to add to a collection. MongoDB automatically creates an id for each document that you add. That id is set in the DBObject after you pass it to the save method of the collection. In a real world application you might need that id to later access the document.

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;

DBObject obj = new BasicDBObject();
obj.put("title", getTitle());
obj.put("body", getBody());

DBCollection coll = db.getCollection("note");
coll.save(obj);

String idString = obj.get("_id").toString();

Retrieve an object previously added to a collection

To get a document from MongoDB you again use a DBObject. It does double duty in this case, acting as the parameters you want to use to identify a matching document. (There are ways to do comparisons other than equality, of course, but I’ll leave that for a later post.) Using this as a “query by example” model, we can set the _id property that we previously retrieved. The one catch is that the id is not just a string; it’s actually an instance of ObjectId. Fortunately, once we know that, it’s quite easy to construct an instance with the string value.

import org.bson.types.ObjectId;

String idString = "a456fd23ac56d"; // in practice, the 24-character hex string from a saved document
DBCollection coll = db.getCollection(getCollectionName());
DBObject searchById = new BasicDBObject("_id", new ObjectId(idString));
DBObject found = coll.findOne(searchById);

A couple of easy examples, but it wasn’t obvious to me when I started how to get the id of a document that I just added to the database. More to come in the future.

Struts2 Map Form to Collection of Objects

The Struts2 documentation contains examples that are often basic at best, which can make it challenging to figure out how to do things. I was working on creating a form that would allow me to select values from a list to connect two objects in a One-to-Many relationship. This is a common relationship for many things. In this example, I’ll use a User and Role class to demonstrate the concept.

For background, here’s a JPA mapped User and Role class.

import java.util.List;
import javax.persistence.*;
 
@Entity
public class User {
 
    private Long id;
    // ... other member variables
    private List<Role> roles;
 
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    public Long getId() {
        return id;
    }
 
    public void setId(Long id) {
        this.id = id;
    }
 
    @OneToMany
    @JoinTable(name = "UserRoles",
            joinColumns = @JoinColumn(name = "user_Id"),
            inverseJoinColumns = @JoinColumn(name = "role_Id"),
            uniqueConstraints = @UniqueConstraint(columnNames = {"user_Id", "role_Id"})
    )
    public List<Role> getRoles() {
        return roles;
    }
 
    public void setRoles(List<Role> roles) {
        this.roles = roles;
    }
 
    // ... other properties
}
 
@Entity
public class Role {
 
    private Long id;
 
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    public Long getId() {
        return id;
    }
 
    public void setId(Long id) {
        this.id = id;
    }
 
    // ... other properties
}

A list of Roles exists in the database. When a User is created, they are assigned one or more Roles. The User and Roles are connected in the Database through a join table as defined in the mapping. At this point I created some DAO Repository classes to manage the persistence and an Action to handle the form values. (The details of JPA, setting up Persistence and Actions are beyond the scope of this post.)

The part that caused me the most grief ended up being the form. I wanted to add a checkbox list that contained all of the Roles. The example on the Struts site for checkboxlist, a control with 43 documented properties, is:

<s:checkboxlist name="foo" list="bar"/>

Needless to say, there was some ‘figuring out’ to be done.

The form itself is pretty vanilla for the most part. The checkboxlist is the interesting part because it’s what allows us to map the User to the Roles. I knew that I was looking for something to put into the value property of the control that would tell it to pre-select the Role values that were already associated with the User.

I started out with something like:

<s:checkboxlist name="user.roles.id"
                    list="roles"
                    listKey="id"
                    listValue="name"
                    label="%{getText('label.roles')}"
                    value="user.roles"/>

That didn’t work. When you think about it, that makes sense, because the keys in the list are ids and the values supplied are Role objects. So I needed to figure out how to get the ids from the Roles. I could have done that in the Action class, but it seemed like there should be a better way, one that would allow me to continue in more of a Domain fashion.

Doing some research into OGNL, I came upon the list projections section which was the key…

The OGNL projection syntax gives us user.roles.{id}. Basically that is a list comprehension that takes a list of Role objects and turns it into a list of Role ids. That list of ids becomes the list of values that will be preselected.

Knowing that, I can now create a checkbox list that includes the pre-selected values on an edit form:

<s:checkboxlist name="user.roles.id"
                    list="roles"
                    listKey="id"
                    listValue="name"
                    label="%{getText('label.roles')}"
                    value="user.roles.{id}"/>

Enjoy.

Groovy Programming Language

Welcome to the disco era…wait, wrong Groovy. Groovy the programming language is a dynamic programming language that runs on the Java Virtual Machine. At first glance it looks a lot like Ruby, but it’s built from the ground up to leverage the JVM. This gives it a lot of power as a transitional language: it allows you to leverage an existing investment in Java code while transitioning to a powerful, dynamic language.

Java and Groovy – Happy Together

One of the most interesting things about Groovy is that almost all standard Java code is valid Groovy code.

The quintessential example:

public class Hello
{
    public static void main(String[] args)
    {
        System.out.println("Hello");
    }
}

Is that Java code or Groovy code? In fact it’s both. So what’s the big deal?

Well, in Groovy you can write the same thing like:

println("Hello")

To steal Venkat Subramaniam’s joke: If you get paid by lines of code, this is a terrible thing.

Easier Construction

One of the other cool things with Groovy is that you can set named properties in the constructor. Conceptually, setting a bunch of properties after you construct a class is often the same thing as setting those properties in a constructor. This makes code look quite a bit cleaner. It also means the class writer does not need to predict all the combinations of properties you might want to use to initialize an object.


car = new Car(make: "VW", model: "GTI", year: 2001)

Easier Collections

Of course Groovy also adds some interesting things for handling collections.

lst = [1, 2, 4, 6, 8, 12] // creates a java.util.ArrayList
println lst[-1]     // 12
println lst[1..3]   // [2, 4, 6]
println lst[-1..-2] // [12, 8]

Easier Code Reuse

One of the biggest things that Groovy offers is closures, a signature feature of the new scripting languages (and of a bunch of much older languages before them). Closures are blocks of code that can be passed around and executed, sharing the context of where they are called from and where they are executed.

This code separates the concept of iterating from what you want to do with the elements that are iterated over. This makes for great reusability.

def iterate(count, closure)
{
    for (i in 0..count) // note: 0..count is an inclusive range
        closure(i)
}

iterate(5) { println it }
val = 0
iterate(5) { val += it }

Of course this kind of thing is built into the existing collections as well.


myNums = [1, 3, 5, 7, 9, 13]
myNums.each { curNum -> println curNum }

Dynamic Messaging

Unlike languages like Java or C#, Groovy gives you the ability to respond to method calls in a dynamic way. You do not have to have all of the methods defined up front, but can rather respond to them at runtime.


baseball = ['Milwaukee' : 'Brewers', 'Chicago' : 'White Sox']
println baseball.Milwaukee // Brewers
println baseball.'Chicago' // White Sox

In the case of a Map, Groovy interprets the property reference as a request for the value stored under that key. Of course there are many other ways you can use this as well. You can dynamically respond to methods on any class.


class Thing
{
    def invokeMethod(String name, args)
    {
        println "I'm ${name}ing!"
    }
}

Of course this is a silly example, but the same mechanism is utilized to great effect in Groovy’s XML MarkupBuilder.

baseball = ['Milwaukee' : 'Brewers', 'Chicago' : 'White Sox']
bldr = new groovy.xml.MarkupBuilder()
xml = bldr.teams {
    baseball.each { key, value ->
        team(city : key) {
            name(value)
        }
    }
}

Gives you:

<teams>
  <team city='Milwaukee'>
    <name>Brewers</name>
  </team>
  <team city='Chicago'>
    <name>White Sox</name>
  </team>
</teams>

Isn’t that a nice way to write XML?

Conclusion

I hope that gives you a taste of Groovy. It could be a very interesting language for some problems especially if you are already a Java shop looking for some more dynamic language features. That way you don’t have to abandon everything you’ve done in the past. Instead you can start a slow migration or a bit of exploration utilizing the older code.

Welcome Java 6

With great fanfare, Java 6 was released to the world. Ok, that was a joke. Sun has the ability to make a major release without any fanfare or hullabaloo. Even Spring had a countdown to their 2.0 release. Oh well.

Java 5 was the really big release, adding a lot of new functionality to the core language. Back in October of 2005 I wrote about the new features in Java 5 and did a little bit of prognostication about what might be next for Java. Well, if you look at it, you will soon realize that I was totally wrong; it turned into more of a “what I like in scripting languages that I wish I had in Java”. Oh well, I won’t quit my day job to try and predict the future. Java 5 added a number of new language constructs like generics, foreach loops, and enums. Java 6, on the other hand, doesn’t add any new language constructs, but is all about extending the APIs. Where Java 5 almost had a theme of catching up to .NET 2.0, Java 6 seems like a relatively random collection of new things.

Scripting

The Scripting API is one of the bigger and more talked-about features. It’s definitely the thing that I’m most excited about. We’ve had Jython for years, Groovy for a while, and more recently JRuby, all of which let us use scripting languages on top of the JVM. Java 6 really extends support for using these languages on the JVM, and it comes with support for JavaScript built in, based on the Rhino engine originally from Mozilla.

The API seems to offer everything you could want. It allows you to access Java APIs from the scripting language (this is likely implementation dependent, but it works in the built-in JavaScript implementation), and it allows you to execute scripting language methods from Java. It even allows you to mold dynamic languages to Java by forcing them to implement specific interfaces.
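
Here’s a quick sketch of those two directions using the built-in JavaScript engine (the variable and function names are just illustrative):

import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class ScriptingDemo {
    public static void main(String[] args) throws Exception {
        ScriptEngineManager manager = new ScriptEngineManager();
        ScriptEngine engine = manager.getEngineByName("JavaScript");

        // Share a Java object with the script, then evaluate an expression against it
        engine.put("name", "Java 6");
        Object result = engine.eval("name.toUpperCase()");
        System.out.println(result); // JAVA 6

        // Define a function in JavaScript and invoke it from Java
        engine.eval("function add(a, b) { return a + b; }");
        Invocable invocable = (Invocable) engine;
        System.out.println(invocable.invokeFunction("add", 2, 3)); // 5.0 (Rhino numbers are doubles)
    }
}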

As with most Java APIs, it has an extension mechanism to support various languages. There are already implementations of a lot of different languages available for download, so you can start trying your favorite scripting language.

Check out the Java 6 Scripting Guide for some great examples on various pieces of the API.

Database and JDBC 4.0

JDBC 4.0 offers a number of improvements to database access in Java. One of the bigger changes is that Java now comes with a 100% Java database implementation based on Apache Derby. That makes it really easy to get started with an application; you can graduate to a standalone database later if you need to. JDBC 4 also improves the loading of database drivers. It can now use meta-data in a file under META-INF, which means you don’t have to add the old Class.forName("some.db.Driver") to your code. It also means that changing the database is no longer a code recompile, just a simple file change (ok, so any serious application would have that string externalized already, but this is a nice thing to standardize).
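
A minimal sketch of what that looks like, assuming a JDBC 4.0 driver jar (Derby here, with a placeholder database name) is on the classpath; note the complete absence of driver registration:

import java.sql.Connection;
import java.sql.DriverManager;

public class Jdbc4Demo {
    public static void main(String[] args) throws Exception {
        // The driver announces itself via META-INF/services/java.sql.Driver
        // in its jar, so DriverManager can locate it from the URL alone.
        Connection conn = DriverManager.getConnection("jdbc:derby:demoDb;create=true");
        System.out.println("Connected to " + conn.getMetaData().getDatabaseProductName());
        conn.close();
    }
}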

Check out this article for a good overview of the changes in JDBC 4.0.

Desktop Integration

The Java 6 release has seen a number of improvements in terms of desktop integration, so that Java Swing applications won’t feel like aliens. They’ve added support for splash screens so that users get immediate feedback that your application is starting. They’ve also added APIs for dealing with tray icons and the system tray, which are really handy for a lot of applications.
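
A small sketch of the tray API (the icon file and labels are placeholders):

import java.awt.AWTException;
import java.awt.SystemTray;
import java.awt.Toolkit;
import java.awt.TrayIcon;

public class TrayDemo {
    public static void main(String[] args) throws AWTException {
        if (!SystemTray.isSupported()) {
            return; // not every platform exposes a system tray
        }
        TrayIcon icon = new TrayIcon(
                Toolkit.getDefaultToolkit().getImage("icon.png"), "My App");
        SystemTray.getSystemTray().add(icon);
        // Show a balloon notification from the tray
        icon.displayMessage("My App", "Application started", TrayIcon.MessageType.INFO);
    }
}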

Monitoring, Instrumentation and Tools Interface

I haven’t looked at a lot of the details of what’s been added, but it seems like they’ve put a lot of work into the monitoring and instrumentation side of a running VM. JConsole, a great tool for looking at the various memory areas in your JVM, is now officially supported.

One of the more interesting areas is the ability to transform already-loaded classes. The new classes in java.lang.instrument seem like they will be a great boon for a number of different uses. AOP and other proxy implementations could make use of this. It also seems like this could be a very useful feature in conjunction with the new scripting support. Languages like Ruby and JavaScript support dynamically adding methods and changing method definitions at runtime. Will this new code allow script plugin developers to introduce that dynamic functionality into Java?
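
For a flavor of the API, here’s a minimal sketch of an agent that just watches classes load; the class name and agent jar are hypothetical, and a real transformer would rewrite bytecode rather than merely log:

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class LoggingAgent implements ClassFileTransformer {

    // Wired up via the Premain-Class manifest attribute and -javaagent:agent.jar
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingAgent());
    }

    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined, ProtectionDomain domain,
                            byte[] classfileBuffer) {
        System.out.println("Loading class: " + className);
        return null; // null means "leave the class unchanged"
    }
}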

These are the things that seem most interesting to me. Check out all of the new features in Java 6 for yourself. What are you excited about?

Hibernate Query Translators

I’ve recently been doing some performance testing and tuning on an application. It makes use of Hibernate for data access and ORM, and Spring to configure and wire everything together. As I was looking through the configuration, I came upon the fact that we were using the ClassicQueryTranslatorFactory. The job of the query translator is to turn HQL queries into SQL queries. The ClassicQueryTranslatorFactory is the version that was included in Hibernate 2. In Hibernate 3 they created a new query translator, the ASTQueryTranslatorFactory. This query translator makes use of Antlr, a Java-based parser generator in the vein of lex and yacc.

I swapped out the ClassicQueryTranslatorFactory for the ASTQueryTranslatorFactory and saw an immediate performance boost of about 15% for the application. I also noticed that fewer queries were being generated for page loads. Of course this application uses quite a bit of HQL, so if you do not make use of HQL extensively, you might not see the same benefits.
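
For reference, the switch is a single property; a sketch of setting it programmatically (the same hibernate.query.factory_class property can equally live in hibernate.cfg.xml or a Spring session factory’s hibernateProperties):

import org.hibernate.cfg.Configuration;

Configuration cfg = new Configuration();
cfg.setProperty("hibernate.query.factory_class",
        "org.hibernate.hql.ast.ASTQueryTranslatorFactory");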

I have yet to see any documentation or any other evidence to support the claim that the newer ASTQueryTranslatorFactory would offer better performance, but in my case it seems like it has. Has anyone else noticed this behavior?

jMock For Mocking Unit Tests

Ok, so I might not exactly be on the cutting edge with this post, but I just started playing with jMock, a framework for creating Mock objects.

Mock objects are used to isolate various layers of an application by providing a fake implementation that mimics the behavior of the real implementation but behaves deterministically. They are very useful for isolating database layers, because DB access can slow down unit tests dramatically, and if unit tests take too long then they won’t get run. Like Inversion of Control (IoC), mock objects can be done by hand without a framework, but frameworks can make the job a lot easier.

jMock allows you to set up the methods that you expect to be called and define the return values for a given set of parameters. You basically script the expected calls with the return values you want. This lets you isolate classes so that you can test just a single method or class at a time, thus simplifying the tests.


import org.jmock.Mock;
import org.jmock.MockObjectTestCase;

public class MockTester extends MockObjectTestCase {
    Mock mockOrderService;
    OrderService orderService;

    public void setUp() throws Exception {
        mockOrderService = new Mock(OrderService.class);
        orderService = (OrderService) mockOrderService.proxy();
    }

    public void testSomeServiceMethod() throws Exception {
        String orderId = "orderId";

        // The "doesOrderExist" method will be called once with the orderId parameter
        // and will return a true boolean value.
        // If the method isn't called, the mock will complain.
        mockOrderService.expects(once())
                .method("doesOrderExist")
                .with(eq(orderId))
                .will(returnValue(true));

        FullfillmentService fullfillment = new FullfillmentServiceImpl(orderService);
        assertTrue(fullfillment.confirmOrder(orderId));
    }
}

One thing to realize is that by default jMock will only create proxies for interfaces. If you want to mock concrete classes, you’ll need the jmock-cglib extension and the asm library so that it can create proxies of the concrete classes.

I find this sort of scripting of Mock objects very compelling. It allows you to focus on testing the behavior of a very isolated piece of code. It even allows you to test code without having written all of the dependent objects. I encourage you to check it out for yourself.

Hibernate HQL And Performance

The Hibernate ORM tool gives you the ability to write SQL-esque queries using HQL to do custom joining, filtering, etc., to pull objects from your database. The documentation gives you a lot of examples of the things you can do, but I haven’t seen any caveats or warnings.

Database Performance

As far as database performance goes there are two major things to start with when you want to understand your database performance:

  • How many queries are run?
  • How expensive are the individual queries?

Not too earth shattering, is it? Basically, if you run fewer queries of the same cost, you’re better off. Likewise, if you make the queries themselves cost less (by optimizing the queries, creating the proper indexes, etc.) then they will run faster. The best, of course, is to do both: run fewer, faster queries. (Yes, I’m still waiting on my Nobel prize.)

I’ll talk more about fewer queries later…

To make queries faster, you are mostly working in the database. You depend on good tools and good statistics, and if the size or kind of data changes, you might have to redo this work.

To Optimize your database queries:

  1. Run some queries examining their execution plans
  2. Find some possible columns to index
  3. Create an index
  4. Re-run the queries and examine the execution plans again
  5. Keep it if it’s faster, get rid of it if it’s not
  6. Goto 1
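
For example, steps 3 through 5 might look like this (the table, column, and index names are hypothetical):

-- The execution plan showed a full table scan on address lookups, so try an index:
CREATE INDEX idx_address_person_id ON address (person_id);
-- Re-run the query under EXPLAIN (syntax varies by database), compare the plans,
-- and drop the index if it didn't help:
-- DROP INDEX idx_address_person_id;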

Hibernate and Caches

Hibernate does one thing: It maps Objects to a Relational database. Hibernate is really pretty good at that mapping and can support all kinds of schemas. So you should be able to (relatively) easily map your objects to your schema.

Hibernate also has two potential caching schemes, what it calls Level-1 and Level-2 caching. Level-1 caching is done through the Hibernate session: as long as the session is open, any object that you have loaded will be pulled from the session if you query for it again.
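
A quick sketch of the Level-1 cache in action, assuming a mapped Person entity and an existing SessionFactory:

import org.hibernate.Session;

Session session = sessionFactory.openSession();
Person p1 = (Person) session.get(Person.class, 1L); // hits the database
Person p2 = (Person) session.get(Person.class, 1L); // served from the session cache
assert p1 == p2; // same instance, no second query
session.close();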

The Level-2 cache is a longer-running, more advanced caching scheme that stores objects across Hibernate sessions. You’re often discouraged from using Level-2 caching, but it is very nice for read-only objects that you don’t expect to change in the database (think of pre-defined type information and the like). Again, if you query for one of these objects, Hibernate will return it from the Level-2 cache.

Notice how the Level-1 and Level-2 caches prevent Hibernate from having to re-query the database for a lot of objects. This of course can be a huge performance benefit. Likewise, Hibernate supports lazy loading of collections, so if your object is related to a collection of other objects, Hibernate will wait to load them until you need them. Once they’ve been loaded, though, they are in the object graph, so accessing them a second time does not require another round-trip to the database.

All of this lazy loading and caching is about reducing the number of queries you need to run against the database. You can also tweak your Hibernate mapping files to implement things like batching (loading children of multiple parents in one query) to greatly reduce the number of queries that need to be run. You can also specify to pre-load a related object using a left join if you will always need the object and want to get both in the same query. Most of the decisions are dependent on your application and what you are doing, but they are very easy to play with in your configuration and see if they improve your application performance.
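
To make the batching idea concrete, with Hibernate Annotations the hint is a one-line addition (the entity and size here are illustrative; the same hint is the batch-size attribute in XML mapping files):

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import org.hibernate.annotations.BatchSize;

@Entity
public class Person {
    @Id
    private Long id;

    // When one Person's addresses are first touched, Hibernate loads the
    // address collections for up to 10 in-session Persons in a single query.
    @OneToMany
    @BatchSize(size = 10)
    private List<Address> addresses;

    // ... getters and setters
}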

Why the hard time for HQL?

All of the caching and tweaking you can do in your Hibernate mappings (or using annotations) is totally wasted if you use HQL queries to load your objects.

If you specify fetch="join" in your mapping to do a left join and load a dependent object, that setting is not used when you load the object with HQL, so you will be running more queries than you need.

If you have natural mappings of parent/child relationships, then the following code will generate only a single query to load the Person and a single query to get the Addresses.

Person p = (Person) session.get(Person.class, 1);
List<Address> address = p.getAddresses();
List<Address> address2 = p.getAddresses();

This code still only generates two queries:

Person p = (Person) session.createQuery("from Person where id = :id")
        .setParameter("id", 1).uniqueResult();
List<Address> address = p.getAddresses();
List<Address> address2 = p.getAddresses();

But the following code generates twice as many queries to load the addresses.

Person p = (Person) session.createQuery("from Person where id = :id")
        .setParameter("id", 1).uniqueResult();
List<Address> address = session
        .createQuery("from Address where person.id = :id")
        .setParameter("id", 1).list();
List<Address> address2 = session
        .createQuery("from Address where person.id = :id")
        .setParameter("id", 1).list();

Of course this is a totally contrived example, but if you’ve built out a large system with a Service Facade and DAOs, these kinds of things can easily be hidden deep in the application where it is hard to know whether a call will trigger a database query or not. So be very conscious of using HQL queries and the consequences of using them.

Hibernate rewards you for using natural relationships in your Objects. It rewards you with performance for building a POJO based Object Oriented system.

Hibernate HQL Rules

Rule #1: Don’t use HQL.
Rule #2: If you really need to use HQL, see Rule #1.
Rule #3: If you really, really need HQL and you know what you’re doing, then carefully use HQL.

Ok, so if I’m right about this, why is this not at the top of the HQL documentation? Don’t you think they should talk about this as a method of last resort?

Time to start reading POJOs in Action again.

Relentless Build Automation

How far is too far when you are doing build automation? My answer to that question is that it’s basically not possible to go too far. Any task that you need to do more than three times deserves automation. Three times is my heuristic; the longer and more complex the task, the smaller the number of times you should allow yourself to do it by hand. Compiling, for example, is a pain if you don’t automate it, so you automate it the first time. Computers are good at one thing: running algorithms. If you are doing those algorithms by hand, you’re wasting your computer’s potential.

Why Automate?

There are two major reasons to automate:

  1. Reducing errors
  2. Being lazy

Reducing errors is easy. We want to have a repeatable, dependable process that anyone can use to achieve the same results. If your deploy master has to know 15 steps and the ins and outs of some complex process, what happens when she goes on vacation? You don’t do deployment. By automating you allow other people to perform the same task and you keep your computer working for you.

But being lazy? Yes! A truly great computer programmer is lazy and does everything in their power to avoid having to redo work that they’ve already done. Likewise, if you’ve done something 10 times, doesn’t it get boring to do the same thing over and over again? Wouldn’t you rather be getting coffee or reading blogs while your computer works at what it does best? In addition to general laziness, automation will make your life easier when you want to do the other things that you know you should be doing, like unit testing, integration testing, and functional testing.

What to Automate

Going back to my introduction, I say automate everything that you or someone new coming onto a project will need to do more than a few times. A little bit of effort up front can save every member of a team a lot of effort later on. So, as a punch list to get you started:

  • Compiling code
  • Packaging code into JARs, WARs, etc
  • Running unit tests

Most people doing Java development will automate code compilation and packaging all of the time. Sometimes they’ll automate unit tests, sometimes deployment, and rarely a few other things. We usually use excellent script-based tools like Ant or Maven for that kind of thing. But even if you do all of these things, is that going far enough? I think that for most projects the answer is no.

If you are slightly more ambitious you will also automate:

  • Integration testing
  • Functional testing
  • Deployment

There, now you’ve got all of your day-to-day activities automated. So you must be done, right?

Generating Databases

Databases are the bane of automation. People don’t take the time to script their database creation or to script their default or testing data. When you don’t script your database, you make integration and functional testing very difficult. It’s really easy to write functional and integration tests if they rely on the data being in a known state; if they do not, then running them is very difficult because the data underneath is always changing. That usually means that integration and automated functional testing are deemed too hard and are not done. But don’t blame the tests. You’re avoiding a little bit of work that, if you did it, would allow you to avoid A LOT of work. (I think little jobs that prevent you from having to do big jobs are a good tradeoff.)

Generating your database will make new team members’ lives a breeze. They should be able to check out a project from source control and, with one simple command, generate a database, compile the code, run the tests, and know with confidence that everything is working for them. When they make any changes to the project they can repeat this process to know with confidence that they haven’t broken anything. Likewise, in day-to-day development, being able to get back to a known state will let you develop with more confidence and test with ease.

Ant has a core task for running SQL, so you don’t even need any other tools to accomplish this. One problem that you might run into is that running DDL commands over JDBC can cause some problems. I found a great article on executing PL/SQL with Ant that you can probably apply to other databases as well. It allows you to run more database-specific commands to build your schema from a blank database. This helped me overcome the last hurdle on a recent project, getting the entire database scripted from scratch. So check it out.
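
For the simple cases, the core sql task is all you need. A minimal sketch (the driver, URL, credentials, and file names are placeholders for your own database):

<target name="create-schema">
    <sql driver="oracle.jdbc.OracleDriver"
         url="jdbc:oracle:thin:@localhost:1521:dev"
         userid="dev"
         password="dev"
         classpathref="jdbc.driver.classpath"
         src="sql/schema.sql"/>
</target>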

I find that having a few standard Ant tasks makes this a breeze:

  • A task to create tablespaces (if you have separate tablespaces for each database)
  • A task to create a user
  • A task to generate the full schema
  • A task to delete the user and drop the schema

Together these tasks can be woven together to create the full database. To support development and testing, it is also handy to have loadable datasets around. Either maintain a list of SQL INSERT statements or use a tool like DBUnit to load the data into your database (a sketch of the DBUnit approach follows the list below). Often it can be nice to have a few sets of base data:

  • Unit testing data
  • Functional testing data
  • Demo data for showing off to clients and managers
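
Here’s a minimal sketch of loading one of those datasets with DBUnit, assuming an existing JDBC connection and a flat-XML dataset file (the file name is hypothetical):

import java.io.File;
import java.sql.Connection;
import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSet;
import org.dbunit.operation.DatabaseOperation;

public class LoadTestData {
    public static void load(Connection jdbcConnection) throws Exception {
        IDatabaseConnection connection = new DatabaseConnection(jdbcConnection);
        IDataSet dataSet = new FlatXmlDataSet(new File("data/unit-test-data.xml"));
        // CLEAN_INSERT clears the tables named in the dataset and reloads them,
        // putting the database into a known state for tests.
        DatabaseOperation.CLEAN_INSERT.execute(connection, dataSet);
    }
}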

The only thing lacking is an easy way to dump a schema from a working database, so that you could use nice GUI tools to modify the database and then dump it out with a simple Ant target. Even with this, you’ll likely need to maintain change scripts so that you can apply changes to working production systems without wiping out all of the data. I guess, since I’m complaining, that will have to be my next side project. Leave a comment if you know how to do this.

Conclusion

With a little bit of effort in going all the way with automation, you can make it easy for new people to come onto a project, you can make it easier to create and run all kinds of tests, and you can make the entire development process easier and more confident. The tools exist to help make automation easy, so why not?

What things do you like to automate that you see people skipping on projects?

Resources

Java Unit Testing: JUnit, TestNG
Web Functional Testing: Selenium
Java Build Automation: Ant

Update

Looks like someone is a couple of steps ahead of me. While not yet a final release, Apache DdlUtils contains a set of Ant tasks that give you the ability to read and write database schemas as well as load datasets defined in XML documents. Something to keep an eye on…