Relentless Build Automation
How far is going too far when you are doing build automation? My answer is that it’s basically not possible to go too far. Any task that you need to do more than three times deserves automation. Three times is my heuristic. The longer and more complex the task, the fewer times you should allow yourself to do it by hand. Compiling, for example, is a pain if you don’t automate it, so you automate it the first time. Computers are good at one thing: running algorithms. If you are running those algorithms by hand, you’re wasting your computer’s potential.
Why Automate?
There are two major reasons to automate:
- Reducing errors
- Being lazy
Reducing errors is easy. We want to have a repeatable, dependable process that anyone can use to achieve the same results. If your deploy master has to know 15 steps and the ins and outs of some complex process, what happens when she goes on vacation? You don’t do deployment. By automating you allow other people to perform the same task and you keep your computer working for you.
But being lazy? Yes! A truly great computer programmer is lazy and does everything in their power to avoid having to do work that they’ve already done before. Likewise, if you’ve done it 10 times, doesn’t it get boring to do the same thing over and over again? Wouldn’t you rather be getting coffee or reading blogs while your computer works at what it does best? In addition to general laziness, automation will make your life easier when you want to do other things that you know you should be doing, like unit testing, integration testing and functional testing.
What to Automate
Going back to my introduction, I say automate everything that you or someone new coming onto a project will need to do more than a few times. A little bit of effort up front can save every member of a team a lot of effort later on. So, as a punch list to get you started:
- Compiling code
- Packaging code into JARs, WARs, etc
- Running unit tests
Most people doing Java development automate code compilation and packaging all of the time. Some of the time they’ll automate unit tests, sometimes deployment, and rarely a few other things. We usually use excellent script-based tools like Ant or Maven for that kind of thing. Even if you do all of these things, is that going far enough? I think that for most projects the answer is no.
If you are slightly more ambitious you will also automate:
- Integration testing
- Functional testing
- Deployment
There, now you’ve got all of your day-to-day activities automated. So you must be done, right?
Generating Databases
Databases are the bane of automation. People don’t take the time to script their database creation or to script their default or testing data. When you don’t script your database, you make integration and functional testing very difficult. It’s really easy to write functional tests and integration tests if they can rely on the data being in a known state. If they cannot, then running them is very difficult because the data underneath is always changing. That usually means that integration and automated functional testing are deemed too hard and are not done. But don’t blame the tests. You’re avoiding a little bit of work that, if you did it, would allow you to avoid A LOT of work. (I think little jobs that prevent you from having to do big jobs are a good tradeoff.)
Generating your database will make new team members’ lives a breeze. They should be able to check out a project from source control and, with one simple command, generate a database, compile the code, run the tests and know with confidence that everything is working for them. When they make any changes to the project, they can repeat this process to know with confidence that they haven’t broken anything. Likewise, in day-to-day development, being able to get back to a known state is going to give you the ability to develop with more confidence and test with ease.
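To make "getting back to a known state" concrete, here is a minimal sketch. It uses Python with an in-memory SQLite database purely so that the example is self-contained; that is an assumption for illustration, not the toolchain described in this post. The customer table, the seed rows and the rebuild_database function are all hypothetical names.

```python
import sqlite3

# Hypothetical schema and seed data. A real project would keep these in
# versioned .sql files next to the source code.
SCHEMA_SQL = """
DROP TABLE IF EXISTS customer;
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
"""

SEED_SQL = """
INSERT INTO customer (id, name) VALUES (1, 'Alice');
INSERT INTO customer (id, name) VALUES (2, 'Bob');
"""


def rebuild_database(conn):
    """Drop and recreate the schema, then load the seed data."""
    conn.executescript(SCHEMA_SQL)
    conn.executescript(SEED_SQL)
    conn.commit()


def customer_names(conn):
    """Return all customer names ordered by id."""
    rows = conn.execute("SELECT name FROM customer ORDER BY id").fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rebuild_database(conn)
    # A test run dirties the data...
    conn.execute("DELETE FROM customer WHERE id = 1")
    # ...and one call puts everything back the way it was.
    rebuild_database(conn)
    print(customer_names(conn))
```

The point of the sketch is that the rebuild is a single, repeatable call, so it can be run before every test run without anyone having to think about it.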
Ant has a core task for running SQL queries (the sql task), so you don’t even need any other tools to accomplish this. One thing you might run into is that running DDL commands through JDBC can cause problems. I found a great article on executing PL/SQL with Ant that you can probably apply to other databases as well. It allows you to run more database-specific queries to build your schema from a blank database. This helped me overcome the last hurdle in a recent project, so that I had the entire database scripted from scratch. So check it out.
I find that having a few standard Ant tasks makes this a breeze:
- A task to create tablespaces (if you have separate tablespaces for each database)
- A task to create a user
- A task to generate the full schema
- A task to delete the user and drop the schema
These tasks can be woven together to create the full database. To support development and testing, it is also handy to have loadable datasets around. Either maintain a list of SQL INSERT statements or use a tool like DBUnit to load the data into your database. Often it can be nice to have a few sets of base data:
- Unit testing data
- Functional testing data
- Demo data for showing off to clients and managers
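One way to keep several base datasets around is to give each one a name and load it on demand. The sketch below assumes Python and SQLite only to stay self-contained; it is not how DBUnit or Ant do it. The product table, the DATASETS dictionary and the load_dataset function are hypothetical.

```python
import sqlite3

# Hypothetical table used only for this illustration.
SCHEMA_SQL = (
    "CREATE TABLE IF NOT EXISTS product ("
    "sku TEXT PRIMARY KEY, price_cents INTEGER NOT NULL)"
)

# One named dataset per purpose: unit tests, functional tests, demos.
DATASETS = {
    "unit": [("TEST-1", 100)],
    "functional": [("TEST-1", 100), ("TEST-2", 250)],
    "demo": [("WIDGET", 1999), ("GADGET", 4999), ("GIZMO", 999)],
}


def load_dataset(conn, name):
    """Replace the contents of the product table with the named dataset.

    Returns the number of rows loaded. An unknown name raises KeyError.
    """
    rows = DATASETS[name]
    with conn:  # commits on success, rolls back if anything fails
        conn.execute(SCHEMA_SQL)
        conn.execute("DELETE FROM product")
        conn.executemany(
            "INSERT INTO product (sku, price_cents) VALUES (?, ?)", rows
        )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_dataset(conn, "demo"), "demo rows loaded")
```

Because loading a dataset always clears the table first, switching from demo data to unit testing data is one call and never leaves leftovers behind.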
The only thing lacking is an easy way to dump a schema from a working database so that you can use nice GUI tools to modify the database and then dump it out using an easy Ant target. Even with this, you’ll likely need to maintain change scripts so that you can apply them to working production systems without wiping out all of the data. I guess since I’m complaining, that will have to be my next side project. Leave a comment if you know how to do this.
Conclusion
With a little bit of effort in going all the way with automation, you can make it easy for new people to come onto a project, you can make it easier to create and run all kinds of tests, and you can make the entire development process easier and give the team more confidence. The tools exist to help make automation easy, so why not?
What things do you like to automate that you see people skipping on projects?
Resources
Java Unit Testing: JUnit, TestNG
Web Functional Testing: Selenium
Java Build Automation: Ant
Update
Looks like someone is a couple of steps ahead of me. While not a final release, the Apache DdlUtils contains a set of Ant tasks that give you the ability to read and write database schemas as well as load data sets defined in XML documents. Something to keep an eye on…
You could also automate error reporting, but at that point it is really just programming. It seems once you have gotten past automating building, testing and deploying you are in a different territory beyond simple scripting. Sure you could write a script (Ant) which aggregates your errors and produces a report, but at that point you are allowing your scripting to bleed into the domain of what the application would be better equipped to handle.
It is best to ensure that build scripts do not go too far. To script a database you will likely have PL/SQL (or T-SQL) procedures that you can schedule or set up with a trigger to run on an as-needed basis, which works best in that domain. I know how trivial it is to create a DTS package in SQL Server and schedule it to run nightly. And by allowing that code to be written in the language native to that area, you allow the experts in that area to maintain it, because not everyone on a team will take the time to learn to write an Ant script.
I also consider that automation scripts should be a very small portion of a project, acting as the glue that keeps everything together. If they become a significant portion of your project, you have to reconsider the role of those scripts. It reminds me of how you can put looping and conditional statements into XSLT, and soon you end up programming more in your templating solution than you do in your actual code. Just because you can do it does not mean it is a good idea.
I want to talk about relentless build automation. Once you get outside of the things that you need to get a system up and running from source, you’re outside that realm. So, error reporting wouldn’t count in my book…
The beauty of a shared automation script is that not everyone on the team has to be an expert in that scripting technology. Everyone on the team can benefit from the automation expertise of only a few people, the same way that you can benefit from having a CSS expert or a SQL expert. Everyone will know the basics, but you’ll have a “go-to guy” for more complex tasks.
I also don’t think that script size is relevant. Scripts should be as big as they need to be and no bigger. Automation with something like Ant is programming in a Domain Specific Language. It is a DSL that is optimized for scripting repeatable developer tasks (just like XSLT is a DSL for transforming XML documents). DSLs can be much more powerful and expressive because they are working at a higher level of abstraction. Take advantage of that power when you can.
If your build scripts can’t create a database, if they can’t populate it with test data, if it’s not an automated, repeatable process, then you can’t get a new developer up and running without intervention, you can’t deploy to a new server without manual steps, and you can’t trash your development environment and start from a new hard drive without rebuilding a whole bunch of stuff by hand. Think of all the time you’re wasting.
I have also added source code reports like Emma (unit test coverage), PMD/Checkstyle, JavaNCSS, etc., to provide feedback and quality metrics to the build scripts.
These reports can then be used to improve code quality as part of the build process.
Al,
Good point. By running code metrics (code coverage, complexity analysis, etc.) as a regular part of your build process, you can track changes over time to make sure that you are not sacrificing quality. Most often I would recommend that this is done using a Continuous Integration server, as it’s mostly a tracking, early warning system. Do you think it’s the kind of thing that belongs in a day-to-day development build script?
I run the build reports on every successful build as a regular part of the continuous integration cycle. I know several other buildmasters that do the same thing.
My reporting scripts are in a separate build file though… Developers don’t normally run them as part of their development routine.