High Performance and Parallelism With the Feel of a Scripting Language

Sometimes you need a quick-and-dirty tool to get something done. You’re not looking for a long-term solution; you just have a simple job to do. As a programmer, when you have those thoughts you naturally migrate toward scripting languages. But sometimes that throwaway tool also needs to do highly parallel operations with good performance characteristics. Until recently it seemed like you only got to choose one or the other. And then came Go.

A Use Case

We’ve been working on an application that provides APIs for other apps. Those APIs need to be fast and to scale up to many concurrent users. We needed a way to push a lot of traffic to this API while ensuring that the traffic touched a wide swath of the data in the database. We didn’t want the same request made over and over, which would leave the database in an unrealistic scenario where it had all of the data cached. There are a number of existing tools for this kind of performance testing, but watching some of the tests run didn’t give us much confidence that the requests were really being issued in parallel like we needed. We also wanted to be able to easily run these tests from many different client computers at once, so that we could ensure the client machines and internet connections were not the bottleneck.

How Does Go Fit That Use Case?

Write-Once (Compile a Few Times) and Run Anywhere

One of the advantages of Go is that it is easy to cross-compile to other architectures and operating systems. This made it easy to write a little application that we could run at the same time on Mac OS and Linux. Just like a scripting language, it was write-once and run anywhere. Of course we had to compile it for each of the different operating systems, but that is incredibly easy with Go. Unlike most scripting languages, once a Go binary is compiled for an OS, nothing else needs to be installed to run it. There’s no management of different versions or libraries. A Go binary is entirely self-contained: no Go runtime needs to be installed for the application to run, and all of the dependencies are statically linked in. Simply copy the binary to the appropriate machine and execute it. You can’t get much simpler than that.

$ brew install go --cross-compile-common
$ GOOS=linux go build myapp.go

Libraries for All The Things

Go has a large number of good libraries that come standard. These include support for building HTTP clients and servers, for accessing databases (although the drivers themselves are not included), for parsing command line arguments, for encoding and decoding JSON, for cryptography, and for regular expressions. Basically, it includes a lot of the libraries you need for creating applications, whether it’s something you want to maintain forever or a throwaway app.

flag.BoolVar(&help, "h", false, "help")
resp, err := http.Get("http://example.com/")
var exampleResp MyJsonResponse
decoder := json.NewDecoder(resp.Body)
err = decoder.Decode(&exampleResp)
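Pieced together, those fragments look something like the following minimal sketch. It uses an httptest server as a stand-in for a real endpoint so it runs without network access, and MyJsonResponse is an assumed example type, not anything from the actual tool:

```go
// Combines the flag, net/http, and encoding/json standard libraries.
package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// MyJsonResponse is a hypothetical response type for this sketch.
type MyJsonResponse struct {
	Message string `json:"message"`
}

func main() {
	var help bool
	flag.BoolVar(&help, "h", false, "help")
	flag.Parse()

	// Local stand-in for a real endpoint such as http://example.com/.
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"message": "hello"}`)
	}))
	defer ts.Close()

	resp, err := http.Get(ts.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var exampleResp MyJsonResponse
	decoder := json.NewDecoder(resp.Body)
	if err = decoder.Decode(&exampleResp); err != nil {
		panic(err)
	}
	fmt.Println(exampleResp.Message)
}
```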

Concurrent Design and Parallel Execution

Goroutines allow a program to execute a function concurrently with other running code. Channels allow for different goroutines to communicate by passing messages to each other. Those two things together allow for a simple means of structuring code with a concurrent design.

ch := make(chan int)
go func() {
  for {
    val := <-ch
    fmt.Printf("Got an int: %v", val)
  }
}()
ch <- 1
ch <- 2
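A complete, runnable version of that pattern might look like this sketch; the done channel is an addition for illustration, so that main doesn’t exit before the goroutine has printed both values:

```go
package main

import "fmt"

func main() {
	ch := make(chan int)
	done := make(chan bool)

	go func() {
		// range receives values until the channel is closed.
		for val := range ch {
			fmt.Printf("Got an int: %v\n", val)
		}
		done <- true // signal that every value has been handled
	}()

	ch <- 1
	ch <- 2
	close(ch) // ends the range loop in the goroutine
	<-done
}
```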

In addition to having easy mechanisms to implement a concurrent design, your program also needs to be able to do actual work in parallel. Go can run many different goroutines in parallel and gives you control over how many run at the same time with a simple function call.
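That simple function call is runtime.GOMAXPROCS. A minimal sketch of goroutines doing work in parallel under it (the counting workload is purely illustrative):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

func main() {
	// Allow up to 4 goroutines to execute simultaneously. (When this post
	// was written the default was 1, so parallelism had to be requested.)
	runtime.GOMAXPROCS(4)

	var wg sync.WaitGroup
	var total int64
	for i := 1; i <= 100; i++ {
		wg.Add(1)
		go func(n int64) {
			defer wg.Done()
			atomic.AddInt64(&total, n) // safe concurrent update
		}(int64(i))
	}
	wg.Wait()
	fmt.Println(total)
}
```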


Put The Pieces Together

Bringing together those libraries and a concurrent design allows us to easily create a program that meets our needs for testing these APIs.

This is a simple application that does GET requests to a specific URL. The program allows you to specify the URL, the number of requests to make, and the number to run concurrently. It uses many of the libraries I mentioned above for handling HTTP, for parsing command line arguments, for calculating the duration of requests, etc. It also uses goroutines to allow for multiple simultaneous requests to be made while using a channel to communicate the results back to the main program.

package main

import (
  "flag"
  "fmt"
  "io/ioutil"
  "net/http"
  "os"
  "runtime"
  "sync"
  "time"
)

var help bool
var count int
var concurrent int
var url string
var client *http.Client

func init() {
  client = &http.Client{}
  flag.BoolVar(&help, "h", false, "help")
  flag.IntVar(&count, "n", 1000, "number of requests")
  flag.IntVar(&concurrent, "c", runtime.NumCPU()+1, "number of concurrent requests")
  flag.StringVar(&url, "u", "", "url")
}

func main() {
  flag.Parse()
  if help {
    flag.Usage()
    os.Exit(0)
  }
  fmt.Printf("Concurrent: %v\n", concurrent)
  runtime.GOMAXPROCS(concurrent + 2)
  runChan := make(chan int, concurrent)
  resultChan := make(chan Result)
  var wg sync.WaitGroup
  wg.Add(count)
  success_cnt := 0
  failure_cnt := 0
  var durations []time.Duration
  var min_dur time.Duration
  var max_dur time.Duration
  // Run the stuff
  dur := duration(func() {
    // setup to handle responses
    go func() {
      for {
        r := <-resultChan
        durations = append(durations, r.Duration)
        min_dur = min(min_dur, r.Duration)
        max_dur = max(max_dur, r.Duration)
        // 200s and 300s are success in HTTP
        if r.StatusCode < 400 {
          success_cnt += 1
        } else {
          fmt.Printf("Error: %v; %v\n", r.StatusCode, r.ErrOrBody())
          failure_cnt += 1
        }
        wg.Done()
      }
    }()
    // setup to handle running requests
    go func() {
      for i := 0; i < count; i++ {
        <-runChan // wait for an open slot
        go func() {
          resultChan <- Execute()
          runChan <- 1 // give the slot back
        }()
      }
    }()
    // tell N number of requests to run, but this limits the concurrency
    for i := 0; i < concurrent; i++ {
      runChan <- 1
    }
    wg.Wait()
  })
  fmt.Printf("Success: %v\nFailure: %v\n", success_cnt, failure_cnt)
  fmt.Printf("Min: %v\nMax: %v\n", min_dur, max_dur)
  fmt.Printf("Mean: %v\n", avg(durations))
  fmt.Printf("Elapsed time: %v\n", dur.Seconds())
}

func avg(durs []time.Duration) time.Duration {
  total := float64(0)
  for _, d := range durs {
    total += d.Seconds()
  }
  return time.Duration((total / float64(len(durs))) * float64(time.Second))
}

func min(a time.Duration, b time.Duration) time.Duration {
  if a != 0 && a < b {
    return a
  }
  return b
}

func max(a time.Duration, b time.Duration) time.Duration {
  if a > b {
    return a
  }
  return b
}

func Execute() Result {
  var resp *http.Response
  var err error
  dur := duration(func() {
    resp, err = client.Get(url)
  })
  if err != nil {
    return Result{dur, -1, err, ""}
  }
  defer resp.Body.Close()
  var body string
  if b, err := ioutil.ReadAll(resp.Body); err == nil {
    body = string(b)
  } else {
    body = ""
  }
  return Result{dur, resp.StatusCode, nil, body}
}

type Result struct {
  Duration   time.Duration
  StatusCode int
  Err        error
  Body       string
}

func (r *Result) ErrOrBody() string {
  if nil != r.Err {
    return r.Err.Error()
  } else {
    return r.Body
  }
}

func duration(f func()) time.Duration {
  start := time.Now()
  f()
  return time.Now().Sub(start)
}

The app we wrote started out a lot like this: easy and straightforward. As we needed to add more tests, we started refactoring out types to separate the core of the load testing and the calculation of times from the actual requests being run. Go provides function type aliases, higher-order functions, and a lot of other abstractions which make those refactorings quite elegant. But that’s for a different post…
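As a sketch of that kind of refactoring (the RequestFunc and timeN names here are hypothetical, not from our actual tool), a function type lets the timing core be separated from the specific requests being timed:

```go
package main

import (
	"fmt"
	"time"
)

// RequestFunc is any operation the load-testing core knows how to time.
type RequestFunc func() error

// timeN runs f n times and returns the total elapsed time.
func timeN(n int, f RequestFunc) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		f() // a real tool would collect errors; this sketch ignores them
	}
	return time.Since(start)
}

func main() {
	calls := 0
	d := timeN(3, func() error {
		calls++ // stand-in for an actual HTTP request
		return nil
	})
	fmt.Println(calls, d >= 0)
}
```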

3.5x Increase In Performance with a One-Line Change

Gather around my friends, I’d like to tell you about a cure for what ails you, be it sniffles, scurvy, stomach ailments, eye sight, nervousness, gout, pneumonia, cancer, heart ailments, tiredness or plum just sick of life… Yes, sir, a bottle of my elixir will fix whatever ails you!

You might see the title above and think I’m trying to sell you some snake oil. The truth is, I probably am. As with most performance claims, your mileage may vary and the devil will always be in the details.

Let’s Start with a bit of Background

I recently began working on a client’s Ruby on Rails application that needed to provision data into another system at runtime. The provisioning was done through synchronous HTTP REST calls performed during the most performance-critical request flow in the application: the flow that made up 95% of the overall traffic this application handled. The provisioning consisted of between 8 and 15 HTTP requests to an external application.

record scratching

Yes, you read that correctly. For one HTTP request to this application, in the flow that made up 95% of the traffic that this application was supposed to handle, the app made up to 15 HTTP requests to a second system. This is not an ideal design from a performance standpoint, of course. The ultimate goal would be to eliminate or substantially reduce the number of calls through a coarse-grained interface. But that requires changes in two applications, coordinated across multiple teams, which will take a while. We needed something in the short term to help with the performance issues and give us breathing room to make more extensive changes.

The Good News

Luckily the HTTP requests were already being made using the Faraday library. Faraday is an HTTP client library which provides a consistent interface over different HTTP implementations. By default it uses the standard Ruby Net::HTTP library. Faraday is configured like this:

conn = Faraday.new(:url => 'http://example.com') do |faraday|
  faraday.request :url_encoded # form-encode POST params
  faraday.response :logger # log requests to STDOUT
  faraday.adapter Faraday.default_adapter # make requests with Net::HTTP
end

Net::HTTP in Faraday will create a new HTTP connection to the server for each request that is made. If you’re only making one request or you’re making requests to different hosts, this is perfectly fine. In our case, the connections were HTTPS and all were being made to the same host. So for each of those 15 requests Net::HTTP was opening a new socket, negotiating TCP, and negotiating an SSL connection. So how does Faraday help in this case?

One of the adapters that Faraday supports is net-http-persistent, a Ruby library that supports persistent connections and HTTP Keep-Alive across multiple requests. HTTP Keep-Alive allows an HTTP connection to be reused for multiple requests, avoiding the TCP negotiation and SSL connection overhead. To use the net-http-persistent implementation, all you have to do is change your Faraday configuration to look like:

conn = Faraday.new(:url => 'http://example.com') do |faraday|
  faraday.request :url_encoded # form-encode POST params
  faraday.response :logger # log requests to STDOUT
  faraday.adapter :net_http_persistent
end

This simple change swaps out the HTTP implementation that is used to make the requests. In our case it reduced the average time to process a complete request (including the ~15 requests made using Faraday) under load from 8 seconds down to 2.3 seconds.

the crowd goes wild

OK, so technically you need to add a new Gem reference to your Gemfile to use net-http-persistent. So it’s not REALLY a One-Line Fix. I also hope you never have an interface so chatty that your application needs to make 15 calls to the same remote server to process one request. But if you do! Let me tell you my friend! Just a little drop of net-http-persistent is all you need to cure what ails you.


Faraday has some other benefits including supporting a Middleware concept for processing requests and responses that allows for code to be shared easily across different HTTP requests. So you can have common support for handling JSON or for error handling or logging for example. This is a nice architecture that allows you to easily process request data. So even if you don’t need it for its ability to switch out HTTP implementations, it’s still a nice library to use.

Fake Materialized Views

In a previous post, I discussed materialized views in Oracle. I wanted to share a relatively simple technique that can be used to create similar functionality in Oracle or another database.

Why Use Something Else?

Materialized Views are a feature available in Oracle. If you’re not using Oracle, then that’s reason enough on its own. Materialized Views also have limitations in terms of how they are refreshed. Refreshing is the process of updating the contents of the view. This can be done as a fast refresh whenever the base tables that it relies on change. But certain things, like a union or other more “complex” SQL, prevent any sort of fast refreshing from going on. The other option is a periodic refresh: every n minutes the view can be refreshed. This is fine in certain circumstances, but there are use cases where it will be a showstopper. So what do you do?

Yes, I ran into the refreshing problem and had to work with a DBA to figure out another way to accomplish the same thing.

Table, View, Trigger

A materialized view is an object in the database that is backed by a table behind the scenes. We often want to use one as a way to denormalize data for reporting or performance reasons. Knowing this, you can create a structure that is directly a table instead. This table will be the structure that you query against. To keep the data up to date, you can implement a trigger that responds to changes in the base tables to keep the denormalized table up-to-date.

The simplest way to do this is to create a regular database view of the data that you want. Why not use the view directly? When you query a view it re-runs the query against the base table. This can be an expensive operation and put a lot of load on your database. Once you have a view containing the data that you want, you can then “copy” the data into your table for day-to-day querying.


create or replace view V$MY_VIEW as
select table1.v1, table2.v2, table2.v3
from Parent_Table table1
inner join Child_Table table2
on table1.id = table2.parent_id;

create table MY_TABLE (
  value1 varchar2(32),
  value2 varchar2(128),
  value3 int
);
insert into MY_TABLE select * from V$MY_VIEW;

create or replace trigger UPDATE_MY_TABLE
after insert or update or delete on Parent_Table
begin
  delete from MY_TABLE;
  insert into MY_TABLE select * from V$MY_VIEW;
end;


This is a denormalization technique that you can use for some read performance benefits in certain scenarios. Of course you will likely suffer from write performance when you do this. This particular technique is likely only useful if there are relatively few rows in your dependent tables, otherwise the performance will probably degrade quickly.

Obviously this is a pretty simple example, but hopefully it gives you some inspiration if you need a similar solution.

Oracle Materialized Views

So the existence of Materialized Views might not be news to the Oracle DBAs around the world, but I present topics from the perspective of a software developer. As software developers we often have to use databases with our applications. As a user of a database, I think it is very important that software developers know what is available to them to leverage when they build their applications. I don’t want developers to be afraid of the database.

(As an aside, I have not found Materialized Views in any database other than Oracle. If you know of any others that support this, please leave a comment.)

What Are Materialized Views?

Database Views offer a way to encapsulate a query in the database and present it to a caller so that it looks like a regular table. Every time you query a view, join it to another table, or take similar actions, the query that makes up the view is rerun against the database.

Materialized views are a similar concept to regular views but with one very interesting difference. Materialized views are backed by a real object in the database. When a materialized view is created, its query is run and a table (or a table-like structure) is created in the database. Materialized views, like regular views, can be read-only or can be configured as read-write as well. The other thing that you can configure is how the materialized view is refreshed. The interesting thing is that these materialized views can be asynchronously refreshed in the database when dependent tables change. The refresh rules can be anything from never to whenever a row is updated, deleted or inserted. There are a number of rules and limitations on the various refresh schemes, rules too complicated to address in a short article, which means the help of a good DBA is likely to be valuable.

This denormalization can give you really good performance gains. Of course you can do this with a View as well, but the database has to do even less work in the case of a materialized view because all of the relationships, aggregations, etc are pre-calculated and the results stored in a database object.

How Do I Use Them in my Application?

Tying this back into software development, how do we make use of them in our applications? The good news is that this is really straightforward. Using tools like Hibernate, you can map to materialized views just like you can to real tables. You can query them using JDBC (or PHP or Ruby ActiveRecord) the same as you would a regular table. What I’ve been finding is that Materialized Views can be a great tool for denormalization. You can maintain a nicely normalized database schema for your application, but use Materialized Views to offer denormalized views of the data.

Some Examples of Materialized View Use Cases:

  • Pre-calculated aggregate values (sum, max, min)
  • Flattened hierarchies
  • Pre-filtered data selections
  • Anything that involves a complicated or slow series of calculations or joins

In an application I’m currently working with I found it very helpful to map parent-child relationships of objects using Hibernate through a materialized view to flatten a hierarchy. (Using the many-to-many mapping where the relationship table was a Materialized View.)

Company -> Division -> Department -> Employee

To answer the question of who works for a given Company, you would join all the way down through those tables. Now imagine you have just one logical Organization table that is self-referential (i.e. the Division of a Company has a parent_id pointing at another row in the Organization table). Those hierarchical queries can be complicated and expensive. But you don’t suffer the expense of the query (or many queries) if you flatten the hierarchy and create a “denormalized” table that maps Employees to their Company. The flattened tables or calculation tables that you choose to use are, of course, driven by your application’s need for the data.

Example Query

A pre-calculated value table:

CREATE MATERIALIZED VIEW dept_salary AS
SELECT dept.id AS dept_id, SUM(emp.salary) AS total, AVG(emp.salary) AS average,
    MIN(emp.salary) AS minimum, MAX(emp.salary) AS maximum
FROM Department dept
    INNER JOIN Employee emp
        ON emp.dept_id = dept.id
GROUP BY dept.id;

Some More Reading

Ask Tom “Materialized Views”
Secrets of Materialized Views
Materialized Views for Hierarchy Expansion

Hibernate Query Translators

I’ve recently been doing some performance testing and tuning on an application. It makes use of Hibernate for data access and ORM, and Spring to configure and wire everything together. As I was looking through the configuration, I came upon the fact that we were using the ClassicQueryTranslatorFactory. The job of the Query Translator is to turn HQL queries into SQL queries. The ClassicQueryTranslatorFactory is the version that was included in Hibernate 2. In Hibernate 3 they created a new Query Translator, the ASTQueryTranslatorFactory. This Query Translator makes use of Antlr, a Java-based parser generator in the vein of lex and yacc.

I switched out the ClassicQueryTranslatorFactory for the ASTQueryTranslatorFactory and saw an immediate performance boost of about 15% for the application. I also noticed that fewer queries were being generated for the application’s page loads. Of course this application uses quite a bit of HQL, so if you do not make use of HQL extensively, you might not see the same benefits.

I have yet to see any documentation or any other evidence to support the claim that the newer ASTQueryTranslatorFactory would offer better performance, but in my case it seems like it has. Has anyone else noticed this behavior?

Hibernate HQL And Performance

The Hibernate ORM tool gives you the ability to write SQL-esque queries using HQL to do custom joining, filtering, etc. to pull Objects from your database. The documentation gives you a lot of examples of the things you can do, but I haven’t seen any caveats or warnings.

Database Performance

As far as database performance goes there are two major things to start with when you want to understand your database performance:

  • How many queries are run?
  • How expensive are the individual queries?

Not too earth-shattering, is it? Basically, if you run fewer queries of the same cost, you’re better off. Likewise, if you make the queries themselves cost less (by optimizing them, creating the proper indexes, etc.) then they will run faster. The best, of course, is to do both: run fewer, faster queries. (Yes, I’m still waiting on my Nobel prize.)

I’ll talk more about fewer queries later…

To make queries faster, you are mostly working in the database. You depend on good tools and good statistics. If the size or kind of data changes, you might have to redo this work.

To optimize your database queries:

  1. Run some queries examining their execution plans
  2. Find some possible columns to index
  3. Create an index
  4. Re-run the queries and examine the execution plans again
  5. Keep it if it’s faster, get rid of it if it’s not
  6. Goto 1

Hibernate and Caches

Hibernate does one thing: It maps Objects to a Relational database. Hibernate is really pretty good at that mapping and can support all kinds of schemas. So you should be able to (relatively) easily map your objects to your schema.

Hibernate also has two potential caching schemes. What it calls Level-1 and Level-2 caching. Level-1 caching is done through the Hibernate session. As long as the Hibernate session is open, any object that you have loaded will be pulled from the session if you query for it again.

The Level-2 cache is a longer-running, more advanced caching scheme. It allows you to store objects across Hibernate sessions. You’re often discouraged from using Level-2 caching, but it is very nice for read-only objects that you don’t expect to change in the database (think of pre-defined type information and the like). Again, if you query for one of these objects using Hibernate, then you’ll get the object from the Level-2 cache.

Notice how the Level-1 and Level-2 cache prevent Hibernate from having to re-query the database for a lot of objects. This of course can be a huge performance benefit. Likewise, Hibernate supports Lazy Loading of collections, so if your object is related to a collection of other objects, Hibernate will wait to load them until you need them. Once they’ve been loaded though, they are in the Object graph, so accessing them a second time does not require another round-trip to the database.

All of this lazy loading and caching is about reducing the number of queries you need to run against the database. You can also tweak your Hibernate mapping files to implement things like batching (loading children of multiple parents in one query) to greatly reduce the number of queries that need to be run. You can also specify to pre-load a related object using a left join if you will always need the object and want to get both in the same query. Most of the decisions are dependent on your application and what you are doing, but they are very easy to play with in your configuration and see if they improve your application performance.

Why the hard time for HQL?

All of the caching and tweaking you can do in your Hibernate mappings (or using Annotations) is totally wasted if you use HQL queries to load your objects.

If you specify fetch="join" in your mapping to do a left join and eagerly load a dependent object, that setting is not used when you load the object via HQL, so you will be doing more queries than you need.

If you have natural mappings of parent/child relationships then the following code will only generate a single query to load the Person and a single query to get the Addresses.

Person p = (Person) session.get(Person.class, 1);

address = p.getAddresses();
address2 = p.getAddresses();

This code still only generates two queries:

Person p = (Person) session.createQuery("from Person where id=:id")
  .setParameter("id", 1).uniqueResult();

address = p.getAddresses();
address2 = p.getAddresses();

But the following code generates twice as many queries to load the addresses.

Person p = (Person) session.createQuery("from Person where id=:id")
  .setParameter("id", 1).uniqueResult();

address = session
.createQuery("from Addresses where person_id=:id")
.setParameter("id", 1).list();
address2 = session
.createQuery("from Addresses where person_id=:id")
.setParameter("id", 1).list();

Of course this is a totally contrived example, but if you’ve built out a large system with a Service Facade and DAOs, these kinds of calls can easily be hidden deep in the application, where it is hard to know whether a call will trigger a database query or not. So be very conscious of HQL queries and the consequences of using them.

Hibernate rewards you for using natural relationships in your Objects. It rewards you with performance for building a POJO based Object Oriented system.

Hibernate HQL Rules

Rule #1: Don’t use HQL.
Rule #2: If you really need to use HQL, see Rule #1.
Rule #3: If you really, really need HQL and you know what you’re doing, then carefully use HQL.

Ok, so if I’m right about this, why is this not at the top of the HQL documentation? Don’t you think they should talk about this as a method of last resort?

Time to start reading POJOs in Action again.