3.5x Increase In Performance with a One-Line Change

Gather around, my friends! I’d like to tell you about a cure for what ails you, be it sniffles, scurvy, stomach ailments, failing eyesight, nervousness, gout, pneumonia, cancer, heart ailments, tiredness or being just plumb sick of life… Yes, sir, a bottle of my elixir will fix whatever ails you!

You might see the title above and think I’m trying to sell you some snake oil. The truth is, I probably am. As with most performance claims, your mileage may vary and the devil will always be in the details.

Let’s Start with a bit of Background

I recently began working on a client’s Ruby on Rails application that needed to provision data into another system at runtime. The provisioning was done through synchronous HTTP REST calls made during the application’s most performance-critical request flow, the one that made up 95% of the overall traffic it handled. Provisioning took between 8 and 15 HTTP requests to the external application.

record scratching

Yes, you read that correctly. For one HTTP request to this application, in the flow that made up 95% of its traffic, the app made up to 15 HTTP requests to a second system. This is, of course, not an ideal design from a performance standpoint. The ultimate goal would be to eliminate or substantially reduce the number of calls through a coarser-grained interface. But that requires changes in two applications, coordinated across multiple teams, which will take a while. In the short term we needed something that would ease the performance problems and give us breathing room to make more extensive changes.

The Good News

Luckily, the HTTP requests were already being made using the Faraday library. Faraday is an HTTP client library that provides a consistent interface over different HTTP implementations. By default it uses Ruby’s standard Net::HTTP library. Faraday is configured like this:

conn = Faraday.new(:url => 'http://example.com') do |faraday| 
  faraday.request :url_encoded # form-encode POST params 
  faraday.response :logger # log requests to STDOUT 
  faraday.adapter Faraday.default_adapter # make requests with Net::HTTP 
end

Faraday’s default Net::HTTP adapter creates a new HTTP connection to the server for each request. If you’re only making one request, or you’re making requests to different hosts, this is perfectly fine. In our case, every request went over HTTPS to the same host. So for each of those 15 requests, Net::HTTP was opening a new socket, performing a TCP handshake and then negotiating an SSL connection. So how does Faraday help in this case?

One of the adapters Faraday supports is net-http-persistent, a Ruby library that supports persistent connections and HTTP Keep-Alive across multiple requests. HTTP Keep-Alive lets one HTTP connection be reused for multiple requests, so the TCP and SSL handshakes are paid once per connection rather than once per request. To use the net-http-persistent implementation, all you have to do is change your Faraday configuration to look like:

conn = Faraday.new(:url => 'http://example.com') do |faraday| 
  faraday.request :url_encoded # form-encode POST params 
  faraday.response :logger # log requests to STDOUT 
  faraday.adapter :net_http_persistent 
end

This simple change swaps out the HTTP implementation used to make the requests. In our case, under load, it reduced the average time to process a complete request (including the ~15 requests made through Faraday) from 8 seconds to 2.3 seconds, about 3.5x faster (8 / 2.3 ≈ 3.48).
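To see where the savings come from, here is a stdlib-only sketch (no Faraday or net-http-persistent involved) that counts TCP connections. `CountingServer` and the helper methods are hypothetical illustrations. Opening a `Net::HTTP` session per request mirrors what the default adapter does, while one long-lived session stands in for a persistent connection. The sketch uses plain HTTP on 127.0.0.1, so it shows only the connection count; against a remote HTTPS server, every extra connection would also pay for a TCP and an SSL handshake.

```ruby
require "net/http"
require "socket"

# Hypothetical local HTTP/1.1 server that counts accepted TCP connections.
# It answers every request on a connection until the client closes it,
# so a client that reuses its socket is counted only once.
class CountingServer
  attr_reader :port

  def initialize
    @server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
    @port = @server.addr[1]
    @connections = 0
    @mutex = Mutex.new
    @acceptor = Thread.new { accept_loop }
  end

  def connections
    @mutex.synchronize { @connections }
  end

  def stop
    @acceptor.kill
    @acceptor.join
    @server.close
  end

  private

  def accept_loop
    loop do
      client = @server.accept
      @mutex.synchronize { @connections += 1 }
      Thread.new(client) { |sock| serve(sock) }
    end
  end

  def serve(sock)
    while sock.gets # request line, e.g. "GET / HTTP/1.1"
      # Skip the headers; these GET requests carry no body.
      loop do
        line = sock.gets
        break if line.nil? || line == "\r\n"
      end
      sock.write("HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 2\r\n\r\nok")
    end
  rescue IOError, SystemCallError
    # The client went away; nothing else to do.
  ensure
    sock.close unless sock.closed?
  end
end

# One Net::HTTP session per request, like Faraday's default Net::HTTP adapter.
# The third argument (nil) disables any proxy configured in the environment.
def one_connection_per_request(port, count)
  count.times do
    Net::HTTP.start("127.0.0.1", port, nil) { |http| http.get("/") }
  end
end

# One session reused for every request (HTTP/1.1 keep-alive).
def one_shared_connection(port, count)
  Net::HTTP.start("127.0.0.1", port, nil) do |http|
    count.times { http.get("/") }
  end
end

# Runs the block against a fresh server and returns how many TCP
# connections the server accepted.
def count_connections
  server = CountingServer.new
  yield server.port
  server.connections
ensure
  server&.stop
end

puts count_connections { |port| one_connection_per_request(port, 15) }
puts count_connections { |port| one_shared_connection(port, 15) }
```

The first count should equal the number of requests, while the shared session should need a single connection for all of them.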

the crowd goes wild

OK, so technically you also need to add a gem reference to your Gemfile to use net-http-persistent. So it’s not REALLY a One-Line Fix. I also hope you never have an interface so chatty that your application needs to make 15 calls to the same remote server to process one request. But if you do! Let me tell you, my friend! Just a little drop of net-http-persistent is all you need to cure what ails you.
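A minimal Gemfile sketch for that extra line. In the Faraday 0.x releases, the `:net_http_persistent` adapter ships with Faraday itself and only needs the net-http-persistent gem; Faraday 2.x moved the adapter into its own faraday-net_http_persistent gem, so check which applies to the Faraday version you use.

```ruby
source "https://rubygems.org"

gem "faraday"
gem "net-http-persistent"

# On Faraday 2.x, the adapter lives in a separate gem instead:
# gem "faraday-net_http_persistent"
```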

P.S.

Faraday has some other benefits, including support for middleware that processes requests and responses, which makes it easy to share code across different HTTP requests. So you can have common handling for JSON, errors or logging, for example. It’s a nice architecture for processing request and response data. So even if you don’t need Faraday for its ability to switch out HTTP implementations, it’s still a nice library to use.
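The middleware idea is easiest to see without the library. The sketch below is a plain-Ruby imitation of the pattern, not Faraday’s actual middleware API: each layer wraps the next one and exposes a single `call(env)` method. The class names (`EchoAdapter`, `JsonMiddleware`, `LoggingMiddleware`) are hypothetical, and the "adapter" just echoes the request instead of making a real HTTP call.

```ruby
require "json"

# Hypothetical innermost "adapter": stands in for the real HTTP call by
# echoing the request body back as the response body.
class EchoAdapter
  def call(env)
    env.merge(status: 200, response_body: env[:body])
  end
end

# Encodes the outgoing body as JSON and parses the JSON response body,
# so callers work with plain Ruby hashes.
class JsonMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    response = @app.call(env.merge(body: JSON.generate(env[:body])))
    response.merge(response_body: JSON.parse(response[:response_body]))
  end
end

# Records "METHOD path" for every request before passing it along.
class LoggingMiddleware
  def initialize(app, log)
    @app = app
    @log = log
  end

  def call(env)
    @log << "#{env[:method].to_s.upcase} #{env[:path]}"
    @app.call(env)
  end
end

# Build the stack from the outside in: logging -> JSON -> adapter.
log = []
stack = LoggingMiddleware.new(JsonMiddleware.new(EchoAdapter.new), log)

response = stack.call({ method: :post, path: "/widgets", body: { "name" => "gear" } })
puts response[:status]
puts response[:response_body].inspect
puts log.inspect
```

Because every layer shares the same `call(env)` interface, the JSON and logging layers can sit in front of any adapter, which is the kind of code sharing described above.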

NoSQL with MongoDB and Ruby Presentation

I presented at the Milwaukee Ruby User’s Group tonight on NoSQL using MongoDB and Ruby.

Code Snippets for the Presentation

Basic Operations

 
// insert data
db.factories.insert( { name: "Miller", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Lakefront", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Point", metro: { city: "Stevens Point", state: "WI" } } );
db.factories.insert( { name: "Pabst", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Blatz", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Coors", metro: { city: "Golden", state: "CO" } } );
 
// simple queries
db.factories.find()
db.factories.findOne()
db.factories.find( { "metro.city" : "Milwaukee" } )
db.factories.find( { "metro.state": {$in : ["WI", "CO"] } } )
 
// update data
db.factories.update( { name: "Lakefront"}, { $set : { thebest : true } } );
db.factories.find()
 
// delete data
db.factories.remove({name:"Coors"})
db.factories.remove({})  // an empty query removes every document

Ruby Example

require 'rubygems'
require 'mongo'
include Mongo
 
# Uses the 1.x mongo gem API (Connection); the 2.x driver replaced it with Mongo::Client.
db   = Connection.new.db('sample-db')
coll = db.collection('factories')
 
coll.remove
 
coll.insert( { :name => "Miller",    :metro => { :city => "Milwaukee", :state => "WI" } } )
coll.insert( { :name => "Lakefront", :metro => { :city => "Milwaukee", :state => "WI" } } )
coll.insert( { :name => "Point",     :metro => { :city => "Stevens Point", :state => "WI" } } )
coll.insert( { :name => "Pabst",     :metro => { :city => "Milwaukee", :state => "WI" } } )
coll.insert( { :name => "Blatz",     :metro => { :city => "Milwaukee", :state => "WI" } } )
coll.insert( { :name => "Coors",     :metro => { :city => "Golden", :state => "CO" } } )
 
puts "There are #{coll.count()} factories. Here they are:"
coll.find().each { |doc| puts doc.inspect }
 
map    = "function () { emit(this.metro.city, this.name); }"
reduce = 'function (k, vals) { return vals.join(","); }'
# Inline output returns the results directly; the driver needs :raw => true
# for inline output and then returns the raw command result.
coll.map_reduce(map, reduce, :out => { :inline => 1 }, :raw => true)["results"].each { |r| puts r.inspect }

Map Reduce Example

db.factories.insert( { name: "Miller", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Lakefront", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Point", metro: { city: "Stevens Point", state: "WI" } } );
db.factories.insert( { name: "Pabst", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Blatz", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Coors", metro: { city: "Golden", state: "CO" } } );
 
var fmap = function () {
    emit(this.metro.city, this.name);
}
var fred = function (k, vals) {
    return vals.join(",");
}
res = db.factories.mapReduce(fmap, fred, { out: "factories_by_city" })  // newer servers require an output collection
db[res.result].find()
db[res.result].drop()
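To make the output of `fmap` and `fred` concrete, here is a plain-Ruby imitation of the same map, group and reduce steps over the sample breweries. It runs without MongoDB and is only a model: the server may order the values differently, and it skips the reduce call entirely for keys with a single value (joining a one-element list gives the same string, so the result matches either way).

```ruby
# Sample documents based on the insert statements above.
factories = [
  { name: "Miller",    metro: { city: "Milwaukee",     state: "WI" } },
  { name: "Lakefront", metro: { city: "Milwaukee",     state: "WI" } },
  { name: "Point",     metro: { city: "Stevens Point", state: "WI" } },
  { name: "Pabst",     metro: { city: "Milwaukee",     state: "WI" } },
  { name: "Blatz",     metro: { city: "Milwaukee",     state: "WI" } },
  { name: "Coors",     metro: { city: "Golden",        state: "CO" } }
]

# Map: like fmap, emit one [city, name] pair per document.
emitted = factories.map { |doc| [doc[:metro][:city], doc[:name]] }

# Group: collect every emitted value under its key.
grouped = emitted.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }

# Reduce: like fred, join the names for each city with commas.
reduced = grouped.transform_values { |names| names.join(",") }

reduced.each { |city, names| puts "#{city}: #{names}" }
```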

The Presentation

Download NoSQL with MongoDB and Ruby Slides

Thanks to Meghan at 10gen for sending stickers and a copy of MongoDB: The Definitive Guide, which I gave out as a door prize. I read the book quickly this weekend before the talk and found it quite good, so I recommend it if you want to get started with MongoDB.

Using Ruby Subversion Bindings to Create Repositories

Subversion has a default Web UI that is served up by Apache if you run Subversion that way. It is pretty boring and read-only. Then there are things like WebSVN that make it less boring, but still read-only. I got curious about what it would take to make something even less boring and NOT read-only. Why not allow me to create a new repository on the server? Why not allow me to create a new default directory structure for a new project in a repository all through a web interface?

Parent Path Setup

All of my repositories use Apache’s SVNParentPath directive, which tells Apache that every directory under a given path is a Subversion repository. That structure makes it easy to deal with multiple repositories and to secure them with a single security scheme. It also makes it easy to write code that lists the existing repositories and creates new ones in a known location.
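Because every directory under the parent path is treated as a repository, listing and locating repositories needs nothing more than Ruby’s file APIs. A minimal sketch; `list_repositories`, `repository_path` and its name check are hypothetical helpers, not part of the Subversion bindings.

```ruby
require "fileutils"
require "tmpdir"

# Names of the repositories under an SVNParentPath-style directory:
# every visible subdirectory is assumed to be a repository.
def list_repositories(parent_path)
  Dir.entries(parent_path)
     .reject { |name| name.start_with?(".") }
     .select { |name| File.directory?(File.join(parent_path, name)) }
     .sort
end

# Where a client's repository lives (or would be created). The name check is
# a hypothetical guard that keeps names like "../etc" from escaping the parent.
def repository_path(parent_path, client_name)
  unless client_name.match?(/\A[A-Za-z0-9_-]+\z/)
    raise ArgumentError, "invalid repository name: #{client_name.inspect}"
  end
  File.join(parent_path, client_name)
end

# Example: a throwaway parent directory with two repositories and a stray file.
Dir.mktmpdir do |parent|
  FileUtils.mkdir_p(File.join(parent, "globex"))
  FileUtils.mkdir_p(File.join(parent, "acme"))
  File.write(File.join(parent, "README"), "not a repository")
  puts list_repositories(parent).inspect
  puts repository_path(parent, "acme")
end
```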

SvnAdmin With Ruby Subversion Bindings

Subversion provides language bindings for a number of different languages (Java, Python, Perl, PHP and Ruby) in addition to the native C libraries. Using these bindings it becomes fairly easy to deal with Subversion. The only hiccup will be dealing with the apparent lack of documentation for the code. So be prepared to do some exploration, digging and reading of the code.

I chose to try this using Ruby because it was quick and easy and it was a language I was already familiar with.

First you need to know how to create a new repository and open an existing repository. Fortunately those are simple, one-line operations:

Svn::Repos.create(repos_path, {})
repos = Svn::Repos.open(repos_path)

There was nothing special (from what I could tell) for determining whether a repository already existed, so I wrote a simple function that uses Ruby’s File operations to check whether the directory exists. It tells me whether I need to create a new repository or open an existing one:

def repository_exists?(repos_path)
   File.directory?(repos_path)
end

With a repository open, I wanted to build a project structure using the default conventions I use for Subversion projects. My convention is to name the repository after the client, name the top-level directories after the client’s projects, and give each project the standard trunk, branches and tags directories. Depending on the kind of work you do, that convention may or may not make sense for you.

With that decided, I wrote the code to create that structure in a repository. One thing I found is that the Subversion API lets you make changes within a transaction, which records all of them as a single commit. That seemed like a good thing, so I performed these operations as a transaction:

txn = repos.fs.transaction
 
# create the top-level, project based directory
txn.root.make_dir(project_name)
 
# create the trunk, tags and branches for the new project
%w(trunk tags branches).each do |dir|
  txn.root.make_dir("#{project_name}/#{dir}")
end
 
repos.commit(txn)
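The value of the transaction is that nothing becomes visible until `repos.commit(txn)` runs, and then everything lands as one revision. The toy model below imitates that behavior in plain Ruby so it can run without the Subversion bindings; `ToyRepository` and `ToyTransaction` are hypothetical classes, not the `Svn::Repos` API.

```ruby
require "set"

# Hypothetical in-memory repository: each revision is a frozen snapshot
# (a Set of directory paths). Revision 0 is empty, like a new repository.
class ToyRepository
  attr_reader :revisions

  def initialize
    @revisions = [Set.new.freeze]
  end

  def youngest
    @revisions.length - 1
  end

  # Start a transaction based on the latest revision.
  def transaction
    ToyTransaction.new(@revisions.last)
  end

  # Publish every staged change as exactly one new revision.
  def commit(txn)
    @revisions << txn.tree.dup.freeze
    youngest
  end
end

# Stages directory creations privately until the repository commits them.
class ToyTransaction
  attr_reader :tree

  def initialize(base)
    @tree = base.dup
  end

  # Refuse to create a directory whose parent has not been created yet.
  def make_dir(path)
    parent = File.dirname(path)
    unless parent == "." || @tree.include?(parent)
      raise ArgumentError, "parent directory #{parent} does not exist"
    end
    @tree << path
  end
end

repo = ToyRepository.new
txn = repo.transaction
txn.make_dir("website")
%w(trunk tags branches).each { |dir| txn.make_dir("website/#{dir}") }
puts repo.youngest    # 0: nothing is visible before the commit
puts repo.commit(txn) # 1: all four directories arrive in one revision
puts repo.revisions.last.to_a.inspect
```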

Finally, I put all of those pieces together into a class. The class is initialized with a base parent path, so every operation knows to start from that location:

require "svn/repos"
 
class SvnAdmin
  def initialize(parent_path)
    @base_path = parent_path
  end
 
  # Lists the entries of path (which must be a directory) in the client's
  # repository. Raises instead of creating the repository when it is missing.
  def ls(client_name, path = "/", revision = nil)
    repos_path = File.join(@base_path, client_name)
    unless repository_exists?(repos_path)
      raise ArgumentError, "no repository at #{repos_path}"
    end
 
    repos = Svn::Repos.open(repos_path)
    repos.fs.root(revision).dir_entries(path).keys
  end
 
  # Creates the client's repository if needed, then adds the project with
  # trunk, tags and branches in a single commit.
  def create(client_name, project_name)
    repos = ensure_repository(File.join(@base_path, client_name))
 
    txn = repos.fs.transaction
    txn.root.make_dir(project_name)
    %w(trunk tags branches).each do |dir|
      txn.root.make_dir("#{project_name}/#{dir}")
    end
 
    repos.commit(txn)
  end
 
  private
 
  # Opens the repository at repos_path, creating it first if it does not exist.
  def ensure_repository(repos_path)
    Svn::Repos.create(repos_path, {}) unless repository_exists?(repos_path)
    Svn::Repos.open(repos_path)
  end
 
  def repository_exists?(repos_path)
    File.directory?(repos_path)
  end
end

SvnAdmin from Rails

Now that I had some simple code to create new repositories or add a new project to an existing repository, I decided to wrap it in a simple Rails application for creating repositories through a web-based interface.

I’m not using a database or any ActiveRecord classes in this project (which you might want if you added authentication or something similar), so I disabled ActiveRecord in config/environment.rb:

config.frameworks -= [ :active_record ]

Then I created a ReposController to manage the Subversion repositories. It contains three simple actions:

  1. An index action to list the existing repositories (directories)
  2. A new action to display a form to enter the client and project names
  3. A create action to use the SvnAdmin class to create a new repository and/or project

require "svnadmin"
 
class ReposController < ApplicationController
  layout 'default'
 
  def index
    @dirs = Dir.new(SVN_PARENT_PATH).entries.sort.reject {|p| p == "." or p == ".."}
  end
 
  def new
  end
 
  def create
    svn = SvnAdmin.new(SVN_PARENT_PATH)
    repos = params[:repos]
 
    respond_to do |format|
 
      begin
        svn.create(repos[:client_name], repos[:project_name])
        flash[:notice] = "Successfully created."
        format.html { redirect_to :action => "index" }
        format.xml  { head :ok }
      rescue => e
        # object.errors was undefined here; report the exception message instead
        flash[:error] = "Failed to create structure: #{e.message}"
        format.html { redirect_to :action => "index" }
        format.xml  { render :xml => { :error => e.message }, :status => :unprocessable_entity }
      end
    end
  end
end

You can also easily create a route and a ProjectsController that allows you to see all of the projects within a repository.

The route in config/routes.rb is simply:

  map.connect ':repos/projects/',
      :controller => 'projects',
      :action => 'index'

And the ProjectsController looks up the :repos param to open the proper repository and list the top-level directories within it:

require "svnadmin"
 
class ProjectsController < ApplicationController
  layout 'default'
 
  def index
    client_name = params[:repos]  # a repository (client) name, not a full path
    svn = SvnAdmin.new(SVN_PARENT_PATH)
    @projects = svn.ls(client_name)
  end
end

Hopefully that will help you handle your Subversion administration. It should let you code up your conventions so that they are followed whenever a new repository is created or a new project is started.