3.5x Increase In Performance with a One-Line Change

Gather around my friends, I’d like to tell you about a cure for what ails you, be it sniffles, scurvy, stomach ailments, eye sight, nervousness, gout, pneumonia, cancer, heart ailments, tiredness or plum just sick of life… Yes, sir, a bottle of my elixir will fix whatever ails you!

You might see the title above and think I’m trying to sell you some snake oil. The truth is, I probably am. As with most performance claims, your mileage may vary and the devil will always be in the details.

Let’s Start with a bit of Background

I recently began working on a client’s Ruby on Rails application that needed to provision data into another system at runtime. The provisioning was done through synchronous HTTP REST calls performed during the most performance-critical request flow in the application, the flow that made up 95% of the overall traffic that this application handled. The provisioning consisted of between 8 and 15 HTTP requests to an external application.

record scratching

Yes, you read that correctly. For one HTTP request to this application, in the flow that made up 95% of the traffic that this application was supposed to handle, the app made up to 15 HTTP requests to a second system. This is not an ideal design from a performance standpoint, of course. The ultimate goal would be to eliminate or substantially reduce the number of calls through a coarse-grained interface. But that requires changes in two applications, coordinated across multiple teams, which will take a while. We needed to find something to do in the short term to help with the performance issues and give us the breathing room to make more extensive changes.

The Good News

Luckily the HTTP requests were already being made using the Faraday library. Faraday is an HTTP client library which provides a consistent interface over different HTTP implementations. By default it uses the standard Ruby Net::HTTP library. Faraday is configured like this:


conn = Faraday.new(:url => 'http://example.com') do |faraday|
  faraday.request :url_encoded             # form-encode POST params
  faraday.response :logger                 # log requests to STDOUT
  faraday.adapter Faraday.default_adapter  # make requests with Net::HTTP
end

Net::HTTP in Faraday will create a new HTTP connection to the server for each request that is made. If you’re only making one request or you’re making requests to different hosts, this is perfectly fine. In our case, these were HTTPS connections and all were being made to the same host. So for each of those 15 requests Net::HTTP was opening a new socket, performing a TCP handshake, and negotiating an SSL connection. So how does Faraday help in this case?

One of the adapters that Faraday supports is net-http-persistent, which is a Ruby library that supports persistent connections and HTTP Keep-Alive across multiple requests. HTTP Keep-Alive allows for an HTTP connection to be reused for multiple requests and avoids the TCP handshake and SSL negotiation overhead. To use the net-http-persistent implementation all you have to do is to change your Faraday configuration to look like:


conn = Faraday.new(:url => 'http://example.com') do |faraday|
  faraday.request :url_encoded  # form-encode POST params
  faraday.response :logger      # log requests to STDOUT
  faraday.adapter :net_http_persistent
end

This simple change swaps out the HTTP implementation that is used to make the requests. In our case it reduced the average time to process a complete request (including the ~15 requests made using Faraday) under load from 8 seconds down to 2.3 seconds.
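If you want to see the effect without Faraday in the picture, here is a minimal sketch using only the Ruby standard library. It is not the client's code and not what net-http-persistent does internally. CountingServer and connections_used are made-up names for this illustration, and the little local server exists only to count how many TCP connections the client opens.

```ruby
require 'net/http'
require 'socket'

# Hypothetical helper for this demo: a tiny keep-alive HTTP server that
# counts how many TCP connections it has accepted.
class CountingServer
  attr_reader :connections, :port

  def initialize
    @server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
    @port = @server.addr[1]
    @connections = 0
    @thread = Thread.new { accept_loop }
  end

  def stop
    @thread.kill
    @server.close
  end

  private

  def accept_loop
    loop do
      client = @server.accept
      @connections += 1
      Thread.new(client) { |c| serve(c) }
    end
  end

  # Answer every request on this socket until the client closes it.
  def serve(client)
    while client.gets # request line
      # Skip the request headers (the GET requests here carry no body).
      while (line = client.gets) && line != "\r\n"; end
      client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n" \
                   "Connection: keep-alive\r\n\r\nok")
    end
  rescue IOError, SystemCallError
    # The client went away; nothing to clean up beyond closing the socket.
  ensure
    client.close rescue nil
  end
end

# Runs the block against a fresh server and returns the connection count.
def connections_used
  server = CountingServer.new
  yield server.port
  server.connections
ensure
  server.stop
end

REQUESTS = 15

# A new connection for every request. The trailing nil bypasses any proxy
# configured in the environment.
per_request = connections_used do |port|
  REQUESTS.times do
    Net::HTTP.start('127.0.0.1', port, nil) { |http| http.get('/') }
  end
end

# One session kept open and reused for all requests (HTTP Keep-Alive).
persistent = connections_used do |port|
  Net::HTTP.start('127.0.0.1', port, nil) do |http|
    REQUESTS.times { http.get('/') }
  end
end

puts "new connection per request: #{per_request} connections"
puts "persistent connection:      #{persistent} connections"
```

Over plain HTTP on localhost the saving per connection is tiny. Over HTTPS to a remote host, each avoided connection also avoids a TCP handshake and an SSL negotiation, which is where the time went in our case.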

the crowd goes wild

OK, so technically you need to add a new Gem reference to your Gemfile to use net-http-persistent. So it’s not REALLY a One-Line Fix. I also hope you never have an interface so chatty that your application needs to make 15 calls to the same remote server to process one request. But if you do! Let me tell you my friend! Just a little drop of net-http-persistent is all you need to cure what ails you.

P.S.

Faraday has some other benefits including supporting a Middleware concept for processing requests and responses that allows for code to be shared easily across different HTTP requests. So you can have common support for handling JSON or for error handling or logging for example. This is a nice architecture that allows you to easily process request data. So even if you don’t need it for its ability to switch out HTTP implementations, it’s still a nice library to use.
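To give a feel for the middleware idea, here is a rough sketch in plain Ruby. This is not Faraday's actual API: FakeAdapter, ParseJson and RequestLog are hypothetical classes made up for this illustration. The only point is the shape of the pattern, where each layer wraps the next one and can touch the request on the way in and the response on the way out.

```ruby
require 'json'

# Innermost layer: a stand-in for the real HTTP adapter (hypothetical,
# returns a canned response instead of touching the network).
class FakeAdapter
  def call(env)
    env.merge(status: 200, body: '{"name":"Miller"}')
  end
end

# Response middleware: parses the JSON body after the inner layer returns.
class ParseJson
  def initialize(app)
    @app = app
  end

  def call(env)
    response = @app.call(env)
    response.merge(body: JSON.parse(response[:body]))
  end
end

# Request middleware: records each request before passing it along.
class RequestLog
  def initialize(app, log)
    @app = app
    @log = log
  end

  def call(env)
    @log << "#{env[:method].to_s.upcase} #{env[:url]}"
    @app.call(env)
  end
end

log = []
stack = RequestLog.new(ParseJson.new(FakeAdapter.new), log)
response = stack.call({ method: :get, url: 'http://example.com/factories/1' })

puts response[:body]['name']
puts log.inspect
```

Because every layer has the same call(env) interface, the same JSON or logging layer can be reused for any request that goes through the stack.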

NoSQL with MongoDB and Ruby Presentation

I presented at the Milwaukee Ruby User’s Group tonight on NoSQL using MongoDB and Ruby.

Code Snippets for the Presentation

Basic Operations

// insert data
db.factories.insert( { name: "Miller", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Lakefront", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Point", metro: { city: "Steven's Point", state: "WI" } } );
db.factories.insert( { name: "Pabst", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Blatz", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Coors", metro: { city: "Golden Springs", state: "CO" } } );

// simple queries
db.factories.find()
db.factories.findOne()
db.factories.find( { "metro.city" : "Milwaukee" } )
db.factories.find( { "metro.state": {$in : ["WI", "CO"] } } )

// update data
db.factories.update( { name: "Lakefront"}, { $set : { thebest : true } } );
db.factories.find()

// delete data
db.factories.remove({name:"Coors"})
db.factories.remove()

Ruby Example


require 'rubygems'
require 'mongo'
include Mongo

db = Connection.new.db('sample-db')
coll = db.collection('factories')

coll.remove

coll.insert( { :name => "Miller", :metro => { :city => "Milwaukee", :state => "WI" } } )
coll.insert( { :name => "Lakefront", :metro => { :city => "Milwaukee", :state => "WI" } } )
coll.insert( { :name => "Point", :metro => { :city => "Steven's Point", :state => "WI" } } )
coll.insert( { :name => "Pabst", :metro => { :city => "Milwaukee", :state => "WI" } } )
coll.insert( { :name => "Blatz", :metro => { :city => "Milwaukee", :state => "WI" } } )
coll.insert( { :name => "Coors", :metro => { :city => "Golden Springs", :state => "CO" } } )

puts "There are #{coll.count()} factories. Here they are:"
coll.find().each { |doc| puts doc.inspect }
coll.map_reduce("function () { emit(this.metro.city, this.name); }", "function (k, vals) { return vals.join(','); }").each { |r| puts r.inspect }

Map Reduce Example


db.factories.insert( { name: "Miller", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Lakefront", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Point", metro: { city: "Steven's Point", state: "WI" } } );
db.factories.insert( { name: "Pabst", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Blatz", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Coors", metro: { city: "Golden Springs", state: "CO" } } );

var fmap = function () {
  emit(this.metro.city, this.name);
}
var fred = function (k, vals) {
  return vals.join(",");
}
res = db.factories.mapReduce(fmap, fred)
db[res.result].find()
db[res.result].drop()
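If you don't have a MongoDB instance handy, the same grouping can be sketched in plain Ruby to see what the map and reduce steps are doing. This is only an in-memory illustration of the idea, not how MongoDB executes the job. In particular, MongoDB only calls the reduce function for keys that have more than one emitted value, while this sketch joins every group.

```ruby
factories = [
  { name: 'Miller',    metro: { city: 'Milwaukee',      state: 'WI' } },
  { name: 'Lakefront', metro: { city: 'Milwaukee',      state: 'WI' } },
  { name: 'Point',     metro: { city: "Steven's Point", state: 'WI' } },
  { name: 'Pabst',     metro: { city: 'Milwaukee',      state: 'WI' } },
  { name: 'Blatz',     metro: { city: 'Milwaukee',      state: 'WI' } },
  { name: 'Coors',     metro: { city: 'Golden Springs', state: 'CO' } }
]

# "map" step: emit a (key, value) pair for every document.
emitted = factories.map { |f| [f[:metro][:city], f[:name]] }

# Group the emitted pairs by key, then "reduce" each group to one value.
by_city = emitted
  .group_by(&:first)
  .transform_values { |pairs| pairs.map(&:last).join(',') }

by_city.each { |city, names| puts "#{city}: #{names}" }
```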

The Presentation

Download NoSQL with MongoDB and Ruby Slides

Thanks to Meghan at 10Gen for sending stickers and a copy of MongoDB: The Definitive Guide that I gave out as a door prize. I read the book quickly this weekend before the talk and found it quite good, so I recommend it if you want to get started with MongoDB.

Capistrano Deploy with Git and Passenger

One of the great things about Rails and its community is that they are very lazy. Lazy in the good way of not wanting to do boring, repetitive, error prone things manually. They metaprogram and they automate. A great example of this is Capistrano. Capistrano allows you to deploy Rails applications with ease. The normal scenario is baked into Capistrano as a deployment convention and then you can customize it if you need to.

My Story

I’ve recently redeployed a couple of Ruby on Rails sites using Passenger (mod_rails). Passenger is an Apache module that really simplifies the deployment of small-scale Rails applications. Once Passenger is installed and your Rails application is set up as a virtual directory, it just works. Passenger auto-detects the fact that the directory is a Rails application and runs it for you. No need for a mongrel cluster or manually configuring load balancing.

I’m also using Git as my version control system on small, personal projects because it’s easy to use and I can work on multiple laptops and commit locally and worry about pushing to a central location when I have a network connection.

Seeing those things, I wanted to make them all work with Capistrano so that I could continue being lazy. To do this, I’m using Capistrano v2.4. It has Git support built in that works (some previous versions had support for Git, but seemed to have a lot of trouble).

Git Setup

By convention, Capistrano uses Subversion. So, I need to change my configuration to use Git. The set :scm, :git line does this. The repository setting says where my Git repository lives. In this case, I’m just using a bare Git repository and accessing it over SSH. You can also access your repository using the git and http protocols if you have that set up. The branch setting just says to deploy off of the master branch.

That’s pretty much it – nice and easy.

set :scm, :git
set :repository, "geoff@zorched.net:myapp.git"
set :branch, "master"
set :deploy_via, :remote_cache

Passenger (mod_rails) Setup

The only thing that comes into play with Passenger is restarting the Rails application after a deployment is done. Passenger has an easy way to do this which is just to create a file called restart.txt in the Rails tmp directory. When it sees that, the Rails application process will be recycled automatically.

Doing this requires just a bit of Capistrano customization. We need to override the deploy:restart task and have it run a small shell command for us. In this case we call run "touch #{current_path}/tmp/restart.txt" to accomplish this task.


namespace :deploy do
  desc "Restarting mod_rails with restart.txt"
  task :restart, :roles => :app, :except => { :no_release => true } do
    run "touch #{current_path}/tmp/restart.txt"
  end

We can also override the start and stop tasks because those don’t really do anything in the mod_rails scenario like they would with mongrel or other deployments.

  [:start, :stop].each do |t|
    desc "#{t} task is a no-op with mod_rails"
    task t, :roles => :app do ; end
  end
end

The Whole Thing

Putting everything together in my deploy.rb looks like the following:

set :application, "enotify"

# If you aren't deploying to /u/apps/#{application} on the target
# servers (which is the default), you can specify the actual location
# via the :deploy_to variable:
set :deploy_to, "/var/www/myapp"

# If you aren't using Subversion to manage your source code, specify
# your SCM below:
set :scm, :git
set :repository, "geoff@zorched.net:myapp.git"
set :branch, "master"
set :deploy_via, :remote_cache

set :user, 'geoff'
set :ssh_options, { :forward_agent => true }

role :app, "zorched.net"
role :web, "zorched.net"
role :db, "zorched.net", :primary => true

namespace :deploy do
  desc "Restarting mod_rails with restart.txt"
  task :restart, :roles => :app, :except => { :no_release => true } do
    run "touch #{current_path}/tmp/restart.txt"
  end

  [:start, :stop].each do |t|
    desc "#{t} task is a no-op with mod_rails"
    task t, :roles => :app do ; end
  end
end

Mongrel Cluster and Apache Need Memory

I use a VPS hosted by SliceHost as my personal server. SliceHost uses Xen to host multiple instances of Linux on a single machine. The performance of this setup has been very good.

I have been running:

  • Apache 2.2 with PHP
  • MySQL 5
  • Postfix Mail Server
  • Courier IMAP Server
  • ssh for remote access of course

I recently started playing with a site built using Radiant CMS which is itself built on top of Ruby on Rails. So, I’ve added to the mix:

  • 3 Mongrel instances running under mongrel_cluster

These Mongrel instances are proxied behind Apache using mod_proxy_balancer as described here. This setup works very well and is more and more becoming the de facto standard for deploying Rails applications. Even the Ruby on Rails sites are deployed with this setup now. It allows you to serve all of your dynamic content through Rails and all of your static content through Apache. This gives you all of the speed and robustness that Apache has to offer (after all, it runs over 50% of all the hosts on the internet) for serving static content without burdening Mongrel with this task.

I was noticing that the site was pretty slow though. I tracked it down to the fact that I had started using too much memory. I was running the site on a VPS with 256M of RAM, but with the new Mongrel instances I had just pushed my server into swap space. Web applications in general are happier with more RAM, and in this case that was definitely borne out. I upped the VPS to have 512M of RAM and things became VERY SNAPPY! While I didn’t do a scientific before and after, the page loads prior to the upgrade were taking about 5-10s. After the memory increase you can’t tell if the application is static or dynamic.

So, if you’re running into performance issues with Mongrel behind an Apache mod_proxy_balancer setup, check your memory. If you are running into swap space then you are likely to see serious performance issues. Let me know of any other simple tweaks to get more performance out of this setup if you have them.
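One quick way to check for swapping on a Linux box is to look at /proc/meminfo. The following is a small sketch, assuming the usual Linux format of that file. swap_used_kb is a made-up helper name and the sample numbers are invented for illustration, not measurements from my server.

```ruby
# Returns the amount of swap in use (in kB), given the text of /proc/meminfo.
def swap_used_kb(meminfo_text)
  values = meminfo_text
    .scan(/^(\w+):\s+(\d+)\s+kB/)
    .map { |key, kb| [key, kb.to_i] }
    .to_h
  values.fetch('SwapTotal') - values.fetch('SwapFree')
end

# Sample input in the /proc/meminfo format (invented numbers).
SAMPLE_MEMINFO = <<~MEMINFO
  MemTotal:         262144 kB
  MemFree:            4096 kB
  SwapTotal:        524288 kB
  SwapFree:         401408 kB
MEMINFO

puts "swap in use: #{swap_used_kb(SAMPLE_MEMINFO) / 1024} MB"
```

On the server itself you would pass in File.read('/proc/meminfo') instead of the sample text. A number that keeps growing while the Mongrel instances are busy is a hint that the box needs more RAM or fewer processes.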

As an aside:
Big kudos to SliceHost on their VPS upgrade capabilities. I clicked 2 buttons on my web-based management console and about 10 minutes later I was running on a bigger VPS. You can’t ask for much better than that if you need to scale up a server!

Update:
I guess Lighttpd and Nginx do both support running PHP applications under FastCGI. You might want to try this kind of setup if you are so inclined. I’m still an Apache partisan.

Active Directory Authentication for Ruby on Rails

Ruby on Rails can be used to build many kinds of web applications including public internet applications as well as private intranet ones. As an intranet application it is often very interesting to be able to do Single Sign-On using an existing Active Directory setup. Rails does not support NTLM authentication out of the box which is what is required.

IIS for NTLM

If you are talking about Active Directory authentication then chances are good that you already have a Windows infrastructure. IIS, of course, supports NTLM so that’s the first thing I looked into. To use this you have to run Rails under something like FastCGI. To make a long story short, I could not get FastCGI to work with my Rails installation. This looks like a promising path for people who have a mostly Microsoft infrastructure already. This is an area that I hope to explore further, but I gave up on it for now.

If you want to try this route, check out RoR IIS which has a lot of instructions as well as an installer that can do a lot for you. (Again, I tried this first and it didn’t work, so your mileage may vary.)

Apache with NTLM

Authentication using Active Directory can be done with Apache on Windows as well using the mod_auth_sspi authentication module, so this seemed like another promising path as Apache and Rails is a more common combination as opposed to IIS and Rails.

Running Rails under Apache can be a bit tricky. There are a lot of options to choose from: mod_ruby, fast_cgi, proxying and scgi. All of these options with no real breakdown that I could find of why use one over another? Proxying with multiple mongrel instances is a very common combination, but I did not want to have mongrel running because I don’t want a way to end-run around the authentication mechanism. So, I again tried FastCGI, this time under Apache. Me and FastCGI don’t seem to get along though and it again failed to work.

InstantRails uses SCGI though, and since it is a reference implementation for Rails and Apache on Windows, I figured that was a promising path. SCGI is supposed to be a simpler form of CGI with all of the performance advantages of FastCGI. SCGI is a two-part solution: there is an Apache module and there is an SCGI server that runs the Rails application. Using this combination, I was able to get Rails running under Apache on Windows.

Under Apache the setup is fairly simple:

LoadModule scgi_module modules/mod_scgi.so
SCGIMount / 127.0.0.1:9999

The Rails SCGI setup is well documented, including how to configure Apache, so I won’t repeat everything.

Configure Apache for NTLM Authentication

To use mod_auth_sspi, you have to load the module and then configure your application to use the domain to authenticate users. Once that is in place, your users will be authenticating using Active Directory. If you use a browser like Firefox or Safari, the user will see a Login Prompt, but if you are using Internet Explorer it will automatically pass the user’s current Active Directory credentials to Apache and mod_auth_sspi will do the authentication transparently.


LoadModule sspi_auth_module modules/mod_auth_sspi.so


AuthType SSPI
SSPIAuth On
SSPIAuthoritative On
SSPIDomain DOMAIN
SSPIOfferBasic On
SSPIOmitDomain Off
Require valid-user

Tying it Together In Rails

At this point, anyone who gets access to your application has been authenticated. If your application does not need to know who the user is, congratulations! You’re done. But if your application needs to actually know who is using the app and not just that it’s a valid user, then you have a little bit more work to do.

The most common way to check that a user is authenticated in Rails is to use a before_filter in your controllers to check that the session is properly set up and to send the user to a Login controller if they are not authenticated. You’ll need to do the same thing in this case:

class ApplicationController < ActionController::Base
  protected

  def authenticate
    unless session[:user]
      redirect_to :controller => "login"
      return false
    else
      # TODO: Check that the current user is the same as the session user
      # TODO: Check user against active directory every time?
      # request.env["REMOTE_USER"]
    end
  end
end

Most LoginControllers will show the user a form that allows them to authenticate. A User object is then used to check that the username and password are correct. This implementation will be slightly different though. When authentication is done by Apache it will set an HTTP variable, request.env["REMOTE_USER"], that will be available in your controllers to identify who the authenticated user is. You can use this information to learn about the user instead of showing them a login form. Remember, the user has already been authenticated by the Active Directory domain, so the user’s credentials have been checked.


class LoginController < ApplicationController
  def index
    success = false
    user = User.find_by_name(request.env["REMOTE_USER"]);
    if user
      session[:user] = user
      success = true
    end
    redirect_to request.env["HTTP_REFERER"] ? :back : '/' if success
    redirect_to :action => :error if ! success
  end

  def logout
    session[:user] = nil
  end

  def error
  end
end

In the login controller you can get the REMOTE_USER information and then do whatever you need to do with that information. Most likely you’ll want to call into Active Directory using something like ruby-ldap to check on things like groups or role membership and to get extra User information.
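One detail to watch for: with SSPIOmitDomain Off, my understanding is that REMOTE_USER arrives in the DOMAIN\username form, so you may need to split it before looking up a user by name. Here is a small sketch of that step in plain Ruby. split_remote_user is a hypothetical helper, not part of Rails or mod_auth_sspi, and the exact format is an assumption you should check against your own server.

```ruby
# Splits a REMOTE_USER value into [domain, username].
# Assumes the "DOMAIN\username" form; a bare username yields a nil domain.
def split_remote_user(remote_user)
  return [nil, nil] if remote_user.nil? || remote_user.empty?

  domain, name = remote_user.split('\\', 2)
  name ? [domain, name] : [nil, domain]
end

domain, username = split_remote_user('DOMAIN\\geoff')
puts "domain=#{domain} username=#{username}"
```

In the LoginController you could then pass just the username part to User.find_by_name and keep the domain around for the later LDAP lookup.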

Conclusion

There are a lot more implementation pieces that need to be completed. I would like to be able to pull extra information about a user from Active Directory the first time they authenticate and store it in the session so I could know their full name for example.

Hopefully this information will help someone else out because it definitely is not an obvious thing to do (at least it wasn’t to me).

Making Session Data Available to Models in Ruby on Rails

Ruby on Rails is implemented as the Model View Controller (MVC) pattern. This pattern separates the context of the Web Application (in the Controller and the View) from the core Model of the application. The Model contains the Domain objects which encapsulate business logic, data retrieval, etc. The View displays information to the user and allows them to provide input to the application. The Controller handles the interactions between the View and the Model.

This separation is a very good design principle that generally helps prevent spaghetti code. Sometimes though the separation might break down.


The following is really an alternative to using the ActionController::Caching::Sweeper which is a hybrid Model/Controller scoped Observer really. It seems to me, based on the name, that the intent is much more specific than giving Observers access to session data. Which do you prefer?

Rails provides the concept of a Model Observer. This Observer allows you to write code that will respond to the lifecycle events of the Model objects. For example, you could record some information every time an Account changed using the following Observer:

class AccountObserver < ActiveRecord::Observer
  def after_update(record)
    Audit.audit_change(record.account_id, record.new_balance)
  end
end

You might have noticed a limitation with the previous API though. You didn't notice? The only information passed to the Observer is the object that is being changed. What if you want more context than this? For example, what if you want to audit not only the values that changed, but also the user who made the change?

class AccountObserver < ActiveRecord::Observer
  def after_update(record)
    Audit.audit_change(current_user, record.account_id, record.new_balance)
  end
end

How do you get the current_user value? Well, you have to plan ahead a little bit. The User in this application is stored in the HTTP Session when the user is authenticated. The session isn't directly available to the Model level (including the Observers) so you have to figure out a way around this. One way to accomplish this is by using a named Thread local variable. Using Mongrel as a web server, each HTTP request is served by its own thread. That means that a variable stored as thread local will be available for the entire processing of a request.

The UserInfo module encapsulates reading and writing the User object from/to the Thread local. This module can then be mixed in with other objects for easy access.

module UserInfo
  def current_user
    Thread.current[:user]
  end

  def self.current_user=(user)
    Thread.current[:user] = user
  end
end

A before_filter set in the ApplicationController will be called before any action is called in any controller. You can take advantage of this to copy a value out of the HTTP session and set it in the Thread local:


class ApplicationController < ActionController::Base
  include UserInfo

  # Pick a unique cookie name to distinguish our session data from others'
  session :session_key => '_app_session_id'

  before_filter :set_user

  protected

  def authenticate
    unless session[:user]
      redirect_to :controller => "login"
      return false
    end
  end

  # Sets the current user into a named Thread location so that it can be accessed
  # by models and observers
  def set_user
    UserInfo.current_user = session[:user]
  end
end

At any point in an Observer or a Model class where you need access to those values, you can just mix in the helper module and then use its methods to access the data. In this final example we mix the UserInfo module into our AccountObserver and it will now have access to the current_user method:

class AccountObserver < ActiveRecord::Observer
  include UserInfo

  def after_update(record)
    Audit.audit_change(current_user, record.account_id, record.new_balance)
  end
end

You generally shouldn't need this kind of trick outside of an Observer. In most cases the Controller should pass all of the information needed by a Model object to it through its methods. That will allow the Model objects to interact and the Controller to do the orchestration needed. But in a few special cases, this trick might be handy.
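The thread-local behaviour this trick relies on can be checked outside of Rails. Here is a minimal sketch in plain Ruby that reuses the UserInfo module from above. Auditor is a made-up stand-in for the Observer, and the two threads play the role of two concurrent requests.

```ruby
module UserInfo
  def current_user
    Thread.current[:user]
  end

  def self.current_user=(user)
    Thread.current[:user] = user
  end
end

# Hypothetical stand-in for an Observer that needs to know the user.
class Auditor
  include UserInfo

  def describe_change
    "changed by #{current_user}"
  end
end

# Each thread simulates one request being handled for a different user.
results = %w[alice bob].map do |name|
  Thread.new do
    UserInfo.current_user = name # what the before_filter does per request
    sleep 0.01                   # give the other thread a chance to run
    Auditor.new.describe_change
  end
end.map(&:value)

puts results.inspect
puts "main thread sees: #{Thread.current[:user].inspect}"
```

Each thread only ever sees the value it set itself. One thing to keep in mind: if a server reuses threads across requests, the value from the previous request is still there until it is overwritten, which is why the before_filter sets it on every request, even when session[:user] is nil.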

RJS Templates for Rails

I recently got a free copy of RJS Templates for Rails from the Milwaukee Ruby User’s Group. O’Reilly has a program that makes books available for free to Users Groups, which is a really nice thing (of course they bank on the word-of-mouth advertising that comes from it. Hi O’Reilly!).

RJS Templates for Rails is an O’Reilly “Short Cut” which is basically a small book released as a PDF only. It offers a brief look into some of the things that you can do with RJS templates that were added in Ruby on Rails 1.1. RJS is a way to write client-side JavaScript using the Ruby language and JavaScriptGenerator API. This allows you to do complex, multi-step processing to modify different parts of the page in a single request.

The JavaScriptGenerator API is accessible from within a Controller and from RJS Templates. Which you use is part style and part pragmatic. For one-liners it is often just easier to write the code inline, reserving the RJS templates for more complex scripts.

Inline code would look like:

def hide_details
render :update do |page|
page[:details].hide
end
end

An RJS template, on the other hand, is automatically wrapped with the render :update block, so it only contains the page calls:

page.insert_html :bottom, 'list', '<li>New Item At Bottom</li>'

Both of these are simple examples of course, but they give you an idea of what’s going on. Each of these would generate JavaScript and send it back to the client where it would be executed in the browser. You can see that you do not have to write any JavaScript to use these techniques. The only thing that you have to do is to include the Prototype library in your pages using:

<%= javascript_include_tag :defaults %>

RJS Templates for Rails is a really quick read that does a good job of getting you started with using RJS and the JavaScriptGenerator API. It starts with a very simple example program that you can have up and running in a matter of minutes. It continues with a slightly more in-depth example of an application that really uses some of the more interesting aspects of RJS including multi-element page updates with a single request.

The book offers a reference to the API that gives you a quick reference to the methods that you can use to program JavaScript in Ruby. Being a short book, it does not offer a slew of recipes that you would want from a more comprehensive book. This feels like an appetizer to me, giving me a taste of what’s available, whetting my appetite for more.

In the abstract, RJS Templates for Rails actually offers some really good insight into how some people are doing AJAX with code generation techniques. Instead of writing JavaScript, they are creating APIs in their programming language of choice and then using code generation techniques to output the client-side code. While this style requires that the toolkit creator knows a great deal about the language being generated (JavaScript), it can encapsulate that knowledge so that others can leverage it without getting into the details. This is how the Google Web Toolkit (GWT) works as well for the Java camp. Prior to reading this book I don’t think I quite got what the big deal was, but now I see these frameworks really can do some incredible things for people who don’t want to “get their hands dirty” doing client-side development.

If you’re an AJAX whiz, I’d doubt that this would offer a lot unless you just wanted to see another implementation and didn’t know a lot about Rails. But if you want to see how easy it could be to add some really interesting AJAX features to a Rails application, this can help you get started. You’ll almost definitely want to get a more comprehensive book to give you some more ideas on how to handle a broader range of situations.

RJS Templates for Rails – Cody Fauser – ISBN: 0-596-52809-4

P.S.
The October 2006 issue of MacTech has a really good article on RJS as well. It covers some of the same basics that the PDF covers, using a few different examples.

Rails on Rails

If Ruby on Rails is all about optimizing the development of web applications, then what would happen if you optimized Rails?

Well, it would get streamlined. Streamlined is a generator for Rails that can be used to easily create sites that are much better looking and much easier to use than the standard scaffold generated views. Even if you wouldn’t want to use it to create the full site, it might be useful to generate an administrative section that would have a much more limited audience.

The other interesting thing about Streamlined is that it does for views what Rails already has for the Model objects. In Rails Model objects, you can use a declarative syntax to decorate the object to define relationships and to override behavior. With Streamlined you can do the same thing with views. You don’t have to edit HTML or RHTML templates, you can just decorate some generated classes to override the default behavior.

In the spirit of Rails, there’s a great screencast available at the site showing off some of these really interesting features. Check it out…

(Thanks to the Milwaukee Ruby User’s Group for showing the screencast.)