3.5x Increase In Performance with a One-Line Change

Gather around my friends, I’d like to tell you about a cure for what ails you, be it sniffles, scurvy, stomach ailments, eye sight, nervousness, gout, pneumonia, cancer, heart ailments, tiredness or plum just sick of life… Yes, sir, a bottle of my elixir will fix whatever ails you!

You might see the title above and think I’m trying to sell you some snake oil. The truth is, I probably am. As with most performance claims, your mileage may vary and the devil will always be in the details.

Let’s Start with a Bit of Background

I recently began working on a client’s Ruby on Rails application that needed to provision data into another system at runtime. The provisioning was done through synchronous HTTP REST calls performed during the most performance-critical request flow in the application, the flow that made up 95% of the overall traffic the application handled. The provisioning consisted of between 8 and 15 HTTP requests to an external application.

record scratching

Yes, you read that correctly. For one HTTP request to this application, in the flow that made up 95% of its traffic, the app made up to 15 HTTP requests to a second system. This is not an ideal design from a performance standpoint, of course. The ultimate goal would be to eliminate or substantially reduce the number of calls through a coarse-grained interface. But that requires changes in two applications, coordinated across multiple teams, which will take a while. We needed something we could do in the short term to ease the performance issues and give us the breathing room to make more extensive changes.

The Good News

Luckily the HTTP requests were already being made using the Faraday library. Faraday is an HTTP client library that provides a consistent interface over different HTTP implementations. By default it uses the standard Ruby Net::HTTP library. Faraday is configured like this:


conn = Faraday.new(:url => 'http://example.com') do |faraday|
  faraday.request  :url_encoded            # form-encode POST params
  faraday.response :logger                 # log requests to STDOUT
  faraday.adapter  Faraday.default_adapter # make requests with Net::HTTP
end

Net::HTTP in Faraday will create a new HTTP connection to the server for each request that is made. If you’re only making one request or you’re making requests to different hosts, this is perfectly fine. In our case, this was an HTTPS connection and all of the requests were being made to the same host. So for each of those 15 requests Net::HTTP was opening a new socket, performing the TCP handshake, and negotiating an SSL connection. So how does Faraday help in this case?
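To get a feel for why that matters, here is a back-of-envelope sketch. All of the numbers (round-trip time, handshake costs) are illustrative assumptions, not measurements from the application described above:

```ruby
# Rough cost of opening a fresh HTTPS connection for every request versus
# reusing one persistent connection. All numbers are assumed for illustration.
rtt_ms        = 20            # assumed round trip to the remote host
tcp_handshake = rtt_ms        # TCP's three-way handshake costs about one RTT
ssl_handshake = 2 * rtt_ms    # an SSL handshake costs roughly two more RTTs
requests      = 15

fresh_cost      = requests * (tcp_handshake + ssl_handshake)
persistent_cost = tcp_handshake + ssl_handshake   # paid once, then reused

puts "fresh connection per request: #{fresh_cost}ms of pure setup overhead"
puts "one persistent connection:    #{persistent_cost}ms"
```

Even with those modest assumptions, roughly 900ms of each request’s time goes to connection setup alone; a persistent connection pays that cost once.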

One of the adapters that Faraday supports is net-http-persistent, a Ruby library that supports persistent connections and HTTP Keep-Alive across multiple requests. HTTP Keep-Alive allows an HTTP connection to be reused for multiple requests, avoiding the TCP handshake and SSL connection overhead. To use the net-http-persistent implementation, all you have to do is change your Faraday configuration to look like:


conn = Faraday.new(:url => 'http://example.com') do |faraday|
  faraday.request  :url_encoded # form-encode POST params
  faraday.response :logger      # log requests to STDOUT
  faraday.adapter  :net_http_persistent
end

This simple change swaps out the HTTP implementation that is used to make the requests. In our case it reduced the average time to process a complete request (including the ~15 requests made using Faraday) under load from 8 seconds down to 2.3 seconds.

the crowd goes wild

OK, so technically you need to add a new Gem reference to your Gemfile to use net-http-persistent. So it’s not REALLY a One-Line Fix. I also hope you never have an interface so chatty that your application needs to make 15 calls to the same remote server to process one request. But if you do! Let me tell you my friend! Just a little drop of net-http-persistent is all you need to cure what ails you.

P.S.

Faraday has some other benefits including supporting a Middleware concept for processing requests and responses that allows for code to be shared easily across different HTTP requests. So you can have common support for handling JSON or for error handling or logging for example. This is a nice architecture that allows you to easily process request data. So even if you don’t need it for its ability to switch out HTTP implementations, it’s still a nice library to use.
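The middleware idea can be sketched in plain Ruby. This is not Faraday’s actual middleware API (its real middleware receives a request env and registers callbacks); it just illustrates the onion model the library is built around: each layer wraps the next handler and can act on the request on the way in and the response on the way out.

```ruby
require 'json'

# Parses a JSON response body on the way out of the stack.
class JsonResponseParser
  def initialize(app)
    @app = app
  end

  def call(request)
    JSON.parse(@app.call(request))   # delegate, then post-process
  end
end

# Logs the request on the way into the stack.
class RequestLogger
  def initialize(app)
    @app = app
  end

  def call(request)
    puts "request: #{request}"
    @app.call(request)
  end
end

# Innermost "adapter" standing in for the code that performs the HTTP call.
fake_adapter = ->(request) { '{"ok": true}' }

stack  = RequestLogger.new(JsonResponseParser.new(fake_adapter))
result = stack.call("GET /orders")
puts result.inspect   # => {"ok"=>true}
```

Because every layer exposes the same one-method interface, you can reorder or share layers across different clients, which is the property that makes common JSON handling, error handling, and logging easy.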

NoSQL with MongoDB and Ruby Presentation

I presented at the Milwaukee Ruby User’s Group tonight on NoSQL using MongoDB and Ruby.

Code Snippets for the Presentation

Basic Operations

// insert data
db.factories.insert( { name: "Miller", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Lakefront", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Point", metro: { city: "Steven's Point", state: "WI" } } );
db.factories.insert( { name: "Pabst", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Blatz", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Coors", metro: { city: "Golden Springs", state: "CO" } } );

// simple queries
db.factories.find()
db.factories.findOne()
db.factories.find( { "metro.city" : "Milwaukee" } )
db.factories.find( { "metro.state": {$in : ["WI", "CO"] } } )

// update data
db.factories.update( { name: "Lakefront"}, { $set : { thebest : true } } );
db.factories.find()

// delete data
db.factories.remove({name:"Coors"})
db.factories.remove()

Ruby Example


require 'rubygems'
require 'mongo'
include Mongo

db = Connection.new.db('sample-db')
coll = db.collection('factories')

coll.remove

coll.insert( { :name => "Miller", :metro => { :city => "Milwaukee", :state => "WI" } } )
coll.insert( { :name => "Lakefront", :metro => { :city => "Milwaukee", :state => "WI" } } )
coll.insert( { :name => "Point", :metro => { :city => "Steven's Point", :state => "WI" } } )
coll.insert( { :name => "Pabst", :metro => { :city => "Milwaukee", :state => "WI" } } )
coll.insert( { :name => "Blatz", :metro => { :city => "Milwaukee", :state => "WI" } } )
coll.insert( { :name => "Coors", :metro => { :city => "Golden Springs", :state => "CO" } } )

puts "There are #{coll.count()} factories. Here they are:"
coll.find().each { |doc| puts doc.inspect }
coll.map_reduce("function () { emit(this.metro.city, this.name); }", "function (k, vals) { return vals.join(","); }").each { |r| puts r.inspect }

Map Reduce Example


db.factories.insert( { name: "Miller", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Lakefront", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Point", metro: { city: "Steven's Point", state: "WI" } } );
db.factories.insert( { name: "Pabst", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Blatz", metro: { city: "Milwaukee", state: "WI" } } );
db.factories.insert( { name: "Coors", metro: { city: "Golden Springs", state: "CO" } } );

var fmap = function () {
  emit(this.metro.city, this.name);
}
var fred = function (k, vals) {
  return vals.join(",");
}
res = db.factories.mapReduce(fmap, fred)
db[res.result].find()
db[res.result].drop()
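For comparison, here is what that map/reduce computes, sketched in plain Ruby over the same sample documents: emit each factory name keyed by city, then reduce each city’s names by joining them with commas.

```ruby
# The sample documents, trimmed to the fields the map/reduce functions touch.
factories = [
  { name: "Miller",    city: "Milwaukee" },
  { name: "Lakefront", city: "Milwaukee" },
  { name: "Point",     city: "Steven's Point" },
  { name: "Pabst",     city: "Milwaukee" },
  { name: "Blatz",     city: "Milwaukee" },
  { name: "Coors",     city: "Golden Springs" },
]

# group_by plays the role of emit(this.metro.city, this.name);
# the join plays the role of the reduce function.
result = factories
  .group_by { |f| f[:city] }
  .map { |city, fs| [city, fs.map { |f| f[:name] }.join(",")] }
  .to_h

puts result["Milwaukee"]   # => "Miller,Lakefront,Pabst,Blatz"
```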

The Presentation

Download NoSQL with MongoDB and Ruby Slides

Thanks to Meghan at 10Gen for sending stickers and a copy of MongoDB: The Definitive Guide that I gave out as a door prize. I read the book quickly this weekend before the talk and found it quite good, so I recommend it if you want to get started with MongoDB.

Using Ruby Subversion Bindings to Create Repositories

Subversion has a default Web UI that is served up by Apache if you run Subversion that way. It is pretty boring and read-only. Then there are things like WebSVN that make it less boring, but still read-only. I got curious about what it would take to make something even less boring and NOT read-only. Why not allow me to create a new repository on the server? Why not allow me to create a new default directory structure for a new project in a repository all through a web interface?

Parent Path Setup

All of my repositories use the idea of the SVNParentPath. This makes Apache assume that every directory under a given path is an SVN repository. That structure makes it easy to deal with multiple repositories and secure them with a single security scheme. Using that assumption, it is easier to write code that lists existing repositories and creates new ones in a known location.
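For reference, an SVNParentPath setup in Apache looks roughly like this. The paths and authentication details are illustrative assumptions, not my actual configuration:

```apache
<Location /svn>
  DAV svn
  # Treat every directory under this path as its own repository
  SVNParentPath /var/svn/repos
  SVNListParentPath on

  # A single security scheme covering all of the repositories
  AuthType Basic
  AuthName "Subversion Repositories"
  AuthUserFile /etc/svn-auth-file
  Require valid-user
</Location>
```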

SvnAdmin With Ruby Subversion Bindings

Subversion provides language bindings for a number of different languages (Java, Python, Perl, PHP and Ruby) in addition to the native C libraries. Using these bindings it becomes fairly easy to deal with Subversion. The only hiccup will be dealing with the apparent lack of documentation for the code. So be prepared to do some exploration, digging and reading of the code.

I chose to try this using Ruby because it was quick and easy and it was a language I was already familiar with.

First you need to know how to create a new repository and open an existing repository. Fortunately those are simple, one-line operations:

Svn::Repos.create(repos_path, {})
repos = Svn::Repos.open(repos_path)

There was nothing special (from what I could tell) that would let you determine whether a repository already exists, so I created a simple function using the Ruby File operations to check whether the directory exists. This code lets me decide whether to create a new repository or open an existing one:

def repository_exists?(repos_path)
  File.directory?(repos_path)
end

Now that I have a repository open, I wanted to build a project structure using the default conventions I use for Subversion projects. My convention is a repository named after a client, top-level directories named for the client’s projects, and within each project the standard trunk, branches and tags. Depending on the kind of work you do, that convention may or may not make sense for you.

With that decided, I wrote the code to create that structure in a repository. One thing I found is that the Subversion API lets you do the work within a transaction, which forces all of the changes to be recorded as a single commit. I thought this was a good thing, so I performed these operations in a transaction:


txn = repos.fs.transaction

# create the top-level, project-based directory
txn.root.make_dir(project_name)

# create the trunk, tags and branches for the new project
%w(trunk tags branches).each do |dir|
  txn.root.make_dir("#{project_name}/#{dir}")
end

repos.commit(txn)

Finally I put all of those things together into a class. The class had the concept of being initialized to a base Parent Path so all of the operations would know to start from that location:

require "svn/repos"

class SvnAdmin
  def initialize(parent_path)
    @base_path = parent_path
  end

  # Provides a list of directory entries. path must be a directory.
  def ls(client_name, path="/", revision=nil)
    repos_path = File.join(@base_path, client_name)
    repos = ensure_repository(repos_path)

    entries = repos.fs.root(revision).dir_entries(path)
    entries.keys
  end

  def create(client_name, project_name)
    repos_path = File.join(@base_path, client_name)
    repos = ensure_repository(repos_path)

    txn = repos.fs.transaction
    txn.root.make_dir(project_name)
    %w(trunk tags branches).each do |dir|
      txn.root.make_dir("#{project_name}/#{dir}")
    end

    repos.commit(txn)
  end

  private

  def ensure_repository(repos_path)
    if !repository_exists?(repos_path)
      Svn::Repos.create(repos_path, {})
    end
    Svn::Repos.open(repos_path)
  end

  def repository_exists?(repos_path)
    File.directory?(repos_path)
  end
end

SvnAdmin from Rails

Now that I had some simple code to create new repositories or add a new project to an existing repository I decided to wrap it in a simple Rails application that would allow me to create repositories using a web-based interface.

To start with, I’m not going to use a database or any ActiveRecord classes in this project (which you might do if you wanted authentication or something else), so I disabled ActiveRecord in config/environment.rb:

config.frameworks -= [ :active_record ]

Then I created a ReposController to manage the Subversion repositories. This controller contains a few simple actions:

  1. An index action to list the existing repositories (directories)
  2. A new action to display a form to enter the client and project names
  3. A create action to use the SvnAdmin class to create a new repository and/or project


require "svnadmin"

class ReposController < ApplicationController
  layout 'default'

  def index
    @dirs = Dir.new(SVN_PARENT_PATH).entries.sort.reject { |p| p == "." or p == ".." }
  end

  def new
  end

  def create
    svn = SvnAdmin.new(SVN_PARENT_PATH)
    repos = params[:repos]
    respond_to do |format|
      begin
        svn.create(repos[:client_name], repos[:project_name])
        flash[:notice] = "Successfully created."
        format.html { redirect_to :action => "index" }
        format.xml { head :ok }
      rescue
        flash[:error] = "Failed to create structure."
        format.html { redirect_to :action => "index" }
        format.xml { head :unprocessable_entity }
      end
    end
  end
end

You can also easily create a route and a ProjectsController that allows you to see all of the projects within a repository.

The route in config/routes.rb is simply:

map.connect ':repos/projects/',
  :controller => 'projects',
  :action => 'index'

And the ProjectsController looks up the :repos param to open the proper repository and list the top-level directories with it:

require "svnadmin"

class ProjectsController < ApplicationController
  layout 'default'

  def index
    repos_path = params[:repos]
    svn = SvnAdmin.new(SVN_PARENT_PATH)
    @projects = svn.ls(repos_path)
  end
end

Hopefully that will help you handle your Subversion administration. It should let you code up your conventions so that they are followed whenever a new repository is created or a new project is started.

Capistrano and Ferret DRB

This is a bit of a followup to my previous post on Capistrano with Git and Passenger. I decided to use Ferret via the acts_as_ferret (AAF) plugin. Ferret is a full-text search engine inspired by Apache’s Lucene but written in Ruby.

Basically, Ferret and Lucene keep a full-text index outside of the database, which allows them to quickly perform full-text searches and find the identifiers of matching rows in your database. Then you can go get those objects out of the database. It’s pretty slick.

Ferret uses DRb as a means of supporting multiple-concurrent clients and for scaling across multiple machines. You really don’t need to know much about DRb to use AAF, but you do need to run the ferret DRb server in your production environment. Which gets us to…
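If you haven’t seen DRb before, the mechanism AAF relies on can be shown in a few lines of stdlib Ruby (nothing Ferret-specific here): a front object is served on a port, and clients call methods on a proxy object as if it were local.

```ruby
require 'drb/drb'

# The "front" object whose methods will be callable remotely.
class Greeter
  def hello(name)
    "hello, #{name}"
  end
end

# Port 0 lets the OS pick a free port; DRb.uri reports what we were given.
DRb.start_service('druby://127.0.0.1:0', Greeter.new)

# The client side: a proxy whose method calls travel over the wire.
remote   = DRbObject.new_with_uri(DRb.uri)
greeting = remote.hello('ferret')
puts greeting   # => "hello, ferret"

DRb.stop_service
```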

Automating The Starting and Stopping of ferret_server

A few lines of code in your Capistrano deploy.rb and you are off and running.

before "deploy:start" do
  run "#{current_path}/script/ferret_server -e production start"
end

after "deploy:stop" do
  run "#{current_path}/script/ferret_server -e production stop"
end

after 'deploy:restart' do
  run "cd #{current_path} && ./script/ferret_server -e production stop"
  run "cd #{current_path} && ./script/ferret_server -e production start"
end

Except it doesn’t work. I ended up with some errors like:
could not execute command
no such file to load -- /usr/bin/../config/environment

It also ends up that it’s not Capistrano’s fault.

Acts As Ferret server_manager.rb

In the file vendor/plugins/acts_as_ferret/lib/server_manager.rb there is a line that sets up where to look for its environment information. For some reason this is the default:

# require(File.join(File.dirname(__FILE__), '../../../../config/environment'))
require(File.join(File.dirname(ENV['_']), '../config/environment'))

If you notice, there is a line commented out. It just so happens that uncommenting that line and commenting out the other fixed the issue for me. It ends up that ENV['_'] points to the path of the executable that launched the process, and that’s /usr/bin/env. And that doesn’t work. I’m not sure why that’s the default behavior.
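You can reproduce the broken path with the same File calls the plugin uses: when ENV['_'] is the env binary, the require resolves relative to /usr/bin.

```ruby
# What the default require line builds when ENV['_'] is '/usr/bin/env'.
broken = File.join(File.dirname('/usr/bin/env'), '../config/environment')
puts broken   # => "/usr/bin/../config/environment", the path from the error above
```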

Anyway, it’s easily fixed:

require(File.join(File.dirname(__FILE__), '../../../../config/environment'))
# require(File.join(File.dirname(ENV['_']), '../config/environment'))

With that fix in place, the Capistrano deployment will restart the Ferret DRb server when you deploy your application.

Update
According to John in the comments below, you can fix the AAF problem without changing the code as well. Just add default_run_options[:shell] = false to your Capistrano script and that will take care of it.
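In context, that workaround is a one-line setting near the top of deploy.rb, sketched here with the restart hook from above. The comment about why it works is my reading of it, so treat it as an assumption:

```ruby
# Run remote commands directly rather than through a login shell; with no
# shell in the way, ENV['_'] no longer points at /usr/bin/env when
# ferret_server starts.
default_run_options[:shell] = false

after 'deploy:restart' do
  run "cd #{current_path} && ./script/ferret_server -e production stop"
  run "cd #{current_path} && ./script/ferret_server -e production start"
end
```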

Capistrano Deploy with Git and Passenger

One of the great things about Rails and its community is that they are very lazy. Lazy in the good way of not wanting to do boring, repetitive, error prone things manually. They metaprogram and they automate. A great example of this is Capistrano. Capistrano allows you to deploy Rails applications with ease. The normal scenario is baked into Capistrano as a deployment convention and then you can customize it if you need to.

My Story

I’ve recently redeployed a couple of Ruby on Rails sites using Passenger (mod_rails). Passenger is an Apache module that really simplifies the deployment of small-scale Rails applications. Once Passenger is installed and your Rails application is set up as a virtual directory, it just works. Passenger auto-detects the fact that the directory is a Rails application and runs it for you. No need for a mongrel cluster or manually configuring load balancing.

I’m also using Git as my version control system on small, personal projects because it’s easy to use and I can work on multiple laptops and commit locally and worry about pushing to a central location when I have a network connection.

Seeing those things, I wanted to make them all work with Capistrano so that I could continue being lazy. To do this, I’m using Capistrano v2.4. It has Git support built in that works (some previous versions had support for Git, but seemed to have a lot of trouble).

Git Setup

By convention, Capistrano uses Subversion, so I need to change my configuration to use Git. The set :scm, :git line does this. The repository setting points at where my Git repository lives; in this case I’m using a bare Git repository accessed over SSH. You can also access your repository using the git and http protocols if you have that set up. The branch setting just says to deploy from the master branch.

That’s pretty much it – nice and easy.

set :scm, :git
set :repository, "geoff@zorched.net:myapp.git"
set :branch, "master"
set :deploy_via, :remote_cache

Passenger (mod_rails) Setup

The only thing that comes into play with Passenger is restarting the Rails application after a deployment is done. Passenger has an easy way to do this which is just to create a file called restart.txt in the Rails tmp directory. When it sees that, the Rails application process will be recycled automatically.

Doing this requires just a bit of Capistrano customization. We need to override the deploy:restart task and have it run a small shell command for us. In this case we run "touch #{current_path}/tmp/restart.txt" to accomplish this task.


namespace :deploy do
  desc "Restarting mod_rails with restart.txt"
  task :restart, :roles => :app, :except => { :no_release => true } do
    run "touch #{current_path}/tmp/restart.txt"
  end

We can also override the start and stop tasks because those don’t really do anything in the mod_rails scenario like they would with mongrel or other deployments.

  [:start, :stop].each do |t|
    desc "#{t} task is a no-op with mod_rails"
    task t, :roles => :app do ; end
  end
end

The Whole Thing

Putting everything together in my deploy.rb looks like the following:

set :application, "enotify"

# If you aren't deploying to /u/apps/#{application} on the target
# servers (which is the default), you can specify the actual location
# via the :deploy_to variable:
set :deploy_to, "/var/www/myapp"

# If you aren't using Subversion to manage your source code, specify
# your SCM below:
set :scm, :git
set :repository, "geoff@zorched.net:myapp.git"
set :branch, "master"
set :deploy_via, :remote_cache

set :user, 'geoff'
set :ssh_options, { :forward_agent => true }

role :app, "zorched.net"
role :web, "zorched.net"
role :db, "zorched.net", :primary => true

namespace :deploy do
  desc "Restarting mod_rails with restart.txt"
  task :restart, :roles => :app, :except => { :no_release => true } do
    run "touch #{current_path}/tmp/restart.txt"
  end

  [:start, :stop].each do |t|
    desc "#{t} task is a no-op with mod_rails"
    task t, :roles => :app do ; end
  end
end

Coffee DSL Redone With Meta-Programming

In a previous post I wrote about DSLs as Jargon. I implemented a simple Coffee DSL that would allow code to parse an order written by a human and turn it into a domain model. I used a fairly basic method_missing structure to capture the values.

There’s a much better way to do it in Ruby with meta-programming. Meta-programming allows you to write code to write code. You program your programming. In this case we can create the syntax of Coffee using a meta-programming technique.

dsl_attr :size, %w(venti grande tall)

This is us programming the class to say: “If someone calls a method venti, grande, or tall on our object they mean that they are telling us the size of the coffee, so store that value as the size”. So now we can write our Coffee class like this:

# CoffeeDSL.rb
# This is the input from the user, likely read from a file
# or input through a user interface of some sort
CoffeeInput = "venti nonfat whip latte"

class Coffee
  dsl_attr :size,       %w(venti grande tall)
  dsl_attr :whipped,    %w(whip nowhip)
  dsl_attr :caffinated, %w(caf decaf halfcaf)
  dsl_attr :type,       %w(regular latte cappachino)
  dsl_attr :milks,      %w(milk nonfat soy)

  def order
    params = ''
    params += milks + ' ' if milks?
    params += caffinated + ' ' if caffinated?
    params += whipped + ' ' if whipped?
    print "Ordering coffee: #{size} #{params}#{type}\n"
  end

  def load
    # turn one line into multi-line "method calls"
    cleaned = CoffeeInput.gsub(/\s+/, "\n")
    self.instance_eval(cleaned)
  end
end
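The load trick is worth seeing in isolation: gsub turns the one-line order into one word per line, and instance_eval then executes each word as a zero-argument method call against the instance. A toy class (not part of the Coffee DSL itself) makes that visible:

```ruby
# Each known word just records itself; parse feeds the input through the same
# gsub + instance_eval combination Coffee#load uses.
class WordCollector
  attr_reader :seen

  def initialize
    @seen = []
  end

  def venti; @seen << :venti; end
  def latte; @seen << :latte; end

  def parse(input)
    instance_eval(input.gsub(/\s+/, "\n"))
  end
end

wc = WordCollector.new
wc.parse("venti latte")
puts wc.seen.inspect   # => [:venti, :latte]
```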

We are essentially configuring the class in code. We could add extra values as well, such as a default value, required validation, any number of things. We then just need to implement the dsl_attr using meta-programming. That can be done in the Module in Ruby which makes that available to all classes in the system.


class Module
  def dsl_attr(param_name, values)
    attr param_name
    class_eval "def #{param_name}?; @#{param_name}; end"
    values.each do |val|
      define_method("#{val}") do
        instance_eval %{
          @#{param_name} = '#{val}'
        }
      end
    end
  end
end

Now when you run the code it captures all of the values that are parsed from the input and puts them into your object as meaningful values.

c = Coffee.new
c.load
c.order

I did the same DSL in Groovy and thought I could attempt to do it more justice using meta-programming as well. In Groovy, meta-programming is done with the ExpandoMetaClass – no, I didn’t make that up. Each Class has a metaClass property that gets you access to that type’s ExpandoMetaClass instance. You can then add properties and methods and whatnot to it. This has the effect of making the properties or methods callable on an instance of that type.


ExpandoMetaClass.enableGlobally() // have to do this to get inheritance of dslAttr

Object.metaClass.dslAttr << { String param_name, values ->
  def clazz = delegate
  clazz.metaClass."${param_name}" = null
  values.each() { val ->
    clazz.metaClass."${val}" << { -> clazz."${param_name}" = "${val}" }
  }
}

class Coffee {
  def Coffee() {
    dslAttr("size", ['venti', 'tall', 'grande'])
    dslAttr("whipped", ['whip', 'nowhip'])
    dslAttr("caffinated", ['caf', 'decaf', 'halfcaf'])
    dslAttr("type", ['regular', 'latte', 'cappachino'])
    dslAttr("milks", ['milk', 'nonfat', 'soy'])
  }

  def order() {
    def params = ''
    if (null != getMilks()) params += "${getMilks()} "
    if (null != getCaffinated()) params += "${getCaffinated()} "
    if (null != getWhipped()) params += "${getWhipped()} "
    println "Ordering coffee: ${getSize()} ${params}${getType()}\n"
  }

  def load(String input) {
    // turn one line into multi-line "method calls"
    def cleaned = input.split(/\s+/)
    cleaned.each() { meth -> this.&"${meth}"() }
  }
}

def c = new Coffee()
c.load("venti nonfat whip latte")
c.order()

I’m not sure if there is a better way to do this or not. Ideally I would like to have the dslAttr add something to the Coffee metaClass instead of just adding stuff to the instances, but this seems to do the trick for now.

The Ruby and Groovy implementations become fairly similar at this point. It’s a great way to reduce the amount of boilerplate code you would need to normally write to implement this kind of thing in less dynamic languages.

Mongrel Cluster and Apache Need Memory

I use a VPS hosted by SliceHost as my personal server. SliceHost uses Xen to host multiple instances of Linux on a single machine. The performance of this setup has been very good.

I have been running:

  • Apache 2.2 with PHP
  • MySQL 5
  • Postfix Mail Server
  • Courier IMAP Server
  • ssh for remote access of course

I recently started playing with a site built using Radiant CMS which is itself built on top of Ruby on Rails. So, I’ve added to the mix:

  • 3 Mongrel instances running under mongrel_cluster

These mongrel instances are proxied behind Apache using mod_proxy_balancer as described here. This setup works very well and is more and more becoming the de facto standard for deploying Rails applications. Even the Ruby on Rails sites themselves are deployed with this setup now. It allows you to serve all of your dynamic content through Rails and all of your static content through Apache. This gives you all of the speed and robustness that Apache has to offer (after all, it runs over 50% of all the hosts on the internet) for serving static content without burdening Mongrel with that task.
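The Apache side of that proxying looks roughly like this. The ports and balancer name are illustrative assumptions; match them to your mongrel_cluster configuration:

```apache
# Three mongrel instances behind a single balancer
<Proxy balancer://mongrel_cluster>
  BalancerMember http://127.0.0.1:8000
  BalancerMember http://127.0.0.1:8001
  BalancerMember http://127.0.0.1:8002
</Proxy>

ProxyPass / balancer://mongrel_cluster/
ProxyPassReverse / balancer://mongrel_cluster/
```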

I was noticing that the site was pretty slow, though. I tracked it down to the fact that I had started using too much memory. I was running the site on a VPS with 256M of RAM, but with the new Mongrel instances I had just pushed my server into swap space. Web applications in general are happier with more RAM, and in this case that was definitely borne out. I upped the VPS to 512M of RAM and things became VERY SNAPPY! I didn’t do a scientific before-and-after, but page loads prior to the upgrade were taking about 5-10 seconds; after the memory increase you can’t tell if the application is static or dynamic.

So, if you’re running into performance issues with Mongrel behind an Apache mod_proxy_balancer setup, check your memory. If you are dipping into swap space then you are likely to see serious performance issues. Let me know of any other simple tweaks to get more performance out of this setup if you have them.

As an aside:
Big kudos to SliceHost on their VPS upgrade capabilities. I clicked 2 buttons on my web-based management console and about 10 minutes later I was running on a bigger VPS. You can’t ask for much better than that if you need to scale up a server!

Update:
I guess Lighttpd and Nginx both support running PHP applications under FastCGI. You might want to try that kind of setup if you are so inclined. I’m still an Apache partisan.

Interact with REST Services from the Command Line

REST is becoming more popular as a means of implementing Service Oriented Architectures (SOA) as well as merely providing simple remote APIs for interacting with systems. The main reason for this is that it provides a very simple means of creating and consuming services. Contrasted with SOA implementations like SOAP, REST can be a relief in its simplicity.

One of the main advantages of REST is that it requires no tooling to use. Unlike SOAP, it is very easy to construct ad-hoc clients to consume a RESTful service. These examples use curl, a command-line utility available on Unix systems or through Cygwin on Windows. The same concepts can be translated to anything that can send HTTP requests.

Example REST Service with Ruby on Rails

As the example implementation, I’ll use a Ruby on Rails controller. Rails has very good support for implementing RESTful services, so it is easy to demonstrate.

To get started with this example you can generate a Rails project and the Order object with the following commands:

rails order_example
cd order_example
./script/generate resource order name:string

Then you can implement a RESTful controller with the following code:

class OrdersController < ApplicationController
  # GET /orders
  # GET /orders.xml
  def index
    @orders = Order.find(:all)
    respond_to do |format|
      format.html # index.rhtml
      format.xml { render :xml => @orders.to_xml }
    end
  end

  # GET /orders/1
  # GET /orders/1.xml
  def show
    @order = Order.find(params[:id])
    respond_to do |format|
      format.html # show.rhtml
      format.xml { render :xml => @order.to_xml }
    end
  end

  # POST /orders
  # POST /orders.xml
  def create
    @order = Order.new(params[:order])
    respond_to do |format|
      if @order.save
        flash[:notice] = 'Order was successfully created.'
        format.html { redirect_to order_url(@order) }
        format.xml { head :created, :location => order_url(@order) }
      else
        format.html { render :action => "new" }
        format.xml { render :xml => @order.errors.to_xml }
      end
    end
  end

  # PUT /orders/1
  # PUT /orders/1.xml
  def update
    @order = Order.find(params[:id])
    respond_to do |format|
      if @order.update_attributes(params[:order])
        flash[:notice] = 'Order was successfully updated.'
        format.html { redirect_to order_url(@order) }
        format.xml { head :ok }
      else
        format.html { render :action => "edit" }
        format.xml { render :xml => @order.errors.to_xml }
      end
    end
  end

  # DELETE /orders/1
  # DELETE /orders/1.xml
  def destroy
    @order = Order.find(params[:id])
    @order.destroy
    respond_to do |format|
      format.html { redirect_to orders_url }
      format.xml { head :ok }
    end
  end
end

This controller allows you to respond to all of the actions that can be taken on a Resource: GET, POST, PUT and DELETE.

Command Line Interaction with the Service

Start our Rails application and then you can see the following commands at work.

./script/server

Get a list of all of the Orders

The first thing you want to do is get a list of all of the orders in the system. To do this we perform a GET request asking for an XML response. The URI in this case represents the list of all the Orders in the system.

curl -X GET -H 'Accept: application/xml' http://localhost:3000/orders

Get a single Order

If we want the XML representation of a single order, we can ask for a specific Order by using the URI that represents just that one Order.

curl -X GET -H 'Accept: application/xml' http://localhost:3000/orders/15

Delete an Order

REST keeps things simple by having consistent Resource URIs. The URI that represents Order number 15 can also be used to modify or delete that Order. In this case the URI is the same as for the GET, but we issue a DELETE instead.

curl -X DELETE -H 'Accept: application/xml' http://localhost:3000/orders/15

Modify an existing Order

Just as with delete, if we want to modify an Order we use the URI that represents that specific Order. The only difference is that we have to tell the server that we are sending it XML, and then actually send the XML.

curl -i -X PUT -H 'Content-Type: application/xml' -H 'Accept: application/xml' \
  -d '<order><name>Foo</name></order>' http://localhost:3000/orders/15

Create a new Order

Creating an Order looks very similar to modifying an Order, but the URI changes to the Resource URI for the collection of all Orders. The response to this command will be an HTTP 201 Created whose Location header gives you the URI of the newly created Order Resource.

curl -i -X POST -H 'Content-Type: application/xml' -H 'Accept: application/xml' \
  -d '<order><name>Foo</name></order>' http://localhost:3000/orders/

Conclusion

I think you can see how easily you can interact with a REST service using only the most basic tools available, namely simple Unix command-line utilities. This simplicity offers a lot of power, flexibility and interoperability that you lose when you implement services with more complicated implementations such as SOAP. That’s not to say that SOAP and all of the WS-* specifications don’t have their place, because they do. But when you can implement a simple solution that meets your needs, you will often find that solution has a surprising number of added benefits, such as flexibility.

Active Directory Authentication for Ruby on Rails

Ruby on Rails can be used to build many kinds of web applications, including public internet applications as well as private intranet ones. For an intranet application it is often very interesting to be able to do Single Sign-On against an existing Active Directory setup. That requires NTLM authentication, which Rails does not support out of the box.

IIS for NTLM

If you are talking about Active Directory authentication then chances are good that you already have a Windows infrastructure. IIS, of course, supports NTLM so that’s the first thing I looked into. To use this you have to run Rails under something like FastCGI. To make a long story short, I could not get FastCGI to work with my Rails installation. This looks like a promising path for people who have a mostly Microsoft infrastructure already. This is an area that I hope to explore further, but I gave up on it for now.

If you want to try this route, check out RoR IIS, which has a lot of instructions as well as an installer that can do a lot for you. (Again, I tried this first and it didn't work, so your mileage may vary.)

Apache with NTLM

Authentication against Active Directory can also be done with Apache on Windows, using the mod_auth_sspi authentication module. This seemed like another promising path, since Apache and Rails is a more common combination than IIS and Rails.

Running Rails under Apache can be a bit tricky. There are a lot of options to choose from: mod_ruby, FastCGI, proxying and SCGI, with no real breakdown that I could find of why you would use one over another. Proxying to multiple Mongrel instances is a very common combination, but I did not want Mongrel running because I don't want a way to end-run around the authentication mechanism. So, I again tried FastCGI, this time under Apache. FastCGI and I still don't seem to get along though, and it again failed to work.

InstantRails uses SCGI though, and as a reference implementation for Rails and Apache on Windows, I figured that was a promising path. SCGI is supposed to be a simpler form of CGI with all of the performance advantages of FastCGI. SCGI is a two part solution: there is an Apache module, and there is an SCGI server that runs the Rails application. Using this combination, I was able to get Rails running under Apache on Windows.

Under Apache the setup is fairly simple:

LoadModule scgi_module modules/mod_scgi.so
SCGIMount / 127.0.0.1:9999

Rails SCGI is well documented, including how to configure Apache, so I won't repeat everything here.

Configure Apache for NTLM Authentication

To use mod_auth_sspi, you have to load the module and then configure your application to use the domain to authenticate users. Once that is in place, your users will be authenticating using Active Directory. If you use a browser like Firefox or Safari, the user will see a Login Prompt, but if you are using Internet Explorer it will automatically pass the user’s current Active Directory credentials to Apache and mod_auth_sspi will do the authentication transparently.


LoadModule sspi_auth_module modules/mod_auth_sspi.so

<Location />
  AuthType SSPI
  SSPIAuth On
  SSPIAuthoritative On
  SSPIDomain DOMAIN
  SSPIOfferBasic On
  SSPIOmitDomain Off
  Require valid-user
</Location>

Tying it Together In Rails

At this point, anyone who gets access to your application has been authenticated. If your application does not need to know who the user is, congratulations! You're done. But if your application needs to know who is actually using the app, and not just that it's a valid user, then you have a little more work to do.

The most common way to check that a user is authenticated in Rails is to use a before_filter in your controllers to check that the session is properly set up, and to send the user to a Login controller if they are not authenticated. You'll need to do the same thing in this case:

class ApplicationController < ActionController::Base
  protected

  def authenticate
    unless session[:user]
      redirect_to :controller => "login"
      return false
    else
      # TODO: Check that the current user is the same as the session user
      # TODO: Check user against Active Directory every time?
      # request.env["REMOTE_USER"]
    end
  end
end

Most LoginControllers will show the user a form that allows them to authenticate, and a User object is then used to verify that the username and password are correct. This implementation will be slightly different though. When authentication is done by Apache, it sets an environment variable, available as request.env["REMOTE_USER"] in your Controllers, that identifies who the authenticated User is. You can use this information to learn about the user instead of showing them a login form. Remember, the user has already been authenticated against the Active Directory domain, so the user's credentials have been checked.


class LoginController < ApplicationController
  def index
    success = false
    user = User.find_by_name(request.env["REMOTE_USER"])
    if user
      session[:user] = user
      success = true
    end
    if success
      redirect_to(request.env["HTTP_REFERER"] ? :back : '/')
    else
      redirect_to :action => :error
    end
  end

  def logout
    session[:user] = nil
  end

  def error
  end
end

In the login controller you can get the REMOTE_USER information and then do whatever you need to do with that information. Most likely you’ll want to call into Active Directory using something like ruby-ldap to check on things like groups or role membership and to get extra User information.
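One small wrinkle worth planning for: with SSPIOmitDomain set to Off, as in the Apache config above, REMOTE_USER arrives as DOMAIN\username rather than a bare username. A hedged sketch of normalizing it before the User lookup (the account_name method is my own, not part of Rails or mod_auth_sspi):

```ruby
# Strip an optional "DOMAIN\" prefix from the REMOTE_USER value so it can
# be matched against the plain usernames stored in the users table.
def account_name(remote_user)
  remote_user.to_s.split('\\').last
end
```

With that in place, User.find_by_name(account_name(request.env["REMOTE_USER"])) finds the user whether or not the domain prefix is present.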

Conclusion

There are a lot more implementation pieces that need to be completed. I would like to be able to pull extra information about a user from Active Directory the first time they authenticate and store it in the session so I could know their full name for example.

Hopefully this information will help someone else out because it definitely is not an obvious thing to do (at least it wasn’t to me).

Making Session Data Available to Models in Ruby on Rails

Ruby on Rails is implemented as the Model View Controller (MVC) pattern. This pattern separates the context of the Web Application (in the Controller and the View) from the core Model of the application. The Model contains the Domain objects which encapsulate business logic, data retrieval, etc. The View displays information to the user and allows them to provide input to the application. The Controller handles the interactions between the View and the Model.

This separation is a very good design principle that generally helps prevent spaghetti code. Sometimes though the separation might break down.


The following is really an alternative to using ActionController::Caching::Sweeper, which is itself a hybrid Model/Controller scoped Observer. It seems to me, based on the name, that its intent is much more specific than giving Observers access to session data. Which do you prefer?

Rails provides the concept of a Model Observer. An Observer allows you to write code that responds to the lifecycle events of your Model objects. For example, you could record some information every time an Account changed using the following Observer:

class AccountObserver < ActiveRecord::Observer
  def after_update(record)
    Audit.audit_change(record.account_id, record.new_balance)
  end
end

You might have noticed a limitation with the previous API though. You didn't notice? The only information passed to the Observer is the object that is being changed. What if you want more context than that? For example, what if you want to audit not only the values that changed, but also the user who made the change?

class AccountObserver < ActiveRecord::Observer
  def after_update(record)
    Audit.audit_change(current_user, record.account_id, record.new_balance)
  end
end

How do you get the current_user value? Well, you have to plan ahead a little bit. The User in this application is stored in the HTTP session when the user is authenticated. The session isn't directly available at the Model level (including the Observers), so you have to figure out a way around this. One way to accomplish it is with a named Thread local variable. Under Mongrel, each HTTP request is served by its own thread, which means a variable stored as a thread local will be available for the entire processing of a request.

The UserInfo module encapsulates reading and writing the User object from/to the Thread local. This module can then be mixed in with other objects for easy access.

module UserInfo
def current_user
Thread.current[:user]
end

def self.current_user=(user)
Thread.current[:user] = user
end
end
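To see the mixin in action outside of Rails, here is a quick standalone check (the module is repeated so the snippet runs on its own, and AuditLogger is just an illustrative class):

```ruby
module UserInfo
  def current_user
    Thread.current[:user]
  end

  def self.current_user=(user)
    Thread.current[:user] = user
  end
end

# Store a user the way the before_filter below would...
UserInfo.current_user = "alice"

# ...and read it back from an unrelated object that mixes the module in.
class AuditLogger
  include UserInfo
end

AuditLogger.new.current_user  # => "alice"
```

Any model or observer in the same thread sees the same value, which is the whole trick.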

A before_filter set in the ApplicationController will be called before any action is called in any controller. You can take advantage of this to copy a value out of the HTTP session and set it in the Thread local:


class ApplicationController < ActionController::Base
  include UserInfo

  # Pick a unique cookie name to distinguish our session data from others'
  session :session_key => '_app_session_id'

  before_filter :set_user

  protected

  def authenticate
    unless session[:user]
      redirect_to :controller => "login"
      return false
    end
  end

  # Sets the current user into a named Thread local so that it can be
  # accessed by models and observers
  def set_user
    UserInfo.current_user = session[:user]
  end
end

At any point in an Observer of a Model class that you need to have access to those values you can just mixin the helper module and then use its methods to access the data. In this final example we mixin the UserInfo module to our AccountObserver and it will now have access to the current_user method:

class AccountObserver < ActiveRecord::Observer
  include UserInfo

  def after_update(record)
    Audit.audit_change(current_user, record.account_id, record.new_balance)
  end
end
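One easy thing to forget: an Observer only fires if it has been registered. In the Rails versions of this era that was typically done in config/environment.rb (hedged; check the conventions of your Rails version):

```ruby
# config/environment.rb, inside the Rails::Initializer block
config.active_record.observers = :account_observer
```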

You generally shouldn't need this kind of trick outside of an Observer. In most cases the Controller should pass all of the information needed by a Model object to it through its methods. That will allow the Model objects to interact and the Controller to do the orchestration needed. But in a few special cases, this trick might be handy.
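One caveat with the thread local approach: if the server ever reuses a thread across requests, a stale user could leak from one request into the next. A hedged sketch of guarding against that by always clearing the value when the request finishes; handle_request stands in for the real request cycle, and the module is repeated (with a module-level reader added) so the snippet runs on its own:

```ruby
module UserInfo
  def self.current_user=(user)
    Thread.current[:user] = user
  end

  def self.current_user
    Thread.current[:user]
  end
end

# Simulate one request on a reused thread: set the user for the duration
# of the request, then always clear it, even if the action raises.
def handle_request(user)
  UserInfo.current_user = user
  yield
ensure
  UserInfo.current_user = nil
end

seen = nil
handle_request("alice") { seen = UserInfo.current_user }
seen                   # => "alice" during the request
UserInfo.current_user  # => nil once the request is done
```

In the actual application this clearing would live in an around_filter next to the set_user before_filter shown earlier, so the ensure runs even when an action raises.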