Ruby REXML - Another Take on XML Parsing

January 25, 2006 - 3 minute read -

If you are a programmer and have not been living under a rock, you've heard of Ruby. There has been a lot of hype around the language recently. Its even more hyped progeny, Ruby on Rails, highlights the strengths of this dynamic language. A seemingly large number of thought leaders in software development have also adopted the language as a primary tool in their toolkit.

Curious about the hype, and looking for something interesting to learn, I've recently started playing around with the Ruby programming language and Rails. I hope to write about it in more detail at some point, but for now I thought I would share some thoughts on one of the available Ruby XML parsers.

REXML is a pure Ruby implementation of an XML parser. It was inspired by an open source Java XML parser called Electric XML. The whole point of REXML is to be an XML parser that feels like a Ruby library. It is a reaction to the existing SAX and DOM parsers. SAX and DOM parsers are available in all kinds of languages, but their APIs are tightly coupled to the demands of XML rather than to the idioms of the language you are writing in.

As a bit of background: SAX is an event-based model that parses a document in a single pass, triggering event callbacks when specific elements are reached. The programmer is then responsible for deciphering the context and calling the appropriate functions. DOM, on the other hand, is a hierarchical, in-memory representation of an XML document. It allows for ad-hoc traversal of the XML structure. DOM also allows for the use of things like XPath for querying elements (and attributes) of an XML structure.
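
To make the contrast concrete, here is a minimal sketch that pulls the titles out of the same small document both ways. It uses REXML's own stream parser as a stand-in for a SAX-style API; the sample XML and the TitleListener class are made up for illustration.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical sample input, not taken from a real feed.
SAMPLE_XML = '<feed><title>Example</title><entry><title>First</title></entry></feed>'

# Event-based (SAX-like): the parser walks the input once and calls back.
# The listener has to track its own context, here a stack of open tags.
class TitleListener
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @stack = []
    @titles = []
  end

  def tag_start(name, _attrs)
    @stack.push(name)
  end

  def tag_end(_name)
    @stack.pop
  end

  def text(value)
    # Only keep text that appears directly inside a <title> element.
    @titles << value if @stack.last == 'title'
  end
end

listener = TitleListener.new
REXML::Document.parse_stream(SAMPLE_XML, listener)
stream_titles = listener.titles

# Tree-based (DOM-like): the whole document is held in memory
# and can be queried ad hoc, for example with XPath.
doc = REXML::Document.new(SAMPLE_XML)
tree_titles = REXML::XPath.match(doc, '//title').map(&:text)
```

Both approaches collect the same two titles, but the event-based version needs a class and a hand-rolled stack to know where it is in the document.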

REXML, on the other hand, aims to feel like Ruby.

Creating a document:

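require 'rexml/document'
include REXML  # lets the snippets below write Element, Text and XPath unqualified

doc = REXML::Document.new(atom_feed)  # atom_feed is a String (or IO) holding the XML
root = doc.root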

Getting a single text node:

link_node = root.elements['link']
link_value = link_node.text if link_node
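
Put together, the two snippets above might look like the following self-contained sketch. The feed string is a made-up, simplified stand-in for atom_feed (no namespaces, and the link is stored as element text to match the snippet above).

```ruby
require 'rexml/document'

# Hypothetical, simplified stand-in for the atom_feed variable.
atom_feed = <<~FEED
  <feed>
    <title>Example Feed</title>
    <link>http://example.org/</link>
  </feed>
FEED

doc = REXML::Document.new(atom_feed)
root = doc.root

# elements['link'] returns the first child element named "link", or nil.
link_node = root.elements['link']
link_value = link_node.text if link_node

# A missing child comes back as nil instead of raising,
# which is why the `if link_node` guard is useful.
missing = root.elements['subtitle']
```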

REXML also uses the standard Enumerable features found in other collection classes, so it feels very natural to use. In Ruby it is common to use "blocks" to handle each of the items in a collection. Rather than having to iterate over a collection of items yourself, you let the collection do the iterating for you.

For each of the elements called "author", call the add_author method:

root.elements.each('author') { |a| add_author(a) }
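
Because the elements collection mixes in Enumerable, the usual methods such as map and find work on it too. A small sketch, using a made-up entry rather than a real feed:

```ruby
require 'rexml/document'

# Hypothetical sample entry for illustration.
doc = REXML::Document.new(
  '<entry><author>Geoff</author><author>Bob</author><title>Hi</title></entry>'
)
root = doc.root

# each with a name only yields the matching child elements.
names = []
root.elements.each('author') { |a| names << a.text }

# Plain Enumerable methods iterate over all child elements.
child_names = root.elements.map { |e| e.name }
title = root.elements.find { |e| e.name == 'title' }
```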

You can also use XPath:

REXML::XPath.each(doc, "//entry") { |e| add_entry(e) }
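
Besides each, REXML::XPath also offers first and match. The sketch below runs them against a made-up, namespace-free document; a real Atom feed declares a default namespace, which this example deliberately sidesteps.

```ruby
require 'rexml/document'

# Hypothetical sample document without namespaces.
doc = REXML::Document.new(<<~FEED)
  <feed>
    <entry id="a"><title>First</title></entry>
    <entry id="b"><title>Second</title></entry>
  </feed>
FEED

# each yields every node that matches the path.
titles = []
REXML::XPath.each(doc, '//entry') { |e| titles << e.elements['title'].text }

# first returns a single matching node; predicates can filter on attributes.
entry_b = REXML::XPath.first(doc, "//entry[@id='b']")

# match returns all matching nodes as an Array.
entry_count = REXML::XPath.match(doc, '//entry').size
```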

Of course, you can also create and modify documents in a Ruby-esque way:

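# Element is REXML::Element here (this assumes `include REXML`)
entry = Element.new("entry")
id = Element.new("id")
id.add_text("_some_unique_id")  # text content is added with add_text, not add_element
entry.add_element(id)
['Geoff', 'Bob', 'Sam'].each do |author|
    a = Element.new("author")
    name = Element.new("name")
    name.add_text(author)
    a.add_element(name)         # attach the name to its author
    entry.elements << a
end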
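
The same structure can be built a little more compactly, because add_element accepts a tag name and returns the new child, so calls can be chained. This is only a sketch of one alternative style, with the result serialized through to_s so it can be inspected:

```ruby
require 'rexml/document'

entry = REXML::Element.new('entry')

# add_element('id') creates the child and returns it, so add_text chains onto it.
entry.add_element('id').add_text('_some_unique_id')

%w[Geoff Bob Sam].each do |author|
  # Builds <author><name>...</name></author> under the entry.
  entry.add_element('author').add_element('name').add_text(author)
end

# Serialize the element tree back to an XML string.
xml = entry.to_s

# Read the names back out to check the structure.
author_names = REXML::XPath.match(entry, 'author/name').map(&:text)
```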

It's nice to see a slightly different take on XML manipulation.