Ruby REXML – Another Take on XML Parsing

If you are a programmer and have not been living under a rock, you’ve heard of Ruby. There’s been a lot of hype built up around the language recently. Its even more hyped progeny, Ruby on Rails, highlights the strengths of this dynamic language. A seemingly large number of thought leaders in the software development space have also adopted the language as a primary tool in their toolkit.

Curious about the hype, and looking for something interesting to learn, I’ve recently started playing around with the Ruby programming language and Rails. I hope to write about it in more detail at some point, but for now I thought I would share some thoughts on one of the available Ruby XML parsers.

REXML is a pure Ruby implementation of an XML parser. It was inspired by an Open Source Java XML parser called Electric XML. Basically the whole point of REXML is to make an XML parser that feels like a Ruby library. It is a reaction to the SAX and DOM parsers that already exist. SAX and DOM parsers are available in all kinds of languages, but their APIs are tightly coupled to the demands of XML rather than to the idioms of the language you are writing in.

As a bit of background: SAX is an event-based model that parses a document in a single pass, triggering event callbacks when specific elements are reached. The programmer is then responsible for deciphering the context and calling the appropriate functions. DOM, on the other hand, is a hierarchical, in-memory representation of an XML document. It allows for ad-hoc traversal of the XML structure. DOM also allows for the use of things like XPath for querying the elements (and attributes) of an XML structure.
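To make the contrast concrete, here is a rough sketch of both styles using REXML itself. It uses REXML's stream parser as a stand-in for a SAX-style API, and the sample XML and the TitleCollector class are invented for this illustration.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical sample document, made up for this sketch.
sample_xml = '<feed><entry><title>First</title></entry>' \
             '<entry><title>Second</title></entry></feed>'

# Event style: the parser calls back as it reads, and we have to
# track the context (are we inside a <title>?) ourselves.
class TitleCollector
  include REXML::StreamListener
  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  def tag_start(name, _attrs)
    @in_title = true if name == 'title'
  end

  def text(data)
    @titles << data if @in_title
  end

  def tag_end(name)
    @in_title = false if name == 'title'
  end
end

collector = TitleCollector.new
REXML::Document.parse_stream(sample_xml, collector)
stream_titles = collector.titles

# Tree style: load the whole document, then ask for what you want.
doc = REXML::Document.new(sample_xml)
tree_titles = REXML::XPath.match(doc, '//title').map(&:text)
```

Both approaches end up with the same list of titles. The event version never holds the whole document in memory, while the tree version needs far less bookkeeping.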

REXML on the other hand aims to feel like Ruby.

Creating a document:

require 'rexml/document'

doc = REXML::Document.new(atom_feed)
root = doc.root

Getting a single text node:

link_node = root.elements['link']
link_value = link_node.text if link_node
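For something you can paste into irb, here is a small self-contained version of the two snippets above. The feed is a simplified, made-up example with no namespaces. One thing worth noticing: if the link is an empty element that carries its URL in an href attribute, as Atom links do, its text is nil and you read the attribute instead.

```ruby
require 'rexml/document'

# Hypothetical, simplified feed used only for illustration.
atom_feed = <<~XML
  <feed>
    <title>Example Feed</title>
    <link href="http://example.org/"/>
  </feed>
XML

doc = REXML::Document.new(atom_feed)
root = doc.root

# elements[] returns nil when no child matches, so guard before reading.
title_node = root.elements['title']
title = title_node.text if title_node

# An empty element has no text node; the URL lives in the attribute.
link_node = root.elements['link']
link_text = link_node.text if link_node
href = link_node.attributes['href'] if link_node

# A child that does not exist simply comes back as nil.
missing = root.elements['subtitle']
```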

REXML also uses the standard Enumerable features found in other collection classes, so it feels very natural to use. In Ruby it is common to use “blocks” to handle each of the items in a collection. Rather than having to iterate over a collection of items yourself, you let the collection do the iterating for you.

For each of the elements called “author” call the add_author method:

root.elements.each('author') { |a| add_author(a) }
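Because the elements collection mixes in Enumerable, the usual collection methods work on it too. A minimal sketch, using sample data invented for this example:

```ruby
require 'rexml/document'

# Hypothetical sample data for this sketch.
doc = REXML::Document.new(
  '<feed><author><name>Geoff</name></author>' \
  '<author><name>Bob</name></author><title>T</title></feed>'
)
root = doc.root

# each with a name filter only yields the matching children.
names = []
root.elements.each('author') { |a| names << a.elements['name'].text }

# Enumerable methods see every child element.
child_tags = root.elements.map(&:name)
has_title = root.elements.any? { |e| e.name == 'title' }

# to_a accepts the same filter and hands back a plain Array.
author_count = root.elements.to_a('author').size
```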

You can also use XPath:

REXML::XPath.each(doc, "//entry") { |e| add_entry(e) }
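Besides each, the XPath module has first and match for when you want a single node or an Array of nodes. Here is a small sketch against a made-up, namespace-free document (the entries and URLs are invented for illustration):

```ruby
require 'rexml/document'

# Hypothetical sample document for this sketch.
xml = '<feed>' \
      '<entry><title>One</title>' \
      '<link rel="alternate" href="http://example.org/1"/></entry>' \
      '<entry><title>Two</title>' \
      '<link rel="self" href="http://example.org/2"/></entry>' \
      '</feed>'
doc = REXML::Document.new(xml)

# first returns the first matching node, or nil if nothing matches.
first_entry = REXML::XPath.first(doc, '//entry')
first_title = first_entry.elements['title'].text

# match returns every matching node as an Array.
all_titles = REXML::XPath.match(doc, '//entry/title').map(&:text)

# Predicates can filter on attributes.
alternate = REXML::XPath.first(doc, "//entry/link[@rel='alternate']")
alternate_href = alternate.attributes['href']
```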

Of course you can create and/or modify documents as well in a Ruby-esque way.

entry = REXML::Element.new("entry")
id = REXML::Element.new("id")
id.add_text("_some_unique_id")    # text goes in with add_text, not add_element
entry.add_element(id)
['Geoff', 'Bob', 'Sam'].each do |author|
  a = REXML::Element.new("author")
  name = REXML::Element.new("name")
  name.add_text(author)
  a.add_element(name)             # attach the name to its author
  entry.elements << a
end
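To see what you have built, call to_s on the element. The sketch below is a self-contained variant that relies on add_element returning the newly created element and add_text returning the element it was called on, so the calls can be chained. The author names are just sample data.

```ruby
require 'rexml/document'

entry = REXML::Element.new('entry')
entry.add_element('id').add_text('_some_unique_id')

%w[Geoff Bob].each do |author|
  # Creates <author>, then <name> inside it, then the text inside that.
  entry.add_element('author').add_element('name').add_text(author)
end

xml = entry.to_s
# <entry><id>_some_unique_id</id><author><name>Geoff</name></author><author><name>Bob</name></author></entry>

# Parsing the string again gives back the same structure.
round_trip = REXML::Document.new(xml)
author_names = REXML::XPath.match(round_trip, '//author/name').map(&:text)
```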

It’s nice to see a slightly different take on XML manipulation.

About Geoff Lane

I’m Geoff Lane and I write Zorched.net as I figure things out about software development in the hopes that it can help other people facing similar situations. Also as a thanks to the larger web community for all of the information and knowledge that they have shared. I’ve been a professional software developer since 1999 working with a variety of different technologies. I’ve worked for startups in the Silicon Valley and Chicago, IL and now work as a consultant building custom applications for clients.