patkua@work

Smooth Scala XML APIs

As an alternative to using JAXB for reading input, we thought we’d try simply wrapping an XML document in a class structure. XML is a first-class citizen in Scala, so we wanted to see what that would look like. Note that we don’t expect very large XML documents (they all easily fit into memory), but the external structure is a little bit ugly.

We inject the XML document into the class via the constructor and then, using Scala’s XPath-like operators, pull out the interesting fields. I have to admit it’s better than dealing with XPath in the Java world, and the code is pretty simple.

I’ve written an equivalent example to give you an idea of what we evolved from.

Given an XML document that looks like this (domain made up here):

<profile>
  <names>
     <defaultName>Harry Potter</defaultName>
     <alternativeNames>
        <name>Harry</name>
        <name>Mr Potter</name>
     </alternativeNames>
  </names>
  <contact>
     <email>harry.potter@hogwarts.com</email>
     <website>http://hogwarts.com</website>
     <phone>http://hogwarts.com</phone>     
  </contact>
</profile>

We ended up with a class that looks like this:

import xml.Node
class Profile (xml: Node) {
  def name = {
    (xml \ "names" \ "defaultName").text
  }

  def email = {
    (xml \ "contact" \ "email").text
  }

  def website = {
    (xml \ "contact" \ "website").text
  }

  def phone = {
    (xml \ "contact" \ "phone").text
  }
}
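As a rough sketch of how such a wrapper might be fed at runtime, the snippet below parses a string with `scala.xml.XML.loadString` and hands the result to the constructor. It assumes the scala-xml module is on the classpath; the `ProfileExample` object and its `rawInput` value are made up for illustration, and the class is trimmed to two fields to keep it short.

```scala
import scala.xml.{Node, XML}

// Trimmed-down version of the wrapper class from the post.
class Profile(xml: Node) {
  def name = {
    (xml \ "names" \ "defaultName").text
  }

  def email = {
    (xml \ "contact" \ "email").text
  }
}

// Hypothetical driver object, not part of the original code.
object ProfileExample {
  // Stand-in for input that would normally arrive from elsewhere.
  val rawInput: String =
    """<profile>
      |  <names><defaultName>Harry Potter</defaultName></names>
      |  <contact><email>harry.potter@hogwarts.com</email></contact>
      |</profile>""".stripMargin

  // XML.loadString parses the whole document into memory and returns its root element.
  def parsed: Node = XML.loadString(rawInput)

  def main(args: Array[String]): Unit = {
    val profile = new Profile(parsed)
    println(profile.name)  // Harry Potter
    println(profile.email) // harry.potter@hogwarts.com
  }
}
```

Because the whole document is held in memory, this only suits small inputs like the ones described above.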

We can apply a couple of small refactorings here. Simple one-line expressions do not need curly braces, so the class now becomes:

import xml.Node
class Profile (xml: Node) {
  def name =  (xml \ "names" \ "defaultName").text
  def email = (xml \ "contact" \ "email").text
  def website = (xml \ "contact" \ "website").text
  def phone = (xml \ "contact" \ "phone").text
}

Maybe we also want to remove a little bit of duplication in the path expressions, though this increases the size of the class a bit:

import xml.Node
class Profile (xml: Node) {
  def name =  (xml \ "names" \ "defaultName").text
  def email = (contact \ "email").text
  def website = (contact \ "website").text
  def phone = (contact \ "phone").text

  private def contact = (xml \ "contact")
}

Since our XML document will not change once it has been loaded, we also had some feedback that we could use the lazy val option to cache the results. The code now looks like this:

import xml.Node
class Profile (xml: Node) {
  lazy val name =  (xml \ "names" \ "defaultName").text
  lazy val email = (contact \ "email").text
  lazy val website = (contact \ "website").text
  lazy val phone = (contact \ "phone").text

  private lazy val contact = (xml \ "contact")
}
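To make the caching difference concrete, here is a minimal sketch that counts how often the XML is actually traversed. `CountingProfile`, its `traversals` counter and the `LazyValExample` object are hypothetical names added purely for illustration, and the snippet assumes scala-xml is on the classpath.

```scala
import scala.xml.Node

// Hypothetical class: the counter exists only to show when the lookup runs.
class CountingProfile(xml: Node) {
  var traversals = 0

  private def findEmail(): String = {
    traversals += 1
    (xml \ "contact" \ "email").text
  }

  def emailDef: String = findEmail()        // walks the XML on every call
  lazy val emailLazy: String = findEmail()  // walks the XML once, then reuses the result
}

object LazyValExample {
  def main(args: Array[String]): Unit = {
    val doc = <profile><contact><email>harry.potter@hogwarts.com</email></contact></profile>
    val profile = new CountingProfile(doc)

    val first = profile.emailDef
    val second = profile.emailDef
    println(profile.traversals) // 2

    val third = profile.emailLazy
    val fourth = profile.emailLazy
    println(profile.traversals) // 3
  }
}
```

For documents this small the saving is tiny, so the choice is mostly about expressing that the value never changes.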

I’m not sure the last two refactorings give much more than the one-line version of the class, but they certainly helped me learn a bit more Scala. I think we had some pretty good wins by dealing with XML the Scala way. Our tests are very readable because they use inline XML instead of weird string concatenation or sucking in XML from a file, and I think the code is quite readable too. I would have preferred a syntax closer to XPath, but learning the Scala syntax for traversing XML isn’t too bad. The other good thing about this approach is that you don’t have to worry about null or non-existent nodes: you simply get back an empty string. That is pretty decent default behaviour, at least for our use case.
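A small sketch of what such a test fixture might look like, including the missing-node case. The `MissingNodeExample` object and the `phoneOption` accessor are assumptions added here for illustration (they are not in the original class), and scala-xml is assumed to be on the classpath.

```scala
import scala.xml.Node

class Profile(xml: Node) {
  lazy val name = (xml \ "names" \ "defaultName").text
  lazy val phone = (xml \ "contact" \ "phone").text

  // Hypothetical extra accessor: distinguishes "absent" from "present but empty".
  lazy val phoneOption: Option[String] =
    (xml \ "contact" \ "phone").headOption.map(_.text)
}

object MissingNodeExample {
  // Inline XML literal as a fixture; note there is no <contact> element at all.
  val withoutContact: Node =
    <profile>
      <names><defaultName>Harry Potter</defaultName></names>
    </profile>

  def main(args: Array[String]): Unit = {
    val profile = new Profile(withoutContact)
    assert(profile.name == "Harry Potter")
    assert(profile.phone == "")          // empty NodeSeq, so .text is the empty string
    assert(profile.phoneOption.isEmpty)  // None when the node is missing
  }
}
```

The empty-string default keeps the wrapper simple; something like `headOption` is only worth adding if callers really need to tell a missing element from an empty one.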
