Encoding and decoding XML data as path sequences

Friday, July 4th, 2008 by Chris Catalfo

Lately I’ve been thinking about how to represent information about XML paths and data as a string.

For example, I’d like to be able to record the origin of this data:


<titleInfo type="alternative">
<title>Special edition using XSLT</title>
</titleInfo>

as something like this (with id and data as properties in a JSON object):


{"id":"titleInfo-2@type=alternative\\title-1","data":"Special+edition+using+XSLT"}

I could then take the preceding id string, extract the provenance of the data, and recreate the original XML document.

Here’s how I’ve tried encoding the XML path and data using an XSLT stylesheet:

For each element that contains text, create an id consisting of:

  1. Each ancestor (except the root)
  2. A dash to delimit the ancestor element’s name from its position
  3. The integer position of that node in the XML file (using )
  4. Each of the ancestor’s attributes, in the form @attrname=attrvalue
  5. A backslash to be used as a path delimiter
  6. The text element’s name
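
The steps above can be sketched in a few lines of Python. This is not the XSLT stylesheet itself, just an illustration under some assumptions: the position is taken to be the 1-based index among same-named siblings, the element holding the text gets the same name-position treatment as its ancestors (as in the example id), the data is URL-encoded with `quote_plus`, and the `mods` root and the function names `segment` and `encode` are made up for the demo.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus


def segment(elem, parent):
    """Build one path segment: name-position followed by @attr=value pairs."""
    # Assumption: position is the 1-based index among same-named siblings.
    same_name = [child for child in parent if child.tag == elem.tag]
    position = same_name.index(elem) + 1
    attrs = "".join(f"@{name}={value}" for name, value in elem.attrib.items())
    return f"{elem.tag}-{position}{attrs}"


def encode(root):
    """Return one {"id", "data"} record per element that contains text."""
    records = []

    def walk(elem, path):
        for child in elem:
            child_path = path + [segment(child, elem)]
            text = (child.text or "").strip()
            if text:
                records.append(
                    {"id": "\\".join(child_path), "data": quote_plus(text)}
                )
            walk(child, child_path)

    # The root itself is skipped, so ids start at its children.
    walk(root, [])
    return records


# Hypothetical sample document; the root element name is an assumption.
doc = """<mods>
  <titleInfo><title>XSLT</title></titleInfo>
  <titleInfo type="alternative">
    <title>Special edition using XSLT</title>
  </titleInfo>
</mods>"""

records = encode(ET.fromstring(doc))
print(json.dumps(records[1]))
```

Note that `json.dumps` writes the backslash delimiter as `\\`, since a bare backslash is an escape character inside a JSON string.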

With this id, I believe I now have everything I need to reconstruct the node that the data referenced by that id came from.
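
To test that belief, here is a rough Python sketch of the reverse direction. It assumes that element names and attribute values never contain `@` or a backslash, that positions count same-named siblings starting at 1, and that the data was URL-encoded with plus signs for spaces. The helper names `parse_segment` and `insert` and the `mods` root are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus


def parse_segment(segment):
    """Split 'name-position@attr=value...' into (name, position, attrs)."""
    head, *attr_parts = segment.split("@")
    # rpartition keeps element names that themselves contain a dash intact.
    name, _, position = head.rpartition("-")
    attrs = dict(part.split("=", 1) for part in attr_parts)
    return name, int(position), attrs


def insert(root, record):
    """Recreate the element described by record["id"] under root."""
    node = root
    for segment in record["id"].split("\\"):
        name, position, attrs = parse_segment(segment)
        siblings = node.findall(name)
        # Pad with empty same-named siblings until the position exists.
        while len(siblings) < position:
            siblings.append(ET.SubElement(node, name))
        node = siblings[position - 1]
        node.attrib.update(attrs)
    node.text = unquote_plus(record["data"])
    return node


# The id omits the root, so the caller has to supply one (assumed name).
root = ET.Element("mods")
insert(root, {
    "id": "titleInfo-2@type=alternative\\title-1",
    "data": "Special+edition+using+XSLT",
})
print(ET.tostring(root, encoding="unicode"))
```

Because the id says this is the second titleInfo, the sketch creates an empty first titleInfo as a placeholder; a later record whose id starts with titleInfo-1 would fill it in rather than add another element.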

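After playing around with this a bit, I realized that what I’d done was basically reinvent XPath! In XPath, where positions are 1-based and attribute tests are written as predicates, the preceding path in the id string would be represented (relative to the omitted root element) as:

/*/titleInfo[2][@type='alternative']/title[1]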
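
The correspondence is mechanical enough to sketch in Python. The function name `id_to_xpath` is hypothetical, and the sketch assumes attribute values contain no single quotes. It writes attribute tests as predicates and keeps positions 1-based, as XPath does, and it returns a path relative to the root element so that the limited XPath support in `xml.etree.ElementTree` can evaluate it. A full XPath engine could use the same path prefixed with `/*/`.

```python
import xml.etree.ElementTree as ET


def id_to_xpath(id_string):
    """Translate a backslash-delimited id into a relative XPath."""
    steps = []
    for segment in id_string.split("\\"):
        head, *attr_parts = segment.split("@")
        name, _, position = head.rpartition("-")
        # Position first, then one predicate per attribute.
        predicates = f"[{position}]"
        for part in attr_parts:
            attr, value = part.split("=", 1)
            predicates += f"[@{attr}='{value}']"
        steps.append(name + predicates)
    return "/".join(steps)


xpath = id_to_xpath("titleInfo-2@type=alternative\\title-1")
print(xpath)

# Hypothetical sample document to check that the path selects the right node.
root = ET.fromstring(
    "<mods><titleInfo><title>XSLT</title></titleInfo>"
    "<titleInfo type='alternative'><title>Special edition using XSLT</title>"
    "</titleInfo></mods>"
)
print(root.find(xpath).text)
```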

OK…so the next idea is to see if there are libraries out in the wild wild web for creating XML documents from XPath expressions (and not just querying XML documents). I see that the Perl module XML::XPath may offer a solution.

I also wonder if this is how XForms libraries keep track of what parts of an XML document have been edited….