Archive for the ‘Biblios’ Category

‡biblios, Firefox 3.5 and Google Gears

Wednesday, July 15th, 2009 by Galen Charlton

Firefox 3.5 was released recently and includes a number of useful enhancements, but it also came with a big fly in the ointment for applications like ‡biblios.net that use the Google Gears plugin. For some reason, Google did not have a version of the add-on compatible with Firefox 3.5 ready when the new browser version launched, so apps that relied on it broke.

Fortunately, Google has now released a compatible version of the plugin. Tonight I upgraded my installation of Firefox to 3.5, updated Google Gears, and tested ‡biblios, and I’m happy to report that it’s now working.

There were a couple of quirks:

  1. When I upgraded Firefox, it didn’t detect that a new version of the add-on was available. I had to remove the old version of the add-on, then go to http://gears.google.com/ to install Gears.
  2. When I went there, it didn’t offer me the chance to install Gears. I eventually figured out that I could go directly to the terms of use form to force the download and installation.

‡biblios.net collaborative sharing and editing of records

Monday, December 8th, 2008 by Chris Catalfo

LibLime is developing a new open platform for sharing bibliographic records as part of its forthcoming ‡biblios.net project (now in beta at http://beta.biblios.net). As part of this platform, LibLime is making bibliographic records available via several protocols: Z39.50 and SRU for starters, and eventually a REST protocol and OAI-PMH.

LibLime is also allowing write access to the records via the protocol ‡biblios uses to interact with Koha. This API (documented more fully at the biblios.org site) looks like the following:

authenticate: POST username=x password=y. Receives: cookie for session use.
bib_profile: GET. Receives: XML document describing the MARC21 fields in use by the backend Koha server.
bib/xxx: GET. Retrieves the MARCXML document at bib number xxx.
bib/xxx: POST. Saves the posted MARCXML document to the server.
new_bib: POST. Saves a MARCXML document (new to the database) to the server.
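
As a rough illustration of how a client might address these endpoints, here is a minimal Python sketch that only builds the HTTP requests and does not send them. The base URL is a placeholder (the public URLs had not been announced at the time of this post), and the function names and the Content-Type header are assumptions made for the example, not part of the documented API.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder base URL: an assumption for illustration only.
BASE_URL = "http://example.org/api/"


def authenticate_request(username, password):
    """Build the POST that exchanges credentials for a session cookie."""
    body = urlencode({"username": username, "password": password})
    return Request(BASE_URL + "authenticate", data=body.encode("ascii"),
                   method="POST")


def get_bib_request(bib_number):
    """Build the GET that retrieves the MARCXML record at bib_number."""
    return Request(BASE_URL + "bib/%s" % bib_number, method="GET")


def save_bib_request(marcxml, bib_number=None):
    """Build the POST that saves a record.

    With no bib_number the record is treated as new (new_bib);
    otherwise it is posted to bib/<bib_number>.
    """
    path = "new_bib" if bib_number is None else "bib/%s" % bib_number
    return Request(BASE_URL + path, data=marcxml.encode("utf-8"),
                   # Content type is an assumption; the post does not specify one.
                   headers={"Content-Type": "text/xml"}, method="POST")
```

Each request could then be sent with an opener from `urllib.request` that keeps the session cookie returned by the authenticate call.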

In the coming weeks we will announce publicly available URLs for these access services, so sharpen your pencils and get ready to collaboratively share and edit some records!

Challenges in testing ‡biblios

Sunday, September 14th, 2008 by Chris Catalfo

Lately I’ve been looking into using Selenium and JsUnit to test Biblios. Selenium is designed to allow functional testing of web applications, while JsUnit is a unit test framework for JavaScript applications.

There are some challenges to using each of these testing tools because of Biblios’ architecture and heavy use of AJAX. One way to use Selenium is termed Selenium Remote Control. This sets up a proxy server and starts a browser instance which connects to the proxy server; the proxy server forwards requests to the site being tested and reports the results back to the test script. One problem for Biblios in this approach is that the browser instance run by Selenium Remote Control must have the Google Gears extension installed in order to work. This takes a little work to set up, or a custom profile in the case of Firefox.

A challenge in using JsUnit is that JsUnit provides no way to wait until specified conditions occur before executing a test. This matters for Biblios because certain tests cannot be run until the application is fully loaded, which happens through AJAX calls made after the HTML, JavaScript, and CSS of the index.html page are loaded. So JsUnit starts running its tests before Biblios is really ready for them. Luckily, Selenium does provide a means of waiting for specified conditions before executing tests.
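
The general idea of waiting for a condition before running a test can be sketched as a simple polling loop. This is a generic Python illustration of the technique, not Selenium's own API; the `wait_until` helper and its parameters are hypothetical names chosen for the example.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Poll condition() until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout`
    seconds have passed without the condition being met.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(interval)
```

A test script would call something like `wait_until(app_is_loaded)` first, where `app_is_loaded` is whatever check indicates that the application has finished its AJAX start-up, and only then run its assertions.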

Still another challenge in testing Biblios is that Biblios’ HTML is almost entirely generated by the ExtJS JavaScript framework. This makes identifying elements to test a challenge because in many cases the HTML elements have no systematic id attribute to rely upon.

Deciding on an API for Biblios

Saturday, May 24th, 2008 by Chris Catalfo

As I continue to work on Biblios in anticipation of its release (soon, I hope!), it is about time to decide on an API.

I have already put in place a simple macro system for batch editing of bibliographic records. Macros are written in JavaScript and use a MarcRecord JavaScript object to manipulate MARCXML records.

Here is a simple example (record is a MarcRecord instance):


// Check to see if record has 856.  If so, add subfield $u with url.  If not, add a new 856 with url.
if( record.hasField('856') ) {
    record.field('856').subfield('u', 'http://www.google.com');
}
else {
    record.addField( new Field('856', '', '', [ new Subfield('u', 'http://www.google.com')]) );
}

I would like to provide access to Biblios’ main functions for use by plugins. Here are a few ideas for API functions:

  • Run a search
  • Run the current search but limited to something
  • Save all search results to a folder
  • Save record with id n to a particular folder
  • Edit record with id n
  • Run a macro on all records in a folder

I’d be interested to hear what others think: what they’re used to in other cataloging software and what commands/tools that software might be missing which could be ultimately included in Biblios.

Making the MARC21 specification usable via XML

Monday, April 7th, 2008 by Chris Catalfo

As part of my work on the Biblios project, I need access to the MARC21 specification in a machine-readable form so that Biblios can provide context-sensitive help. To this end, I’ve been extracting field and subfield names, descriptions, and valid values from the MARC21 specification in HTML form, available at the Library of Congress website here. I thought other folks might be interested in this, hence this post.

I enjoy using Python and so I thought I’d try to whip something up in it. I discovered the BeautifulSoup Python library for extracting data from HTML pages and it sounded perfect for this task.

A glance at the specification page for the MARC21 Leader field (here) shows that, happily, the pages are constructed fairly logically and their content can be extracted based on CSS classes.

I put together a few regular expressions like this one to parse out the key data:


# get the character position like this:
# 18-21 - Illustrations (006/01-04)
charposmatch = re.compile(r'^(?P<position>\d{1,2}(?P<extent>-\d{2})*)\s-\s(?P<name>.*)')
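
To see what the pattern captures, here is a small sketch that applies it to the sample line from the comment above. The `parse_charpos` helper is a hypothetical name introduced for illustration and is not taken from the extraction scripts.

```python
import re

# Same pattern as above: a one- or two-digit position, an optional
# "-NN" extent, then " - " and the name of the character position.
charposmatch = re.compile(
    r'^(?P<position>\d{1,2}(?P<extent>-\d{2})*)\s-\s(?P<name>.*)')


def parse_charpos(line):
    """Return the position and name found in a heading line, or None."""
    match = charposmatch.match(line)
    if match is None:
        return None
    return {"position": match.group("position"),
            "name": match.group("name")}


print(parse_charpos("18-21 - Illustrations (006/01-04)"))
# {'position': '18-21', 'name': 'Illustrations (006/01-04)'}
```

Lines that do not start with a character position, such as a plain heading, simply return `None`.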

BeautifulSoup lets you write code like this to walk through an HTML document and extract parts:


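for charpos in soup.findAll('div', {'class': 'characterposition'}):
    try:
        text = charpos.findNext('strong', recursive=False).contents[0].rstrip()
    except (AttributeError, IndexError):
        continue  # assumed handling: skip blocks with no <strong> label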

I wrote a little Python script to download the relevant specification files and ran my extraction scripts. Out came something like this:


<marc21spec>
   <tag code="000">
      <position description="Computer-generated, five-character number equal to the length of the entire
                              record, including itself and the record terminator. The number is right justified
                              and unused positions contain zeros." name="Record length" position="00-04"/>
      <position description="One-character alphabetic code that indicates the relationship of the record to a
                              file for file maintenance purposes." name="Record status" position="05">
         <value code="a" description="Increase in encoding level"/>
         <value code="c" description="Corrected or revised"/>
         <value code="d" description="Deleted"/>
         <value code="n" description="New"/>
         <value code="p" description="Increase in encoding level from prepublication"/>
      </position>

The MARC21 specification in all its glory, as XML!
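
Once the specification is in this form, looking up help text becomes a small tree walk. The following Python sketch uses the standard library's ElementTree on a shortened sample; the sample's closing tags are assumed (the excerpt above is truncated), and `describe_value` is a hypothetical helper, not part of Biblios or the extraction scripts.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the generated output.
# Descriptions are abbreviated and the closing tags are assumed.
SAMPLE = """<marc21spec>
  <tag code="000">
    <position name="Record length" position="00-04" description="..."/>
    <position name="Record status" position="05" description="...">
      <value code="a" description="Increase in encoding level"/>
      <value code="n" description="New"/>
    </position>
  </tag>
</marc21spec>"""


def describe_value(root, tag_code, position, value_code):
    """Return the description of a coded value, or None if not found."""
    for tag in root.findall("tag"):
        if tag.get("code") != tag_code:
            continue
        for pos in tag.findall("position"):
            if pos.get("position") != position:
                continue
            for value in pos.findall("value"):
                if value.get("code") == value_code:
                    return value.get("description")
    return None


root = ET.fromstring(SAMPLE)
print(describe_value(root, "000", "05", "n"))  # New
```

An editor could call a lookup like this whenever the cursor sits on a given leader position to show the meaning of the current code.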

I have no doubt that there are still some incorrect or missing parts in the generated xml files. If you’re interested in double checking the files, or even better, improving the extraction scripts, you can download them here:
marc21controlfields.xml
marc21varfields.xml
extractControlTags.py
extractVariableTags.py

To run the extraction scripts, pass them the path containing the .html files (or a single file) and a filename to output the xml content to:

python extractControlTags.py . marc21controlfields.xml
python extractVariableTags.py . marc21variablefields.xml

Biblios at Code4LibCon 2008

Friday, March 14th, 2008 by Chris Catalfo

I attended my first Code4Lib Conference a few weeks ago and did a presentation on “Biblios”, the web-based cataloging software I’ve been working on here at LibLime. I will be posting slides of the presentation in the next few days.

I am very sorry to have missed the presentation at Code4LibCon 2008 on a MODS editor written in XForms (link to slides available here). This looks like a very promising approach for editing XML documents, and XForms is a technology I plan on looking into.

A web site for Biblios is in the works and should go live next week, with links to downloads, a demo, and documentation. As soon as it’s ready I will post links on this blog.