Frankenstein, or the modern FRBR

May 11th, 2008 by Galen Charlton

I’ve been reading a “Norton critical edition” of Mary Wollstonecraft Shelley’s Frankenstein. The book includes the 1818 edition of the tale (the more familiar third edition, significantly revised, was published in 1831), eighteen contemporaneous and modern reviews and critical essays, a bibliography referring to an additional forty books and journal articles, and a few miscellaneous letters and poems thrown in for context.

Besides pointing to some deficiencies of my education (who knew that Igor was entirely a creature of the movies? Why was a comparatively short novel, only 155 pages in the edition I’m reading, published in three volumes in 1818?), reading the real McCoy has inspired a couple small musings.

Frankenstein starts with an epigraph from Milton’s Paradise Lost, but that was hardly the only literary influence on Shelley. Both of her parents were well-known authors, her husband was Percy Bysshe Shelley, and she self-consciously engaged in a program of reading to the point where her journal largely consists of a reading list. Among other things, Frankenstein is a response to Milton, various Gothic works by Hazlitt Ann Radcliffe, and various poems by Percy and Byron written around the time of the famous compact to write ghost stories that inspired Shelley to write.

From a purely mechanical point of view, the 336-page volume in hand, besides containing the text and a finite number of critical essays that could be catalogued and related to each other, must directly or indirectly refer to many dozens of books that would have been known to Shelley and hundreds of works of criticism that came later, to say nothing of the movies and plays that reinterpret the Frankenstein story and the thousands of works that simply evoke the image of Frankenstein’s monster or Shelley’s response to the Faust story. If you’ve gotten tired going through that last sentence, consider the plight of the poor cataloger who takes an expansive view of creating metadata describing the Frankenstein work and this particular version that bundles in a number of essays. Relative to the possibilities, the 504 in LC’s MARC record is wanting:

“Includes bibliographical references (p. 335-336).”

Nor can you get a list of the titles and authors of the critical essays from the bib record. I’m not criticizing LC or the cataloger, cataloging rules and economic realities being what they are, but there’s an opportunity that I hope the cataloging and metadata community can work towards — not just focusing on the item in hand, but placing each work in the rich web of relationships of reference, homage, response, parody, and criticism. Barring a trek to strange and weird places to assemble a generous library board that can subsidize a week of effort to catalogue each complicated work, some notions:

  • A metadata record can never be done — even a completely analyzed MARC bib record does not sufficiently relate a rich work to its influencers and influencees.
  • Of course, at any given point in time it does have to be good enough to satisfy the users and those paying the bills.
  • Since no one cataloger can even begin to note all connections of one rich work to another, metadata culture must promote the easy enhancement of bibliographic records (or RDF triple-clouds, or whatever) by anybody qualified (and probably, anybody half-way qualified).
  • Bibliographic metadata must be linkable to other sources of metadata.
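One way to picture the notions above is as a pile of subject-predicate-object statements that anyone can append to. What follows is only a minimal sketch: the `Statement` type, the relationship names (`responds_to`, `revises`, and so on), and the contributor labels are all invented for illustration and are not drawn from MARC, any RDF vocabulary, or any real system.

```python
from collections import namedtuple

# Hypothetical statement type: `contributor` asserts that `subject` relates to `obj`.
Statement = namedtuple("Statement", "subject predicate obj contributor")

statements = [
    Statement("Frankenstein (1818)", "has_epigraph_from", "Paradise Lost", "cataloger"),
    Statement("Frankenstein (1818)", "responds_to", "Paradise Lost", "cataloger"),
]

def enhance(store, subject, predicate, obj, contributor):
    """Append a new relationship unless the same link is already recorded."""
    key = (subject, predicate, obj)
    if not any((s.subject, s.predicate, s.obj) == key for s in store):
        store.append(Statement(subject, predicate, obj, contributor))

def related(store, work):
    """Return every (predicate, other work) pair that mentions `work` on either side."""
    out = []
    for s in store:
        if s.subject == work:
            out.append((s.predicate, s.obj))
        elif s.obj == work:
            out.append(("inverse:" + s.predicate, s.subject))
    return out

# A later contributor adds a link the original cataloger never recorded.
enhance(statements, "Frankenstein (1831)", "revises", "Frankenstein (1818)", "volunteer")
```

The record is never "done" in this picture: `enhance` can be called again at any time, and `related` simply reports whatever links exist so far.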

A final musing — the report of the LC Working Group on the Future of Bibliographic Control mentions the idea of speeding cataloging record production by getting basic metadata from the publishers. I have my reservations about whether publishers will be interested in fully cooperating with such a scheme, but suppose they do — would it be too much to ask to have them provide bibliographies in some kind of machine-readable format? I think this, all by itself, would be a big win for humanities researchers.

Closing in on Koha 3.0

May 4th, 2008 by Andrew Moore

Now that a beta version of Koha 3.0 has been out for a little while, there is increased interest in putting a final version of Koha 3.0 together soon. Paul recently started a discussion on the koha developers list about what we need to do to get a release out the door. This includes deciding on the last-minute features we would like to include to make it a cohesive, useful product, which bugs absolutely need to be fixed, and the logistics involved in maintaining that version while we set our sights on the next version of Koha. I think that in the coming days and weeks we will see this discussion continue, along with a flurry of activity as we try to find the balance between completeness and timeliness.

How Good is Google Book Services? Ask your mother.

April 29th, 2008 by atz

Despite not being even remotely Irish, my mother likes to make a traditional corned-beef and cabbage dinner for our family on St. Patrick’s Day, and this year was no exception. (Sorry, no pix.) My mother is a five-foot-tall head reference librarian in a local public library system and she commands the type of incredible memory that you would expect from her profession. She makes use of this in another tradition, the singing of a particular St. Patrick’s Day hymn taught to her by nuns in grade school. Suffice to say, my mother has not been in grade school for a while, and for a song sung only once a year, it seems remarkable to me that anybody can remember all the words without any help. I mean, how far can you get into Auld Lang Syne or Good King Wenceslas? If you say “all the way, no problem”, please ask yourself whether you are currently, have ever been, or are about to become a reference librarian.

So this song is essentially about St. Patrick and the persevering quality of Ireland’s faith in him, but the pace is pretty quick and the lyrics are twisty and complicated. I’ve never heard this song anywhere else, and I didn’t know the title, but I wanted to find the lyrics. Since Google Book Services had just come out, I decided this would be my test. Searching on “St. Patrick” was never going to do it. The best I could do was remember the small, odd phrase “is bright with us yet.” In fact it was that phrase that made me want to revisit the lyrics in the first place.

GBS’s first hit was The Hymn-book of the Modern Church: Studies of Hymns and Hymn-writers (1905) by Arthur Edwin Gregory, a compendium of various hymns including two of the “Romanist” verses I was looking for, and a discussion of the author and his relative merits. Interestingly, it does not offer the title of the song, but Google’s results took me directly to the correct page so that immediately I was looking at the relevant content. Very impressive.

The second hit was even better: The Parochial Hymn Book (1897). 3 verses and a complete four-part vocal arrangement! The title, as it turns out, is “St. Patrick’s Day”, not the most helpful string to search against.

The specificity of the texts themselves is most remarkable. When the first rounds of “electronic texts” were circulated, many people were unimpressed with the experience of reading screens of flat ASCII text, objecting to the sterile quality as not bookish enough. The difference between that time and today is stark. Google’s text is not an abstract vision of the PHB’s content; rather, it is a set of photographic images from a particular book in a particular stack, with all the peculiarities of the physical original (save, perhaps, smell).

It has provenance: a hand-inked calligraphic block claims it for Andover-Harvard Theological Library of the Harvard Divinity School, with both a stamp and a bookplate noting that it comes from the estate of one Rev. Charles Hutchins on May 24th, 1939. You can even see Harvard’s call number penciled in on the verso page. The effect of these details is to greatly reinforce the validity of the text. Contrast this with a posting on any of a thousand interchangeable lyrics sites. Which would you regard as desirable or authoritative?

My experience was very much like those I suspect any fan of libraries has enjoyed, the feeling of discovering a tangible artifact of another time and place that was produced and preserved specifically so that you might encounter it, and have the information sought. Even 111 years later.

I chatted up my colleague Chris in New Zealand, one of the original Koha developers, about my GBS results. He was fairly impressed. For his test, he put in his father’s name, Ian Cormack, and the first hit promptly returned was a link to his father’s academic article “Creating an Effective Learning Environment for Maori Students” in Mai i Rangiatea: Maori Wellbeing and Development (1997). So there you have it: two texts separated by 100 years of time, and half the Earth in distance, accurately and immediately retrieved from one repository, for free.

Note: Google has built on to Book Services a bunch of other features, including ratings, reviews, tags and a My Library feature. In my My Library, you can see the three texts I mentioned. There is also a “Find this book in a library” link to OCLC’s WorldCat that tells me the closest (known) copy is 160 miles away.

For comparison, my search at Amazon came up empty. Which suggests another question….

Paranoia Alert!! Assume your library had a copy of the same song in one hymnbook or another. Without GBS, based on my limited query data, how long would it take me to retrieve it at your library? I should add that my search began at 10:30PM.

Perhaps most striking is that GBS would be preferable even if I was sitting in Harvard’s library where the original still resides!

From libraries to Skynet

April 28th, 2008 by Galen Charlton

Who added AI to Koha?

Hint: Git tries its best to properly assign credit to patches, but it doesn’t always get it right.

Library jargon, or translating Koha from English to English

April 13th, 2008 by Galen Charlton

As Andrew S. Tanenbaum said, “the nice thing about standards is that there are so many of them to choose from.” Good old non-standardized library jargon provides an even richer field of variation. Do libraries serve members, patrons, clients, or customers? Is a patron placing a hold request or a reservation? When the item arrives and the patron checks it out, do we call the transaction a loan, a checkout, or an issue? Can the library issue an issue to a patron? How many synonyms have I missed so far?

Koha’s base HTML templates use “English”; translations to other languages are generated by extracting strings from the templates and giving them to the translators. The files containing translated strings are then used to create a set of HTML templates in the desired language.

I put “English” in scare quotes because while nominally the language of coding is (I think) the New Zealand variant of the Queen’s English, in practice it is a mixture of NZ English, UK English, US English, and so forth. That already opens the door to potentially desirable localizations — after all, one really ought to put one’s “colour” and “flavor” in the right sociogeolinguistic buckets.

Which brings us back to library jargon — a “reservation” in one country is another’s “hold request”. An academic library’s “recall request” is a public library’s “you’ve gotta be kidding!”. A bright idea! Let’s convene an international committee to standardize English-language library jargon! I’m holding my breath with anticipation …

Still holding — but why not expand the scope of the committee and handle French library jargon?

*thunk*

later

OK, so that didn’t work. For now, it looks like a better solution is to embrace the differences and set up en-NZ, en-US, en-GB, etc. as defined translations for Koha, per some recent traffic on the koha-devel list. Localization ultimately doesn’t apply to just language and country; think of en-US-academic_library, en-US-small_public, etc.
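The idea of ever more specific locales such as en-US-academic_library suggests a fallback chain: look for a term in the most specific catalog first, then in each less specific one. Here is a minimal sketch of that lookup, assuming catalogs are plain dictionaries keyed by locale name; the function names, the catalog layout, and the wording choices are all hypothetical and say nothing about how Koha’s translation tools actually work.

```python
def fallback_chain(locale, default="en"):
    """List `locale` followed by each progressively less specific ancestor."""
    parts = locale.split("-")
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)
    return chain

def translate(term, locale, catalogs):
    """Return the most specific wording available for `term`, else the term itself."""
    for candidate in fallback_chain(locale):
        wording = catalogs.get(candidate, {}).get(term)
        if wording is not None:
            return wording
    return term

# Hypothetical catalogs; which country prefers which word is illustrative only.
catalogs = {
    "en": {"hold": "hold request", "patron": "patron"},
    "en-GB": {"hold": "reservation"},
    "en-US-academic_library": {"recall": "recall request"},
}
```

With these catalogs, `translate("hold", "en-GB", catalogs)` picks up the en-GB wording, while `translate("hold", "en-US-academic_library", catalogs)` falls all the way back to the base "en" catalog, so a specialised locale only has to list the terms it changes.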

Making the MARC21 specification usable via XML

April 7th, 2008 by Chris Catalfo

As part of my work on the Biblios project, I need access to the MARC21 specification in a machine-readable form so that Biblios can provide context-sensitive help. To this end, I’ve been extracting field and subfield names, descriptions, and valid values from the MARC21 specification in HTML form, available at the Library of Congress website here. I thought other folks might be interested in this, hence this post.

I enjoy using Python and so I thought I’d try to whip something up in it. I discovered the BeautifulSoup Python library for extracting data from HTML pages and it sounded perfect for this task.

A glance at the specification page for the MARC21 Leader field (here) shows that, happily, the pages are constructed fairly logically and can be extracted based on their css class.

I put together a few regular expressions like this one to parse out the key data:


import re

# get the character position like this:
# 18-21 - Illustrations (006/01-04)
charposmatch = re.compile(r'^(?P<position>\d{1,2}(?P<extent>-\d{2})*)\s-\s(?P<name>.*)')
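To make the named groups concrete, here is a small sketch that applies the same pattern to two heading lines. The first sample line comes from the comment above; the second ("05 - Record status") is an assumed heading, chosen to match the name and position shown in the generated XML, and `parse_heading` is a helper name invented for this example.

```python
import re

# Same pattern as in the post: a position (with optional "-NN" extent),
# then " - ", then the field name.
charposmatch = re.compile(
    r'^(?P<position>\d{1,2}(?P<extent>-\d{2})*)\s-\s(?P<name>.*)'
)

def parse_heading(line):
    """Return (position, name) for a heading line, or None if it does not match."""
    m = charposmatch.match(line)
    if m is None:
        return None
    return m.group("position"), m.group("name")

ranged = parse_heading("18-21 - Illustrations (006/01-04)")
single = parse_heading("05 - Record status")
```

Here `ranged` holds the position "18-21" with the rest of the line as the name, and `single` holds the position "05"; a line with no leading position yields `None`.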

BeautifulSoup lets you write code like this to walk through an HTML document and extract parts:


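for charpos in soup.findAll('div', {'class': 'characterposition'}):
    try:
        # the next <strong> after the div is expected to hold the heading text
        text = charpos.findNext('strong').contents[0].rstrip()
    except (AttributeError, IndexError, TypeError):
        continue  # no usable <strong> heading for this div; skip it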

I wrote a little Python script to download the relevant specification files and ran my extraction scripts. Out came something like this:


<marc21spec>
   <tag code="000">
      <position description="Computer-generated, five-character number equal to the length of the entire
                              record, including itself and the record terminator. The number is right justified
                              and unused positions contain zeros." name="Record length" position="00-04"/>
      <position description="One-character alphabetic code that indicates the relationship of the record to a
                              file for file maintenance purposes." name="Record status" position="05">
         <value code="a" description="Increase in encoding level"/>
         <value code="c" description="Corrected or revised"/>
         <value code="d" description="Deleted"/>
         <value code="n" description="New"/>
         <value code="p" description="Increase in encoding level from prepublication"/>
      </position>

The MARC21 specification in all its glory, as XML!
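As a rough illustration of how a client might consume output in this shape for context-sensitive help, the sketch below indexes positions by tag code using Python’s standard `xml.etree.ElementTree`. The sample is a shortened, well-formed stand-in with abbreviated descriptions, not the real generated file, and `build_help_index` is a name made up for this example rather than anything in Biblios.

```python
import xml.etree.ElementTree as ET

# A shortened, well-formed sample in the same shape as the generated file.
SAMPLE = """
<marc21spec>
  <tag code="000">
    <position name="Record length" position="00-04"
              description="Length of the entire record."/>
    <position name="Record status" position="05"
              description="Relationship of the record to a file.">
      <value code="c" description="Corrected or revised"/>
      <value code="n" description="New"/>
    </position>
  </tag>
</marc21spec>
"""

def build_help_index(xml_text):
    """Map (tag code, position) to that position's name, description, and values."""
    index = {}
    for tag in ET.fromstring(xml_text.strip()).findall("tag"):
        for pos in tag.findall("position"):
            values = {v.get("code"): v.get("description") for v in pos.findall("value")}
            index[(tag.get("code"), pos.get("position"))] = {
                "name": pos.get("name"),
                "description": pos.get("description"),
                "values": values,
            }
    return index

help_index = build_help_index(SAMPLE)
```

A lookup such as `help_index[("000", "05")]` then returns the name, description, and valid values for that Leader position in one step.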

I have no doubt that there are still some incorrect or missing parts in the generated XML files. If you’re interested in double-checking the files, or even better, improving the extraction scripts, you can download them here:
marc21controlfields.xml
marc21varfields.xml
extractControlTags.py
extractVariableTags.py

To run the extraction scripts, pass them the path containing the .html files (or a single file) and a filename to output the xml content to:

python extractControlTags.py . marc21controlfields.xml
python extractVariableTags.py . marc21variablefields.xml

Koha, the bacon donut ILS.

March 31st, 2008 by atz

Last month in Portland, Oregon, at the code4lib 2008 convention (a most impressive assemblage of library geekiness), a few of us broke out to the 24-hour bakery Voodoo Doughnut for this:

Bacon Donut by Voodoo

A donut, with bacon on it. A bacon donut.

I’d expected it to be strange, but the remarkable thing about the bacon donut is how unsurprising the taste is. The sweet maple and salty flavors are, as it turns out, very compatible. So it strikes me that the work I’ve been doing on Koha recently is a lot like the bacon donut: take two things people already like, do the voodoo, and make them work together in a new way.

For the OPAC, the place where this comes up most often is external content, like book cover images. Koha libraries have been using jacket images from Amazon for some time in production, internationally. It’s free and it’s broadly populated: a great feature, especially for small libraries who don’t have the advantage of a lot of subscription content services. Using their API, we can also pull and display content like user reviews, really fleshing out OPAC content.

I recently completed some commissioned Koha code for integrating Baker & Taylor images and content as an alternative to Amazon. Koha can now link to B&T ContentCafe excerpts, ratings, etc. and to their MyLibrary BookStore retail site. For design, my code followed the Amazon model, and certainly something similar could be crafted for other proprietary sources like Blackwell, Syndetics, etc. But upon reflection, I think that the entire model is already on its way out!

Enter Google Book Services. I’ll have more to say about GBS later, but suffice to say we now have a second, very widely available source of free book jacket images. (In fact, it may be enough to deflect calls some have been making for the Library of Congress to provide access to cover images like they do for other metadata.) The Google API is essentially JavaScript-based and remarkably easy to integrate. How easy? Code4lib members were posting working example code back and forth within hours, and then within a day or two, other Koha users adapted their own servers to start using Google’s images. This is a great example of how OSS enables agility and adaptability.

So pretty soon we should expect that every current OPAC will have some images from somewhere, and that won’t be a distinguishing feature anymore. The next model to evolve will be to allow ajaxy failover from a ranked menu of many possible image sources (both free and subscription/keyed like B&T or Syndetics). In fact, several coders have reported implementing this for their favorite sources already! I’m looking forward to seeing this code synthesized, providing the broadest possible coverage for images. Then we can start to get some abstraction around the other data in common, like reviews, ratings, etc.
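The ranked-failover idea can be sketched independently of any one provider. The version below is server-side Python rather than the ajaxy, in-browser approach described above, and every name in it (`first_available_cover`, `free_source`, `keyed_source`, the example.org URL) is a hypothetical stand-in; a real lookup would call the provider’s own API.

```python
def first_available_cover(isbn, sources):
    """Try each (name, lookup) pair in ranked order and return the first hit.

    Each lookup takes an ISBN and returns an image URL, or None when that
    source has no cover. A lookup that raises is treated as a miss.
    """
    for name, lookup in sources:
        try:
            url = lookup(isbn)
        except Exception:
            continue  # a failing source should not block the ones ranked after it
        if url:
            return name, url
    return None, None

# Hypothetical stand-ins for real services.
def free_source(isbn):
    return None  # pretend this source has no image for the title

def keyed_source(isbn):
    return "https://covers.example.org/%s.jpg" % isbn

ranked = [("free", free_source), ("keyed", keyed_source)]
```

Because each source is just a name and a callable, adding another provider or reordering the preference is a change to the `ranked` list rather than to the failover logic, which is the kind of abstraction that could later extend to reviews and ratings.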

Some of my colleagues have already started on LibraryThing and xISBN. If you have other external data sources you would like to see integrated in Koha, feel free to mention them here!

Koha 3 (Beta) Released

March 23rd, 2008 by Joshua Ferraro

I’m happy to announce that a packaged beta release of Koha 3 is now available. You can download from the usual location:

http://download.koha.org/koha-3.00.00-beta.tar.gz
http://download.koha.org/koha-3.00.00-beta.tar.gz.sig

You can check the integrity of the package either by verifying the provided GPG signature (.sig) or by comparing the MD5 checksum:

84f6ec3615155cfa755a9e7139bd07df koha-3.00.00-beta.tar.gz
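For anyone who prefers to script the checksum comparison, here is a minimal sketch using Python’s standard `hashlib`. The helper names are invented for this example, and it assumes the tarball has already been downloaded to the path you pass in.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected_hex):
    """True when the file's MD5 equals the published value (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_checksum("koha-3.00.00-beta.tar.gz", "84f6ec3615155cfa755a9e7139bd07df")` should return True for an intact download. A matching MD5 only shows the file was not corrupted in transit; verifying the GPG signature is the stronger check.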

I’ve also tagged this in Git as v3.00.00-beta (“version 3.00.00 beta”).

This is the second packaged release of Koha 3. Prior to the official stable release of Koha 3.0, software issues, bugs, and unimplemented features must be addressed. These are documented on Koha’s Bugzilla:

http://bugs.koha.org

and organized on the 3.0 RM’s QA notes Wiki page:

http://wiki.koha.org/doku.php?id=en:development:qanotes3.0

The release notes for this beta version are pasted in an email to the koha-devel and main koha user lists, and will also be on the koha.org website sometime over this weekend.

Index Data adopts git

March 20th, 2008 by Galen Charlton

The folks at Index Data have switched from CVS to git for many of their open source products, including YAZ, Zebra, and PazPar2. Visit their gitweb or clone from their repository (git://git.indexdata.com/project) for some distributed version control goodness. They use submodules, so git version 1.5.3 or higher is required.

Just Browsing

March 18th, 2008 by Joshua Ferraro

Recently, on the Koha list, one of the users asked about the possibility of adding a ‘browse’ feature to the detail of a given record. The idea is, you might want to see what books appear on the shelf before and after that item, in a given location and shelf. As it turns out, it was a fairly trivial exercise: I spent Sunday afternoon whipping up a basic, browser-degradable shelf browser, and Owen Leonard, Koha’s Interface Designer, made it look pretty :-).

Why degradable? Glad you asked. One of the goals of the Koha project from the beginning has been that all of the interfaces are fully degradable and will work in any browser. So whenever we code a new feature, we write it for that environment first, then we slap on any additional functionality to make it prettier or more Ajaxy, etc.
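The core of the shelf-browsing idea, finding what sits just before and after an item, can be sketched in a few lines. This is not Koha’s implementation: it assumes the call numbers for one location are already sorted in shelving order and normalised so that plain string comparison matches that order, which real call numbers generally need extra work to achieve. The function name and the call numbers are made up.

```python
import bisect

def shelf_neighbors(shelf, call_number, width=3):
    """Return up to `width` call numbers before and after `call_number`.

    `shelf` must already be sorted in shelving order for one location.
    """
    idx = bisect.bisect_left(shelf, call_number)
    before = shelf[max(0, idx - width):idx]
    # Skip the item itself when it is present on the shelf.
    start = idx + 1 if idx < len(shelf) and shelf[idx] == call_number else idx
    after = shelf[start:start + width]
    return before, after

# Made-up, pre-normalised call numbers that happen to sort correctly as strings.
shelf = ["PR5397 .F6", "PR5397 .F7", "PR5397 .F73", "PR5398 .A2", "PR5398 .B4"]
```

Since this is computed entirely on the server, the result can be rendered as ordinary links, which fits the degradable-first approach: the page works without scripting, and any Ajax polish is layered on afterwards.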

Anyway … Here’s a basic screenshot of the display:

Shelf Browser