ALA 2008 conference notes: ALCTS FRBR Interest Group

June 27th, 2008 by Galen Charlton

This morning I went to the meeting of the FRBR Interest Group at the American Library Association (ALA) conference in Anaheim, California. For those who like lots of acronyms, the interest group is a part of the Association for Library Collections and Technical Services (ALCTS), a division of ALA.

There were two invited speakers. The first, John Espley from VTLS, discussed a couple of projects that VTLS has started to promote FRBR and VTLS’s implementation of it in their Virtua ILS. The first, “Try FRBR, You’ll Like It!”, offers existing VTLS customers the chance to send a small sample of MARC bib records and see how they look after FRBRization.

The second project is of more interest to non-VTLS customers. VTLS has started an experiment to offer Virtua’s FRBRization tools in the form of software as a service (SaaS). A library would send VTLS an extract of all of their bib records. VTLS would then determine which subset of the records would most benefit from FRBRization, then create a Virtua database with the FRBRized set of bibs. The library could then set up their OPAC to link from bib records to the work-sets stored in the Virtua database. That would allow a patron to find a bib for a paperback edition of Tom Sawyer and click on a link to see a list of all editions of that work that the library has. From the work-set page, the patron could in turn travel to one of the individual bibs.
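To illustrate what a work-set is, here is a minimal sketch that groups bib records by a normalized author/title key and prefers a uniform title when one exists. It is not the VTLS algorithm, which isn't described here, and it is not OCLC's either. The record shape ({ id, author, title, uniformTitle }) and the normalization rules are assumptions made for illustration.

```javascript
// Minimal work-set grouping sketch (hypothetical; not the VTLS or OCLC algorithm).
// Records are assumed to look like { id, author, title, uniformTitle? }.

// Lowercase, turn punctuation into spaces, collapse whitespace.
function normalize(s) {
  return s
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Author key: normalized heading with all-digit tokens (e.g. "1835", "1910") dropped.
function authorKey(author) {
  return normalize(author || '')
    .split(' ')
    .filter((t) => t !== '' && !/^\d+$/.test(t))
    .join(' ');
}

// Title key: prefer the uniform title when present, and ignore a leading article.
function titleKey(record) {
  return normalize(record.uniformTitle || record.title).replace(/^(the|a|an) /, '');
}

function workKey(record) {
  return authorKey(record.author) + '|' + titleKey(record);
}

// Group record ids into work-sets keyed by workKey.
function buildWorkSets(records) {
  const sets = new Map();
  for (const r of records) {
    const key = workKey(r);
    if (!sets.has(key)) sets.set(key, []);
    sets.get(key).push(r.id);
  }
  return sets;
}
```

With records titled “The adventures of Tom Sawyer /” (author heading with dates) and “Adventures of Tom Sawyer.” (without dates), both fall into the work-set keyed `twain mark|adventures of tom sawyer`. A record for “Tom Sawyer abroad” gets its own set. Real catalog data is much messier than this, which is one reason uniform title headings matter.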

VTLS has a prototype of this service working for one of their Virtua customers, but they intend for the service to work with any ILS. The prototype seems to be quite new — Espley mentioned that it was set up in the past week or so — and during the question and answer session, Espley and the audience identified a number of issues for VTLS to work through. One issue is improving their automated FRBRization tools, as Espley said that some manual cleanup was needed to group together expressions in the prototype and create higher-level entities that VTLS calls “superworks” (under FRBR rules, the book and movie versions of Tom Sawyer are two separate works — a “superwork” puts the two works under a single Tom Sawyer concept). Another is keeping the FRBRized database up to date as the library adds and updates their bib records.
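Keeping work-sets current as bib records change is largely a bookkeeping problem. Each record has to remember which work-set it was filed under, so that an edit (such as a newly added uniform title) can move it to a different set. Below is a hypothetical sketch of that bookkeeping. The class and method names are invented, and the caller supplies the key function, for example a normalized author/title key.

```javascript
// Hypothetical incremental index: keeps work-sets current as bib records
// are added, edited, or deleted, without rebuilding everything.
class WorkSetIndex {
  constructor(keyFn) {
    this.keyFn = keyFn;      // maps a bib record to its work-set key
    this.sets = new Map();   // work-set key -> Set of record ids
    this.keyOf = new Map();  // record id -> work-set key it is filed under
  }

  // Add a new record, or re-file an edited one under its new key.
  upsert(record) {
    this.remove(record.id); // drop any stale membership first
    const key = this.keyFn(record);
    if (!this.sets.has(key)) this.sets.set(key, new Set());
    this.sets.get(key).add(record.id);
    this.keyOf.set(record.id, key);
  }

  // Remove a deleted record; work-sets that become empty are discarded.
  remove(id) {
    const key = this.keyOf.get(id);
    if (key === undefined) return;
    const members = this.sets.get(key);
    members.delete(id);
    if (members.size === 0) this.sets.delete(key);
    this.keyOf.delete(id);
  }

  // Ids of all records in the same work-set as the given record.
  siblings(id) {
    const key = this.keyOf.get(id);
    return key === undefined ? [] : [...this.sets.get(key)];
  }
}
```

For example, with `new WorkSetIndex((r) => r.uniformTitle || r.title)`, a paperback record lacking a uniform title starts out in its own set. Once a cataloger adds the uniform title and the record is upserted again, it joins the other editions.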

VTLS’s FRBRization service is an interesting idea, and it could complement services such as xISBN and ThingISBN by offering a FRBRization that is customized to a library’s specific collection. I applaud VTLS for undertaking the experiment. Of course, I have concerns about the openness of such a service, and encourage VTLS to think about keeping the service as open as their business model permits, including:

  • Making sure that any web service APIs related to the FRBR service are fully documented so that their customers (and others!) can easily build mashups.
  • Making sure that there are no restrictive licensing terms that would prevent a library from contributing changes they make to improve the FRBRization back to the library community.
  • Publishing details of the VTLS FRBRization algorithm, in particular, to describe how and why it may differ from the OCLC FRBR work-set algorithm.
  • Contributing any bib record enhancement that VTLS may do as part of the service (e.g., by adding uniform title headings) to the library community.

The second speaker was Jennifer Bowen from the eXtensible Catalog project (XC). The XC project aims to create open source tools and services to help libraries improve resource discovery and metadata management.

Part of the planned XC system is a “metadata hub” that would harvest records from a library’s ILS using OAI-PMH. Once in the hub, the MARC records would be mapped to a more flexible schema. Since RDA has not been finalized, XC is devising an interim schema that includes the Dublin Core elements (mapped to FRBR entities) and about 20 elements from RDA. As such, the XC schema will be a testbed for parts of RDA — as Bowen said, an “RDA sandbox”.

How does FRBR fit in? Besides the FRBR entities represented in the XC schema, incoming records will be split into their FRBR components. The proposed schema doesn’t seem to be available on the XC website; I’ll be very interested to see it when it’s published.
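Because the XC schema isn't published yet, the following is only a guess at the general shape of the idea: a flat, Dublin Core-style record split into linked work, expression, and manifestation entities. The element-to-entity mapping, the id scheme, and the function name are assumptions for illustration. They are not XC's.

```javascript
// Hypothetical split of a flat Dublin Core-style record into FRBR Group 1
// entities. The element-to-entity mapping is an assumption, not the XC schema.
const ENTITY_FOR_ELEMENT = {
  title: 'work',
  creator: 'work',
  subject: 'work',
  language: 'expression',
  contributor: 'expression',
  type: 'expression',
  publisher: 'manifestation',
  date: 'manifestation',
  format: 'manifestation',
  identifier: 'manifestation',
};

function splitIntoFrbr(record, recordId) {
  const entities = {
    work: { id: `${recordId}-w` },
    expression: { id: `${recordId}-e` },
    manifestation: { id: `${recordId}-m` },
  };
  for (const [element, value] of Object.entries(record)) {
    // Elements not in the mapping stay with the manifestation in this sketch.
    const entity = ENTITY_FOR_ELEMENT[element] || 'manifestation';
    entities[entity][element] = value;
  }
  // Link the levels: a work is realized through an expression, and an
  // expression is embodied in a manifestation.
  entities.expression.realizes = entities.work.id;
  entities.manifestation.embodies = entities.expression.id;
  return entities;
}
```

Once records are split this way, two manifestations that point at equivalent work entities can be merged at the work level. That is where the grouping payoff of FRBR comes from.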

There was a brief general discussion after the two speakers finished. Of particular note: somebody asked how she, as a cataloger in a small public library that is not a member of OCLC, can prepare her catalog for FRBRization. This spawned an interesting discussion. One person made the point that catalogers should consider adopting a peer-to-peer model for distributing metadata instead of relying on central repositories to collect all improvements to metadata records. In the case of FRBR, this is important because one way to make a MARC21 bib record more useful for FRBRization is to add a uniform title heading. For such an improvement to be even more useful, it should be contributed to the library community, but as someone said at the meeting, “While we are very good about sharing the first version of a bib record, we’re less good about sharing enhancements.”
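Adding a uniform title is a small, mechanical enhancement. Below is a hypothetical sketch that works on a MARC record represented as plain objects. It does not use the Biblios MarcRecord object or any particular MARC library. The sketch adds a 240 when the record has a 1XX name main entry, adds a 130 otherwise, and leaves alone any record that already has one of them. The indicator values are common choices, not the only valid ones.

```javascript
// Hypothetical sketch: add a uniform title heading to a MARC21 bib record.
// A field is { tag, ind1, ind2, subfields: [[code, value], ...] }.
function addUniformTitle(record, uniformTitle) {
  const has = (tag) => record.fields.some((f) => f.tag === tag);
  if (has('130') || has('240')) return false; // already present; leave it alone

  let field;
  if (['100', '110', '111'].some(has)) {
    // 240 goes with a name main entry. ind1 '1' = display; ind2 '0' = no nonfiling characters.
    field = { tag: '240', ind1: '1', ind2: '0', subfields: [['a', uniformTitle]] };
  } else {
    // 130 is a uniform title main entry. ind1 '0' = no nonfiling characters; ind2 undefined.
    field = { tag: '130', ind1: '0', ind2: ' ', subfields: [['a', uniformTitle]] };
  }

  // Insert in tag order without disturbing the existing fields.
  const at = record.fields.findIndex((f) => f.tag > field.tag);
  record.fields.splice(at === -1 ? record.fields.length : at, 0, field);
  return true;
}
```

The return value indicates whether anything changed, so a batch job can count how many records it actually enhanced.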

To close with a bit of shameless self-promotion, I discuss using distributed version control systems as a model for sharing library metadata (and perhaps more importantly, changes to library metadata), in my article in the current issue of the Code4Lib Journal. While big central repositories of metadata such as OCLC and the Open Library are very important, I think a distributed model of record sharing is also needed.
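Here is a minimal sketch of the general idea, not code taken from the article. Each version of a record is identified by a hash of its content plus its parent's id, much as Git identifies commits. An enhancement can then be shared as a diff and replayed on another library's copy. A record is simplified here to an array of field strings, field order is ignored when a diff is applied, and all function names are hypothetical.

```javascript
const crypto = require('crypto');

// Identify a record version by hashing its fields together with its parent's id.
function versionId(fields, parentId) {
  return crypto
    .createHash('sha1')
    .update(JSON.stringify({ parent: parentId || null, fields }))
    .digest('hex');
}

// Fields added or removed between two versions: the "enhancement" to share.
function diffFields(oldFields, newFields) {
  const before = new Set(oldFields);
  const after = new Set(newFields);
  return {
    added: newFields.filter((f) => !before.has(f)),
    removed: oldFields.filter((f) => !after.has(f)),
  };
}

// Replay a shared enhancement on another copy of the same record version.
// Added fields are appended at the end; this sketch does not re-sort them.
function applyDiff(fields, diff) {
  const removed = new Set(diff.removed);
  return fields.filter((f) => !removed.has(f)).concat(diff.added);
}
```

Two libraries holding a record with the same version id know they hold identical content. A library that adds a 240 can publish just the diff instead of the whole record.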


Koha 3.0 RC1 Released

June 27th, 2008 by Nicole C. Engard

It’s finally here! After tons of hard work, the Koha community has announced the release of Koha 3.0 RC1. This from the many Koha mailing lists:

You can download from the usual location:

http://download.koha.org/koha-3.00.00-stableRC1.tar.gz
http://download.koha.org/koha-3.00.00-stableRC1.tar.gz.sig

You can check the integrity of the package either by verifying the provided GPG signature (.sig) or by comparing the MD5 checksum:

5cc0914c5e8250c2491f4dbcf27d4301 koha-3.00.00-stableRC1.tar.gz
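If you'd rather not rely on an `md5sum` binary, a few lines of Node.js (using its built-in `crypto` and `fs` modules) can do the MD5 comparison. This is a hedged sketch. The function names are made up, and it assumes the checksum line has the "hash, whitespace, file name" form shown above.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Split a checksum line of the form "<32 hex digits> <file name>".
function parseChecksumLine(line) {
  const [hash, file] = line.trim().split(/\s+/);
  return { hash: hash.toLowerCase(), file };
}

// MD5 of a file's contents as lowercase hex. This reads the whole file
// into memory; a stream (fs.createReadStream) would suit very large files.
function md5OfFile(filePath) {
  return crypto.createHash('md5').update(fs.readFileSync(filePath)).digest('hex');
}

// True when the file named in `line` (looked up in `dir`) matches its checksum.
function verifyChecksum(line, dir = '.') {
  const { hash, file } = parseChecksumLine(line);
  return md5OfFile(path.join(dir, file)) === hash;
}
```

Run from the download directory, `verifyChecksum('5cc0914c5e8250c2491f4dbcf27d4301 koha-3.00.00-stableRC1.tar.gz')` should return true for an intact download. Keep in mind that an MD5 match only shows the file wasn't corrupted in transit. The GPG signature is what ties the tarball to the person who signed it.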

I’ve also tagged this in Git as “version 3.00.00 stableRC1”: v3.00.00-stableRC1

This is the third packaged release of Koha 3. Prior to the official stable release of Koha 3.0, translations will be updated, and additional issues and bugs may be addressed. These are documented on Koha’s Bugzilla:

http://bugs.koha.org

and organized on the 3.0 RM’s QA notes Wiki page:

http://wiki.koha.org/doku.php?id=en:development:qanotes3.0

The release notes for this RC1 version are pasted in below, and will also be on the koha.org website sometime soon.

Cheers,


Joshua Ferraro
Koha 3.0 Release Manager

And as many of you know (well at least those of you on Twitter & Facebook) I have been working on the documentation for this new release and my working draft can be viewed online (hopefully to be moved to a more collaborative medium soon) via LibLime’s Google Sites at http://sites.google.com/a/liblime.com/koha-manual/Home. Feel free to notify me of any changes, suggestions, etc.


Gitting Used to Git

June 6th, 2008 by Andrew Moore

I have been using git a lot more efficiently recently, and I want to share some of the more advanced things that may help you get used to using git, too.

First, it helps me a lot to have some things in color. I have found these four config changes to make it a lot easier to scan git output quickly. The “diff” one is especially handy.

  • git config --global color.branch auto
  • git config --global color.status auto
  • git config --global color.diff auto
  • git config --global color.interactive auto

Second, I have found “git add --interactive” to be pretty useful. If you have changed several files and only want to commit some of them, this will present a menu-driven interface to let you pick the files to add. Even better, if you have edited a file in two places and only want to include one “chunk” in your commit, this lets you specify that. It’s great if you have added some debug code at the top or bottom that you don’t want to commit.

Next, I’ve been using “git rebase --interactive” to reorder and combine my patches to make them more readable. If you have a long string of small commits that you want to organize better, you can run “git rebase --interactive HEAD~20”. This will open an editor listing the last 20 commits. You can reorder the lines to reorder the commits, and you can mark lines as “squash” to merge those commits into the one before them. This will help you make more readable sets of commits.

Finally, if you have a commit that you want to split up, use “git rebase --interactive” and mark it “edit”. When the rebase stops at that commit, run “git reset HEAD^” to put yourself “back in time” to just before it, with its changes still in your working tree. Then you can choose only a subset of the files or patches to commit, commit them, optionally commit the rest, and finish with “git rebase --continue”.

For more help on using git, I have really found the gitcasts to be a tremendous help.

Some of these features require a newish version of git, so if yours doesn’t seem to be working like this, I recommend an upgrade.

git ‘er done!

Koha 3.0 Haiku

June 5th, 2008 by Joshua Ferraro

To do my part for
LibLime's blogging policy
here are two haiku

Koha ILS,
we're nearing the 3rd release
watch koha dot org

Code4Lib 2008 Videos

June 4th, 2008 by Nicole C. Engard

If you weren’t able to make it to the conference, you can still see all the great talks!! Check out the videos from the conference at Archive.org.

Also all slides and videos are linked from the conference schedule.


Deciding on an API for Biblios

May 24th, 2008 by Chris Catalfo

As I continue to work on Biblios in anticipation of its release (soon, I hope!), it is about time to decide on an API.

I have already put into place a simple macro system for batch editing of bibliographic records. The language is JavaScript, and macros use a MarcRecord JavaScript object to manipulate MARCXML records.

Here is a simple example (record is a MarcRecord instance):


// Check to see if record has 856.  If so, add subfield $u with url.  If not, add a new 856 with url.
if( record.hasField('856') ) {
    record.field('856').subfield('u', 'http://www.google.com');
}
else {
    record.addField( new Field('856', '', '', [ new Subfield('u', 'http://www.google.com')]) );
}

I would like to provide access to Biblios’ main functions for use by plugins. Here are a few ideas for API functions:

  • Run a search
  • Run the current search but limited to something
  • Save all search results to a folder
  • Save record with id n to a particular folder
  • Edit record with id n
  • Run a macro on all records in a folder
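To make the wish list concrete, here is a hypothetical sketch of how those functions might look to a plugin author. The names (createBibliosApi, searchBackend, the folder methods) are invented for illustration and are not Biblios's API. Records are plain objects with an `id` rather than MarcRecord instances, and the search backend is a caller-supplied function.

```javascript
// Hypothetical Biblios plugin API sketch; none of these names come from Biblios.
function createBibliosApi(searchBackend) {
  const records = new Map(); // id -> record seen in any search
  const folders = new Map(); // folder name -> Set of record ids
  let currentResults = [];

  const folder = (name) => {
    if (!folders.has(name)) folders.set(name, new Set());
    return folders.get(name);
  };
  const lookup = (id) => {
    if (!records.has(id)) throw new Error(`unknown record ${id}`);
    return records.get(id);
  };

  return {
    // Run a search and remember its results as the "current" search.
    search(query) {
      currentResults = searchBackend(query);
      for (const r of currentResults) records.set(r.id, r);
      return currentResults;
    },
    // Narrow the current search, e.g. to one format or date range.
    limitCurrentSearch(predicate) {
      currentResults = currentResults.filter(predicate);
      return currentResults;
    },
    saveAllResultsToFolder(name) {
      for (const r of currentResults) folder(name).add(r.id);
    },
    saveRecordToFolder(id, name) {
      lookup(id);
      folder(name).add(id);
    },
    // Stand-in for opening the editor: apply an edit function to one record.
    editRecord(id, edit) {
      edit(lookup(id));
    },
    // Apply a macro (a function of one record) to every record in a folder.
    runMacroOnFolder(name, macro) {
      for (const id of folder(name)) macro(lookup(id));
    },
    folderContents(name) {
      return [...folder(name)].map(lookup);
    },
  };
}
```

One design choice to note: treating a macro as an ordinary function of one record lets the same macro be used on a single record, a folder, or a whole result set.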

I’d be interested to hear what others think: what they’re used to in other cataloging software and what commands/tools that software might be missing which could be ultimately included in Biblios.

Going Up?

May 22nd, 2008 by atz

Last week I had the chance to get a hardhat tour of the massive renovation project underway at my alma mater’s epicenter, Ohio State University’s (Main) Thompson Library. Lest you get the impression from my Google Books post that physical libraries are passé, the Buckeyes here provide a striking counter-example.

East Atrium Skylight

In classic OSU style, the scale of the project is huge, with a cost of over $108 million and 140 full-time construction staff.

This leads to many questions. What does a hundred million dollar book house look like?  Well, you can see for yourself on this University webcam, right now during daylight hours it looks like a Bob the Builder episode.  Click through the image above or here for the tour photos.

Also, what are the defining requirements of systems that are suitable for use in such a large environment?  How well do their current implementations fulfill them, and how well does Koha compare?

[update] Added links. [/update]

Frankenstein, or the modern FRBR

May 11th, 2008 by Galen Charlton

I’ve been reading a “Norton critical edition” of Mary Wollstonecraft Shelley’s Frankenstein. The book includes the 1818 edition of the tale (the more familiar third edition, significantly revised, was published in 1831), eighteen contemporaneous and modern reviews and critical essays, a bibliography referring to an additional forty books and journal articles, and a few miscellaneous letters and poems thrown in for context.

Besides pointing to some deficiencies of my education (who knew that Igor was entirely a creature of the movies? Why was a comparatively short novel, only 155 pages in the edition I’m reading, published in three volumes in 1818?), reading the real McCoy has inspired a couple of small musings.

Frankenstein starts with an epigraph from Milton’s Paradise Lost, but that was hardly the only literary influence on Shelley. Both of her parents were well-known authors, her husband was Percy Bysshe Shelley, and she self-consciously engaged in a program of reading to the point where her journal largely consists of a reading list. Among other things, Frankenstein is a response to Milton, various Gothic works by Hazlitt Ann Radcliffe, and various poems by Percy and Byron written around the time of the famous compact to write ghost stories that inspired Shelley to write.

From a purely mechanical point of view, the 336-page volume in hand, besides containing the text and a finite number of critical essays that could be catalogued and related to each other, must directly or indirectly refer to many dozens of books that would have been known to Shelley and hundreds of works of criticism that came later, to say nothing of the movies and plays that reinterpret the Frankenstein story and the thousands of works that simply evoke the image of Frankenstein’s monster or Shelley’s response to the Faust story. If you’ve gotten tired going through that last sentence, consider the plight of the poor cataloger who takes an expansive view of creating metadata describing the Frankenstein work and this particular version that bundles in a number of essays. Relative to the possibilities, the 504 in LC’s MARC record is wanting:

“Includes bibliographical references (p. 335-336).”

Nor can you get a list of the titles and authors of the critical essays from the bib record. I’m not criticizing LC or the cataloger, cataloging rules and economic realities being what they are, but there’s an opportunity that I hope the cataloging and metadata community can work towards — not just focusing on the item in hand, but placing each work in the rich web of relationships of reference, homage, response, parody, and criticism. Barring a trek to strange and weird places to assemble a generous library board that can subsidize a week of effort to catalogue each complicated work, some notions:

  • A metadata record can never be done — even a completely analyzed MARC bib record does not sufficiently relate a rich work to its influencers and influencees.
  • Of course, at any given point in time it does have to be good enough to satisfy the users and those paying the bills.
  • Since no one cataloger can even begin to note all connections of one rich work to another, metadata culture must promote the easy enhancement of bibliographic records (or RDF triple-clouds, or whatever) by anybody qualified (and probably, anybody half-way qualified).
  • Bibliographic metadata must be linkable to other sources of metadata.
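One way to picture “linkable” metadata is to store relationships between works as simple subject-predicate-object triples, in the spirit of RDF. Anybody can then add a link later without rewriting the bib record. Below is a hypothetical sketch. The class, the predicate names, and the example essay are invented for illustration; it is not an RDF library.

```javascript
// Hypothetical sketch: relationships between works as subject-predicate-object
// triples that can be added incrementally by many contributors.
class WorkGraph {
  constructor() {
    this.triples = [];
  }

  add(subject, predicate, object) {
    this.triples.push({ subject, predicate, object });
    return this; // allow chaining
  }

  // Works that `work` points to (sources, influences, things it responds to).
  outgoing(work, predicate) {
    return this.triples
      .filter((t) => t.subject === work && (!predicate || t.predicate === predicate))
      .map((t) => t.object);
  }

  // Works that point at `work` (criticism, adaptations, parodies, homages).
  incoming(work, predicate) {
    return this.triples
      .filter((t) => t.object === work && (!predicate || t.predicate === predicate))
      .map((t) => t.subject);
  }
}
```

For example, `new WorkGraph().add('Frankenstein', 'hasEpigraphFrom', 'Paradise Lost')` records the Milton epigraph. Then `incoming('Frankenstein')` would list every essay or film that someone has linked back to the novel.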

A final musing — the report of the LC Working Group on the Future of Bibliographic Control mentions the idea of speeding cataloging record production by getting basic metadata from the publishers. I have my reservations about whether publishers will be interested in fully cooperating with such a scheme, but suppose they do — would it be too much to ask to have them provide bibliographies in some kind of machine-readable format? I think this, all by itself, would be a big win for humanities researchers.

Closing in on Koha 3.0

May 4th, 2008 by Andrew Moore

Now that we’ve had a beta version of Koha 3.0 out for a little while, there is increased interest in putting together a final version of Koha 3.0 soon. Paul recently started a discussion on the koha developers list about what we need to do to get a release out the door. This includes deciding on the last-minute features we would like to include to make it a cohesive, useful product, what bugs absolutely need to be fixed, and the logistics involved in maintaining that version while we set our sights on the next version of Koha. I think that in the coming days and weeks we will see this discussion continue, along with a flurry of activity as we try to find the balance between completeness and timeliness.

How Good is Google Book Services? Ask your mother.

April 29th, 2008 by atz

Despite not being even remotely Irish, my mother likes to make a traditional corned-beef and cabbage dinner for our family on St. Patrick’s Day, and this year was no exception. (Sorry, no pix.) My mother is a five-foot-tall head reference librarian in a local public library system, and she commands the type of incredible memory that you would expect from her profession. She makes use of this in another tradition, the singing of a particular St. Patrick’s Day hymn taught to her by nuns in grade school. Suffice it to say, my mother has not been in grade school for a while, and for a song sung only once a year, it seems remarkable to me that anybody can remember all the words without any help. I mean, how far can you get into Auld Lang Syne or Good King Wenceslas? If you say “all the way, no problem”, please ask yourself whether you are currently, have ever been, or are about to become a reference librarian.

So this song is essentially about St. Patrick and the persevering quality of Ireland’s faith in him, but the pace is pretty quick and the lyrics are twisty and complicated. I’ve never heard this song anywhere else, and I didn’t know the title, but I wanted to find the lyrics. Since Google Book Services had just come out, I decided this would be my test. Searching on “St. Patrick” was never going to do it. The best I could do was remember the small, odd phrase “is bright with us yet.” In fact, it was that phrase that made me want to revisit the lyrics in the first place.

GBS’s first hit was The Hymn-book of the Modern Church: Studies of Hymns and Hymn-writers (1905) by Arthur Edwin Gregory, a compendium of various hymns including two of the “Romanist” verses I was looking for, and a discussion of the author and his relative merits. Interestingly, it does not offer the title of the song, but Google’s results took me directly to the correct page so that immediately I was looking at the relevant content. Very impressive.

The second hit was even better: The Parochial Hymn Book (1897). 3 verses and a complete four-part vocal arrangement! The title, as it turns out, is “St. Patrick’s Day”, not the most helpful string to search against.

The specificity of the texts themselves is most remarkable. When the first rounds of “electronic texts” were circulated, many people were unimpressed with the experience of reading screens of flat ASCII text, objecting to the sterile quality as not bookish enough. The difference between that time and today is stark. Google’s text is not an abstract vision of the PHB’s content; rather, it is photographic images from a particular book in a particular stack, with all the peculiarities of the physical original (save, perhaps, smell).

It has provenance: a hand-inked calligraphic block claims it for Andover-Harvard Theological Library of the Harvard Divinity School, with both a stamp and a bookplate noting that it comes from the estate of one Rev. Charles Hutchins on May 24th, 1939. You can even see Harvard’s call number penciled in on the verso page. The effect of these details is to greatly reinforce the validity of the text. Contrast this with a posting on any of a thousand interchangeable lyrics sites. Which would you regard as desirable or authoritative?

My experience was very much like those I suspect any fan of libraries has enjoyed, the feeling of discovering a tangible artifact of another time and place that was produced and preserved specifically so that you might encounter it, and have the information sought. Even 111 years later.

I chatted up my colleague Chris in New Zealand, one of the original Koha developers, about my GBS results. He was fairly impressed. For his test, he put in his father’s name, Ian Cormack, and was promptly returned as the first hit a link to his academic article “Creating an Effective Learning Environment for Maori Students” in Mai i Rangiatea: Maori Wellbeing and Development (1997). So there you have it: two texts separated by 100 years of time, and half the Earth in distance, accurately and immediately retrieved from one repository, for free.

Note: Google has built on to Book Services a bunch of other features, including ratings, reviews, tags and a My Library feature. In my My Library, you can see the three texts I mentioned. There is also a “Find this book in a library” link to OCLC’s WorldCat that tells me the closest (known) copy is 160 miles away.

For comparison, my search at Amazon came up empty. Which suggests another question…

Paranoia Alert!! Assume your library had a copy of the same song in one hymnbook or another. Without GBS, based on my limited query data, how long would it take me to retrieve it at your library? I should add that my search began at 10:30PM.

Perhaps most striking is that GBS would be preferable even if I was sitting in Harvard’s library where the original still resides!