Archive for the ‘Amazon’ Category

How Good is Google Book Services? Ask your mother.

Tuesday, April 29th, 2008 by atz

Despite not being even remotely Irish, my mother likes to make a traditional corned-beef and cabbage dinner for our family on St. Patrick’s Day, and this year was no exception. (Sorry, no pix.) My mother is a five-foot-tall head reference librarian in a local public library system and she commands the type of incredible memory that you would expect from her profession. She makes use of this in another tradition, the singing of a particular St. Patrick’s Day hymn taught to her by nuns in grade school. Suffice it to say, my mother has not been in grade school for a while, and for a song sung only once a year, it seems remarkable to me that anybody can remember all the words without any help. I mean, how far can you get into Auld Lang Syne or Good King Wenceslas? If you say “all the way, no problem”, please ask yourself whether you are currently, have ever been, or are about to become a reference librarian.

So this song is essentially about St. Patrick and the persevering quality of Ireland’s faith in him, but the pace is pretty quick and the lyrics are twisty and complicated. I’ve never heard this song anywhere else, and I didn’t know the title, but I wanted to find the lyrics. Since Google Book Services had just come out, I decided this would be my test. Searching on “St. Patrick” was never going to do it. The best I could do was remember the small, odd phrase “is bright with us yet.” In fact it was that phrase that made me want to revisit the lyrics in the first place.

GBS’s first hit was The Hymn-book of the Modern Church: Studies of Hymns and Hymn-writers (1905) by Arthur Edwin Gregory, a compendium of various hymns including two of the “Romanist” verses I was looking for, and a discussion of the author and his relative merits. Interestingly, it does not offer the title of the song, but Google’s results took me directly to the correct page so that immediately I was looking at the relevant content. Very impressive.

The second hit was even better: The Parochial Hymn Book (1897). 3 verses and a complete four-part vocal arrangement! The title, as it turns out, is “St. Patrick’s Day”, not the most helpful string to search against.

The specificity of the texts themselves is most remarkable. When the first rounds of “electronic texts” were circulated, many people were unimpressed with the experience of reading screens of flat ascii text, objecting to the sterile quality as not-bookish enough. The difference between that time and today is stark. Google’s text is not an abstract vision of the PHB’s content, rather it is photographic images from a particular book in a particular stack, with all the peculiarities of the physical original (save, perhaps, smell).

It has provenance: a hand-inked calligraphic block claims it for Andover-Harvard Theological Library of the Harvard Divinity School, with both a stamp and a bookplate noting that it comes from the estate of one Rev. Charles Hutchins on May 24th, 1939. You can even see Harvard’s call number penciled in on the verso page. The effect of these details is to greatly reinforce the validity of the text. Contrast this with a posting on any of a thousand interchangeable lyrics sites. Which would you regard as desirable or authoritative?

My experience was very much like those I suspect any fan of libraries has enjoyed, the feeling of discovering a tangible artifact of another time and place that was produced and preserved specifically so that you might encounter it, and have the information sought. Even 111 years later.

I chatted up my colleague Chris in New Zealand, one of the original Koha developers, about my GBS results. He was fairly impressed. For his test, he put in his father’s name, Ian Cormack, and the first hit promptly returned was a link to his academic article “Creating an Effective Learning Environment for Maori Students” in Mai i Rangiatea: Maori Wellbeing and Development (1997). So there you have it: two texts separated by 100 years of time, and half the Earth in distance, accurately and immediately retrieved from one repository, for free.

Note: Google has built on to Book Services a bunch of other features, including ratings, reviews, tags and a My Library feature. In my My Library, you can see the three texts I mentioned. There is also a “Find this book in a library” link to OCLC’s WorldCat that tells me the closest (known) copy is 160 miles away.

For comparison, my search at Amazon came up empty. Which suggests another question….

Paranoia Alert!! Assume your library had a copy of the same song in one hymnbook or another. Without GBS, based on my limited query data, how long would it take me to retrieve it at your library? I should add that my search began at 10:30PM.

Perhaps most striking is that GBS would be preferable even if I were sitting in Harvard’s library where the original still resides!

Koha, the bacon donut ILS.

Monday, March 31st, 2008 by atz

Last month in Portland, Oregon at the code4lib 2008 convention (a most impressive assemblage of library geekiness), a few of us broke out to the 24-hour bakery Voodoo Doughnut for this:

Bacon Donut by Voodoo

A donut, with bacon on it. A bacon donut.

I’d expected it to be strange, but the remarkable thing about the bacon donut is how unsurprising the taste is. The sweet maple and salty flavors are, as it turns out, very compatible. So it strikes me that the work I’ve been doing on Koha recently is a lot like the bacon donut: take two things people already like, do the voodoo, and make them work together in a new way.

For the OPAC, the place where this comes up most often is external content, like book cover images. Koha libraries have been using jacket images from Amazon for some time in production, internationally. It’s free and it’s broadly populated: a great feature, especially for small libraries who don’t have the advantage of a lot of subscription content services. Using their API, we can also pull and display content like user reviews, really fleshing out OPAC content.

I recently completed some commissioned Koha code for integrating Baker & Taylor images and content as an alternative to Amazon. Koha can now link to B&T ContentCafe excerpts, ratings, etc. and to their MyLibrary BookStore retail site. For design, my code followed the Amazon model, and certainly something similar could be crafted for other proprietary sources like Blackwell, Syndetics, etc. But upon reflection, I think that the entire model is already on its way out!

Enter Google Book Services. I’ll have more to say about GBS later, but suffice to say we now have a second, very widely available source of free book jacket images. (In fact, it may be enough to deflect calls some have been making for the Library of Congress to provide access to cover images like they do for other metadata.) The Google API is essentially javascript based and remarkably easy to integrate. How easy? Code4lib members were posting working example code back and forth within hours, and then within a day or two, other Koha users adapted their own servers to start using Google’s images. This is a great example of how OSS enables agility and adaptability.

So pretty soon we should expect that every current OPAC will have some images from somewhere, and that won’t be a distinguishing feature anymore. The next model to evolve will be to allow ajaxy failover from a ranked menu of many possible image sources (both free and subscription/keyed like B&T/Syndetics). In fact, several coders have reported implementing this for their favorite sources already! I’m looking forward to seeing this code synthesized, providing the broadest possible coverage for images. Then we can start to get some abstraction around the other data in common, like reviews, ratings, etc.
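
To make the ranked-failover idea concrete, here is a minimal sketch of just the ordering logic, written in Python rather than in the browser-side JavaScript an "ajaxy" version would use. Everything in it is an assumption for illustration: the `CoverSource` interface, the stand-in source functions, and the `covers.example` URLs are hypothetical and are not part of Koha or of any provider's API.

```python
from typing import Callable, Optional, Sequence

# Hypothetical interface: a source takes an ISBN and returns a cover image
# URL, or None when it has no cover for that ISBN.
CoverSource = Callable[[str], Optional[str]]


def first_cover(isbn: str, sources: Sequence[CoverSource]) -> Optional[str]:
    """Try each source in ranked order; return the first cover URL found."""
    for source in sources:
        try:
            url = source(isbn)
        except Exception:
            # One failing source should not break the page; try the next one.
            continue
        if url:
            return url
    return None


# Stand-in sources for illustration only. Real ones would query a provider.
def free_source(isbn: str) -> Optional[str]:
    known = {"9780000000002": "https://covers.example/free/9780000000002.jpg"}
    return known.get(isbn)


def keyed_source(isbn: str) -> Optional[str]:
    return f"https://covers.example/keyed/{isbn}.jpg"


def broken_source(isbn: str) -> Optional[str]:
    raise RuntimeError("source unavailable")


if __name__ == "__main__":
    ranked = [broken_source, free_source, keyed_source]
    print(first_cover("9780000000002", ranked))
```

The ranking is simply the order of the list, so a library could put its free sources first and its subscription sources last, or the reverse, without touching the lookup code.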

Some of my colleagues have already started on LibraryThing and xISBN. If you have other external data sources you would like to see integrated in Koha, feel free to mention them here!

Amazon.com Web Services and Library Catalogs

Monday, April 16th, 2007 by Joshua Ferraro

Over the past few years, since I wrote the original Amazon.com module for Koha, I’ve received literally hundreds of complaints, mostly from librarians, about the legality of Koha’s use of Amazon.com’s Web Services. In fact, it’s fair to say I’ve spent considerably more time responding to these questions than I did writing the original module.

So … first of all, shocking as it may seem, Koha has the capability to use Amazon.com content in the OPAC search results and detail pages. To see this in action, feel free to visit the Athens Public Library’s OPAC:

http://search.athenscounty.lib.oh.us

It’s perfectly legal to aggregate the content in web applications such as Koha. In fact, Amazon.com expressly created the web services program so that people would write applications around it. Their business angle is no different than any other content provider — they expect to make money. The difference is that they don’t want to make the money from the people aggregating the content. Instead, they are hoping that the content will drive users to the Amazon.com website and that those users will purchase something.

If you have hesitations about this business model and don’t think your library should be involved in it, no problem, you can simply turn it off in your Koha installation and purchase similar services from other content providers with more traditional compensation methods. The Koha community is not trying to force you to use Amazon.com.

However, if, like many of the libraries that LibLime supports, you are on a tight budget, yet want to provide your patrons with this content, Amazon.com’s alternative service model gives you that ability. Here’s how it works and why it’s legal.

Let me preface this by adding that I’ve had extensive conversations with Amazon.com’s US legal department about Koha’s use of Web Services, and they have confirmed that Koha does not violate the terms of their agreement. This point is worth making: they want your library to use their content :-).

First off, a bit of background on Amazon.com’s Web Services Program. The basic idea is that Amazon provides machine-readable access to content they have for sale. That content is indexed by ISBN number, which makes it trivial to identify a relationship between an item in a library catalog and an item on Amazon.com. Web Services data includes:

  • Item Jacket Cover Images;
  • Item reviews by Amazon.com patrons;
  • Item ratings by Amazon.com patrons;
  • Professionally written item descriptions and reviews.
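
Matching on ISBN is only trivial if both sides write the ISBN the same way, and catalog records often carry hyphens or the older 10-digit form. The sketch below shows one way to normalize to a 13-digit ISBN before matching, using the standard ISBN-13 check digit rule. The function name is hypothetical, and nothing here says which form Amazon's service actually expects.

```python
def normalize_isbn(raw: str) -> str:
    """Return a bare 13-digit ISBN for a hyphenated or 10-digit input."""
    digits = raw.replace("-", "").replace(" ", "").upper()

    # Already a 13-digit ISBN: return it unchanged (check digit not verified).
    if len(digits) == 13 and digits.isdigit():
        return digits

    # 10-digit ISBN: drop its check character (which may be "X"), add the
    # 978 prefix, and compute the new check digit.
    if len(digits) == 10 and digits[:9].isdigit():
        core = "978" + digits[:9]
        # ISBN-13 weights alternate 1, 3, 1, 3, ... across the first 12 digits.
        total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(core))
        check = (10 - total % 10) % 10
        return core + str(check)

    raise ValueError(f"not a recognizable ISBN: {raw!r}")


if __name__ == "__main__":
    print(normalize_isbn("0-306-40615-2"))
```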

Koha’s Amazon module can interact with Amazon.com’s web services program in several possible ways, in accordance with the license agreement that every Web Services user must abide by:

  • Koha can be configured to periodically download content en masse, cache it locally on one of your library’s servers, and serve it to your users via the OPAC;
  • Koha can download the content in real-time as a search result set or detail page is loaded.

The Web Services agreement has very specific requirements about usage and discusses both of these methods in great detail. The most relevant points to this discussion are:

  • if content is cached locally, it must be updated every 24 hours;
  • if you download in real-time, you can only download up to 1000 items per IP address per day;
  • if you download in real-time, you cannot download more than one item per second per IP address;
  • if you use their content, you must provide a link back to any Amazon.com page.
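
The 24-hour rule for cached content can be sketched as a simple freshness check. This is a minimal illustration, not Koha's implementation: the `ContentCache` class and the `fetch` callable are hypothetical, and it refreshes each item lazily when it is requested rather than downloading en masse on a schedule.

```python
import time
from typing import Callable, Dict, Tuple

# The refresh interval described in the list above.
MAX_AGE_SECONDS = 24 * 60 * 60


class ContentCache:
    """Keep fetched content per ISBN and refetch once it is 24 hours old."""

    def __init__(self, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch    # hypothetical function that contacts the service
        self._clock = clock    # injectable so the behavior can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, isbn: str) -> str:
        now = self._clock()
        entry = self._entries.get(isbn)
        if entry is not None and now - entry[0] < MAX_AGE_SECONDS:
            return entry[1]    # still fresh: serve the local copy
        content = self._fetch(isbn)    # missing or stale: download again
        self._entries[isbn] = (now, content)
        return content
```

A scheduled bulk refresh would use the same age test, just run over every cached ISBN instead of waiting for a page view.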

Since Koha’s system supports both caching and real-time downloads of the content, each library would need to determine, based on its usage patterns, which method or combination of methods works best for its situation. Keep in mind that images are downloaded from the user’s browser, not from the Koha application, so the 1000 queries per day per IP address and 1 download per second rules don’t apply to the Koha server(s).

If a library didn’t want to cache data locally, yet had more than 1000 views of their detail pages, it would be trivial to track the number of times that Amazon.com content was syndicated, and turn it off after the day’s cap. It would be similarly trivial to keep track of the number of queries to detail pages per second and only permit one per second; or to use JavaScript to download the content from the browser rather than the server.
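
As a rough illustration of how little code that tracking needs, here is a sketch of a counter that enforces both the daily cap and the one-per-second rule for a single server. The `RequestBudget` class is hypothetical, and it assumes a "day" means a UTC calendar day, which the agreement may define differently.

```python
import time
from typing import Callable, Optional


class RequestBudget:
    """Allow at most daily_cap requests per day, spaced min_interval apart."""

    def __init__(self, daily_cap: int = 1000, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._daily_cap = daily_cap
        self._min_interval = min_interval
        self._clock = clock
        self._day: Optional[int] = None      # UTC day number of current count
        self._count = 0                      # requests allowed so far today
        self._last: Optional[float] = None   # time of the last allowed request

    def allow(self) -> bool:
        """Return True and record the request if it fits within both limits."""
        now = self._clock()
        day = int(now // 86400)              # assumption: days roll over at UTC midnight
        if day != self._day:
            self._day = day
            self._count = 0
        if self._count >= self._daily_cap:
            return False                     # day's cap reached: skip the content
        if self._last is not None and now - self._last < self._min_interval:
            return False                     # too soon after the previous request
        self._count += 1
        self._last = now
        return True
```

A detail page would call `allow()` before requesting content and simply render without the Amazon.com section whenever it returns `False`.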

So the bottom line is that it’s not at all difficult to use Amazon’s program without abusing it. It’s up to each library to make an informed decision about whether and how to use it.