What’s happening with OpenLibrary and OCA

November 9, 2008, [MD]


I have previously written about the OpenLibrary, both about how much I liked its interface and about how frustrated I was that it didn’t communicate its progress better to the public. Recently, a workshop was held in San Francisco “to share progress and plans for continued growth of open web access to digital books”. Two of the reported outcomes were an initiative to provide print-on-demand for some of the public domain books (through Hewlett Packard) and one to provide scan-on-demand for books at the Boston Public Library (with perhaps more libraries to come).

Very little was reported on how the collection as a whole was growing, but when I visited the website again, I was surprised to see that the number of full-text books had increased from what I remember as around 400,000 to 1,064,822! That is a huge increase. Since there is no information about it anywhere, it is unclear whether they suddenly imported a large batch of books that had been waiting for processing, or whether they have been adding books continually without updating this number. I guess time will tell whether the number starts changing on a daily basis.

While this is great, there seem to be some quality issues. First of all, many of the books I searched for would not display the full text: when I clicked on the link, nothing came up. Worse, some scans seem to be very bad, to the point of being useless. An egregious example that shows several weird artefacts is this one: Dansk biografisk lexikon. This book (the second or third link I clicked on; I did not go hunting for this example) is really strange. If you begin reading through it, the first thing you see is a copyright page for Google Books! Odd, since I did not know that OCA included material from Google Books. The next few pages show pictures of a gloved hand, so this book scanner has been immortalized. And finally, once you start reading, there seems to be a curious mix of two different scans: one looks like a Google Books scan, with a bright white background, and the other like an OCA scan, with a yellowish background.

Interestingly, when you go to the Internet Archive detail page (which is of course separate from the OpenLibrary archive page), you find this information: Book digitized by Google and uploaded to the Internet Archive by user tpb.

Color me confused. I think this effort to make books available online is wonderful, but I fail to understand why they are not more participatory about it. On the OpenLibrary website, I cannot even flag a scan as faulty. And it is much harder for me to promote OpenLibrary when I speak about open access and open education, because I know so little about it. This seems to me to be shooting oneself in the foot.

(And we still really need to be able to zoom in on pages in the flipbook view. The interface is great, but it hasn’t been updated for a long time.)


Stian Håklev November 9, 2008 Toronto, Canada