March 20, 2009
I've long been fascinated by book scanning projects, and have written before about OpenLibrary and Universal Library, as well as Google Books. However, as neat as these projects are, we shouldn't forget sites like Project Gutenberg, which have been around for much longer. Project Gutenberg relies on volunteers to scan, OCR and proof-read texts that have fallen into the public domain, with the goal of creating high-quality ASCII text versions of the books. This is different from the large-scale book-scanning projects mentioned earlier, which rely on institutional investment, and mainly focus on making the scanned images available. Sometimes you can also get the OCRed text, but it has not been proof-read or quality controlled.
While it's very neat to see pictures of old book pages, especially ones with elaborate illustrations, ASCII (or increasingly Unicode) text has many advantages. It can be searched and text-mined, copied and pasted into any format, and displayed on any kind of device - whether it's a text-to-speech device, an ebook reader, a cell phone, or what have you. There are also several websites (like manybooks.net) which repurpose these text files, for example by formatting them nicely or converting them to different ebook-reader formats.
I hadn't visited Project Gutenberg in years, but a friend of mine showed me that they have kept adding books, and in fact radically expanded their international offering. All of these languages have more than 50 titles available: Chinese, Dutch, English, Esperanto, Finnish, French, German, Italian, Latin, Portuguese, Spanish, Swedish and Tagalog, and there are many others that are represented with a few books.
In addition, my friend showed me a few other sites in the same vein. lib.ru has a huge collection of literature in Russian, in HTML format. You'll find your Dostoyevsky and Gogol, but also Russian translations of Chinese classics, as well as Soviet-era science fiction! In Japanese, Aozora has a huge collection of public domain works. And China is amazing at making material available, not caring too much about copyright - there is a plethora of sites where you can easily view the full text of any modern novel in HTML - books.sina.com is an example.
These resources are extremely useful, not least for me as a language learner (and I can't even speak Japanese, but perhaps one day). It means that I can download a Chinese book and put it in Wenlin, or use an electronic mouse-over dictionary for the Russian texts. I'd love to hear about interesting book collections in other languages.
And then you have LibriVox - a collective of people who record public domain books, and release the audio books into the public domain. I've long been impressed by their work, and have listened to a number of their books, and my friend pointed out that they've now got more foreign content than ever. Their multilingual poetry collections are especially interesting.

Stian Håklev
March 20, 2009
Toronto, Canada