Tweets from OAI6 on Open Access in Geneva

June 19th, 2009

So my European travels are drawing towards an end. Lots of new impressions to digest, both from the travel, and from ElPub 2009 and OAI6 which I attended. Tomorrow morning I will try to catch a few hours of HackMeeting in Milan, before I fly back to Beijing, spending a few days in Dubai on the way.

I will try to post more about my experiences, but in the meantime I am posting my tweets from OAI6 as an experiment. Not sure if seeing them like this gives any value, but I thought I’d give it a try. (I simply searched Twitter for “#oai6 from:houshuang”, copied the results to TextMate, and used a few regexps to clean them up. I then put them in a spreadsheet, to reverse the order so the oldest tweets come first.)

  • In Zurich with friend. Geneva and #oai6 on Wednesday. Then Milan and a day of #hackmeeting, #dubai, and back to #beijing
  • In Geneva, great time with awesome #couchsurfing host. #oai6 tomorrow.
  • #oai6 Discussion about whether/why institutional repositories should be aiming for 100% coverage of publications in tut 2.
  • #oai6 CERN harvests publications by own researchers from 90 repositories, makes up 50% of repository. Total coverage rate 80%. (tut2)
  • Thinking abt workshop, interactivity… “open exercises” where facilitator knows exactly what she is looking for. Opp. to open-ended. #oai6
  • @icutler Another Norwegian at #oai6? Greetings. Hope to run into you non-virtually as well.
  • Herbert Van de Sompel gave amazingly inspiring intro talk - great opening to the conference that has me fired up. Lot’s to think abt. #oai6
  • How long until universities go from specifying font sizes and margins of PhD theses, to specifying which dtd to use? #oai6
  • With all this new XML, semantic data, etc - where do IRs come in? Which either take pre-prints (.doc/.pdf) or publishers pdf? #oai6
  • Any IRs that are requesting publishers XML files, as opposed to PDFs? #oai6
  • #oai6 Interesting talk, but I just listened to it verbatim fiv (#elpub09). (Houghton)
  • @iamtimmo It was an amazing morning session that really has my inspired and my neurons buzzing. Congrats! #oai6
  • Back from a great alcohol-sharing meet hosted at CERN #oai6. Location didn’t add 2 much, but met some great peeps. Look frwrd 2 reconn 2mrw
  • Nature Publishing Group to begin allowing academic text mining of their articles. Interesting to see what ppl will come up with. #oai6
  • NPG launching Nature Communications, publishing from April next year - “new paradigm”, authors choose subscription or OA, deriv or no. #oai6
  • I still need to think abt this more deeply… “when will repositories ask for publishers’ XMLs, rather than PDFs”? Inspired by portico #oai6
  • #oai6 Breakout group on future of scientific communication: one of the most intelligent, polite and pleasant exchanges I’ve participated in.
  • #oai6 All conferenced out, went for some pad thai and a Swedish movie with German/French subtitles. Nice with a complete change ;)
  • #oai6 Thomson Reuters presentation on ResearcherID. This is sorely needed - let’s see if their solution is the best.
  • #oai6 ResearcherID… this is really needed, not convinced Thomson are the right people. It’ll be messy until the dust settles.
  • RT @epoz: MESUR Project. In a league of its own. Have seen pres. on it before, but I am still blown away. #oai6 (I agree!)
  • RT @azaroth42: http://www.mesur.org/services/ MESUR services site #oai6
  • Atmospheric Chemistry and Physics: open peer review can still be anonymous. Good point. I think OA is more important than ID. #oai6
  • #oai6 Poster prize to be announced. Tension rises
  • #oai6 E-LIS won best poster… Very unclear about how one evaluates posters, but E-LIS is a great project so kudos to them!
  • #oai6 That’s that for this year. Tons of new ideas and thoughts. See you all in two years (or sooner)! Arrivederci.
  • RT @frumiousMimsy: overheard coming from some U of Geneva students walking by the posters at #oai6 : “this is for really intelligent people”
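The clean-up mentioned in the introduction (a few regexps, then reversing the order) could be sketched roughly as follows. This is only an illustration in Python, not the TextMate and spreadsheet workflow that was actually used; the raw line layout ("user: text ... N hours ago from web") and the names `RAW`, `PREFIX`, `SUFFIX` and `clean_tweets` are assumptions made for the example.

```python
import re

# Hypothetical raw lines, newest first, roughly as they might look when
# copied from a search results page. The layout is an assumption.
RAW = """\
houshuang: #oai6 Poster prize to be announced. Tension rises about 2 hours ago from web
houshuang: In Geneva, great time with awesome #couchsurfing host. #oai6 tomorrow. 2 days ago from web
"""

# Leading "username: " and trailing "N hours ago from ..." residue.
PREFIX = re.compile(r"^\w+:\s*")
SUFFIX = re.compile(r"\s*(?:about\s+)?\d+\s+(?:minutes?|hours?|days?)\s+ago\b.*$")


def clean_tweets(raw):
    """Strip the assumed prefix/suffix residue and return oldest tweets first."""
    tweets = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        line = PREFIX.sub("", line)
        line = SUFFIX.sub("", line)
        tweets.append(line)
    tweets.reverse()  # search results are newest first; flip to oldest first
    return tweets


for tweet in clean_tweets(RAW):
    print("•", tweet)
```

The reversal is the part the spreadsheet was used for; in a script it is a single call.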

Stian

On critically engaging with other people’s writing

June 9th, 2009

I just finished a quite critical blog post about Sean Duncan’s PhD dissertation on the reuse of learning objects. And I asked myself for a second whether I should publish it or not. Would it make anyone upset (him, his supervisors)? Was it aggressive or unnecessary? I don’t think so. I think part of the reason I felt uneasy about publishing it is that I am so used to “haters” and “trolls” on the internet, who will obsessively debate and criticize everything. In addition, while the foundation of academia is critical engagement and debate, there are certainly enough unproductive “attacks”, personal disputes and other issues that don’t serve to promote scholarship.

However, just as Cory Doctorow is more afraid of obscurity than ebook piracy, I think that obscurity or indifference is far worse than criticism in academia. If you have worked for years on a product, and it just ends up on a shelf in some library, never to be seen again… And I find this kind of thing happens frequently, at all levels. For example, I attended the Comparative and International Education Society annual meeting last year in Charleston, SC. This is a huge conference, with over 900 presentations given on all topics related to comparative education. Unfortunately, many of the presentations were extremely poor, and I felt that there was very little real engagement with the content by the audience. Partly, this was probably due to the lack of time (often four 15-minute presentations, followed by just 20 minutes of discussion).

The best session I attended was the presentation of a large project by several famous scholars, who had invited a who’s who of famous education scholars to read drafts of their presentations beforehand, and give brief comments. One after another, they went up to the podium, and gave extremely incisive and critical comments, with lots of substance and deep insight. On the one hand, the presenters were being ripped to pieces in all kinds of ways; on the other, they got the incredible privilege of having some of the best minds in the world engage deeply with their content. And I also wonder whether the fact that the presenters were so established, and that everyone knew each other, was one reason why the commentators could be so outspoken… they probably would not have done that to graduate students.

Thinking about myself, I finished my undergraduate honors thesis last year on community libraries in Indonesia. It was quite a novel topic, and I was very excited to put it together. I even had it translated into Indonesian, because I believed it was very important that those who were talked about could also read (and criticize!) the paper. However, although I know that a number of people downloaded the PDF (and mysteriously, it even appeared on the shelves of the Australian National Library), I have yet to receive any kind of substantial comment or criticism about the paper (which is very far from perfect!).

Even blog posts are similar. I always wondered how some bloggers get so many comments on their posts. I know through statistics that there is a nice number of people visiting this blog, many finding it through Google queries, others reading it through feed readers, etc. Yet, I get very few comments, and even when my blog posts are republished or linked to otherwise, it’s usually just in a “this is neat” way. (There are exceptions: for example, Downes did give me some nice resistance about the role of theories in OER research.)

An added thing that sometimes strikes me is the whole hierarchy. What business do I have, as a first year MA student, to criticize the field of OER research, or a PhD thesis? However, as long as you do it in a respectful and sincere manner, I think it is an important part of learning. I never pretend that what I say is the final truth; it is what I think. If it is completely wrong, then better that I say it out loud, and be corrected. For example, I mentioned that I thought the 45 page PhD thesis was very short. Perhaps this is quite normal, and I will be corrected. Great. Then I will have learnt something.

I gave a talk two weeks ago for a Chinese community of people interested in open education and elearning, and one of the things I talked about was the English-language open ed blogosphere. I might have romanticized it a little bit, but I still believe it’s one of the most helpful and constructive “communities” or networks that I’ve ever engaged with. And one of the elements is that what you say really is much more important than who you are. Even if I am a first year MA student, if I have something interesting to say, people will read it. And although our education system is far from perfect, I treasure enormously the self-confidence of my teachers, almost consistently from primary school until my current MA, who welcome criticism and questioning, and encourage you to not accept things at face value.

So I will continue poking my head out, but it works both ways - my stuff is out there, and I would love for people to tear it apart. Then I’ll know, at least, that somebody’s read it.

Stian

PhD thesis on learning object reuse, and some ponderings

June 9th, 2009

I’m stuck at Dubai airport for 12 hours, coming from Beijing, and transiting to Milan, where I will be attending ElPub 2009 (and later OAI6 in Geneva). I could go downtown, but I’ll be spending four days in Dubai on the way back, so it didn’t seem worth it. The airport isn’t that exciting, but it does have free wifi and occasional power-outlets, which is a huge plus.

Via David Wiley’s blog, I came across Sean Duncan’s just completed dissertation on the (lack of) reuse of learning objects, where he did a study of Connexions. I grabbed the .doc from Archive.org, and since I didn’t have any good movies left on my laptop, I settled down to read it in the airport.

Digression about document formats
A little digression here about document formats. For things that I am just going to be reading on screen, I find it annoying to download large .doc or .odt files. I seldom have Word or OO.org open, because I rarely use them (preferring tools like Scrivener for authoring, and OO.org for the final polish). Just starting Word slows my computer down, let alone opening a several-hundred-page Word document. In these cases, when I am not expected to be editing the document, I much prefer PDFs. In fact, the first thing I did was to open the document in Word, convert it to PDF, open it in my PDF reader, and close Word. My PDF reader opens PDF documents of many hundred pages without even hesitating.

On the other hand, PDFs are only good for one thing: seeing the document on the screen (or printing it), exactly as it was laid out. That excludes two important things: one is reformatting the document for nicer viewing (or use on other devices), and the other is reuse. For example, I hate reading double-spaced documents (1.5 line spacing is tolerable), but in a PDF, there is no simple way to change this (in Word, it’s extremely easy). I also remember borrowing my friend’s CyBook for a week, and getting very frustrated when trying to read the book Opening Up Education on it, which was released as a Creative Commons book, but only in PDF. Displaying PDFs directly on the ebook reader is possible, but extremely awkward. Trying to extract the text gives poor results, with mixing of columns etc.

And of course, for reuse, the situation is even worse… What’s the point of applying a Creative Commons license that allows for reuse, if the text is “trapped” in a PDF, especially with a multi-column layout? And I realize that I do exactly this myself. I try to put many of my papers, reports, etc. online, with an open license. However, I always upload a PDF… So what’s the alternative? .doc and .odt? Both? I thought the idea of embedding an ODT into a “hybrid PDF” sounded quite promising. Could we do the same, but embed an XML? DocBook? Markdown?

Back to the dissertation
So back to the actual contents. I have been very interested in the question of reuse (or even use) for a long time, since it often seemed like the open ed community was more focused on production than anything else, and especially reuse seemed to be something that most people wanted, but that was not very successful. Thus, Sean’s topic is extremely timely.

He starts out with a literature review, done in an interesting fashion. He uses a quite explicit methodology for gathering data: “In the literature search for this study, the original search string [(SU "learning objects" or KW "learning objects") and (TX reuse OR reusable OR reusing OR "re using")], in the Academic Search Premier, Psychology and Behavioral Sciences Collection, and PsychINFO databases, resulted in 43 records.”

He then drills further down, discarding non-peer-reviewed papers and other non-relevant papers, and ends up reviewing 25 articles in depth. I am curious about why he chose only these databases, and did not expand his search wider. For example, I ran the search “learning objects” AND (“reuse” OR “reusable” OR “reusing”) in Google Scholar, and it returned 9,280 hits. Granted, most of these will likely not be usable, but it seems probable that it would turn up more than 25 that were relevant to the topic in question.

Where I really get confused however, is in the actual methodology and data collection part. He chooses to limit his study to Connexions, which has the advantage that all its content uses a free license, the system has built-in support for translation, modification and collaboration on modules, and the usage data is openly available. Modules in Connexions are basically articles, single text pieces, which can incorporate graphics etc. Collections can contain many modules.

Sean defines the inclusion in a collection as use, in several collections as reuse, and also talks about translation and modification. He doesn’t make a very strong case for why this would be valid, however. The first question is how Connexions is actually used - is it used mainly by self-learners, who wish to find useful material for their own studying? Or who want to study an entire “collection”? Or is it used more by educators, who “pre-package” content for their students, into collections? Or are the collections made by some self-learners, who package stuff they find neat for other possible future users?

I don’t have the answer, but it seems like we would have to know a lot more about how Connexions is being used, to see if this mapping of statistical indicators to conceptual ideas “works”… Sean has not referred to any of the literature on Connexions, although there are several articles out there. One weakness, for example, is that he does not take into consideration reuse of specific modules from outside of Connexions. A simple example is this, where a university curriculum explicitly refers to a specific Connexions module. The way I found this was through the Google search link:cnx.org/content -site:cnx.org.

How long should a PhD dissertation be?
One of the things that struck me when opening the file was how short the dissertation was. The PDF is 74 pages, and at first I thought that maybe it was single-spaced. But no, it is double-spaced, and in fact there are a lot of appendices, so if we only count the text before the reference list, there are only 45 double-spaced pages in this PhD dissertation. Immediately that seemed very little to me. It might sound very superficial to focus on such an indicator, but it is striking, because it is so different… I recently looked over some literature from the Australian National University, because my wife was thinking about attending, and in their guidelines they specify that a PhD dissertation should be between 80,000 and 100,000 words. According to Wolfram Alpha, that gives about 400 double-spaced pages, i.e. roughly ten times as much.
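The page estimate can be checked with a back-of-the-envelope calculation. The sketch below assumes about 250 words per double-spaced page, which is a common rule of thumb and not a figure from the ANU guidelines or from Wolfram Alpha; the function name `estimated_pages` is made up for the example.

```python
def estimated_pages(words, words_per_page=250):
    """Rough page count. 250 words per double-spaced page is an assumed rule of thumb."""
    return words / words_per_page


low = estimated_pages(80_000)    # lower end of the guideline
high = estimated_pages(100_000)  # upper end of the guideline
ratio = high / 45                # compared with the 45 pages counted above

print(f"{low:.0f} to {high:.0f} pages, up to {ratio:.1f} times the 45 pages")
```

Under that assumption the guideline works out to 320 to 400 pages, so the upper end is close to nine times the 45 pages of main text.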

Part of the reason that I am pointing this out is that I think this dissertation would have been much stronger if it had had a wider literature review, and perhaps even a design that combined qualitative and quantitative methods. It could first use exploratory qualitative methods to understand what constitutes reuse in the context of Connexions, and then use statistics to gauge the extent to which that is actually happening. It would also be great to look at, or even test, some different theories about why this might be the case or not. As it stands, I feel like the conclusion is not strong enough: it’s telling us that there isn’t that much reuse of learning objects within Connexions, but we’re not sure why.

All in all, it’s an important topic, and it’s great to see people picking up the gauntlet. I also know myself how hard it can be to venture into a research area that hasn’t been much explored. I congratulate Sean on his achievements and hope he takes my thoughts in good spirit.

Stian

Talk in Chinese at SocialLearnLab: Social Learning

June 4th, 2009

SocialLearnLab (or 教育大发现, which is their Chinese name) is a unique online community of students, professors and teachers interested in online education, Web 2.0 and open education. Initiated by Beijing Normal University professor Zhuang Xiuli (庄秀丽), they run several very active mailing lists and wikis, and use a number of Chinese and international social networking apps. I am hoping to do an interview with Professor Zhuang later, to talk more about the background for this organization. Anyway, while in Beijing I hoped to be able to meet up with people interested in these topics, and we managed to organize a “salon”, with participants from several Beijing universities, as well as many students from Beijing Normal. I began by giving a “short” talk (I planned it for half an hour, but it became an hour), and then we had time after that to discuss and share.

I posted the talk on Slideshare, with audio synched to the slides. This is the third talk I have given in China, and they are all different (although some issues recur). In this case, I assumed that many of the people would be quite familiar with the basic issues, and I also knew from the mailing list that several had watched my earlier presentation at CMU, so I decided to discuss some more detailed issues. I spent some time talking about the progress in setting up Peer2Peer University, and also discussed the open education blogosphere as an epistemic community, and compared my view of the Chinese and English-language “spheres”. It turned out, however, that many of the participants were not that familiar with many of the concepts, and we spent some time after my talk discussing things like open access and open licenses in more detail.

Stian

New talk in Chinese: Understanding the meaning of open education, expanding the definition of OER

May 25th, 2009

I was kindly invited by Professor Chang to give a presentation to the department of education at Minzu University of China (formerly called Central University of Nationalities). I never like to give the same talk twice, so although I reused some of the materials from my talk at South China Normal, I redesigned the talk quite a bit. I decided to focus more on Open Access, since I think it’s so important in China, and I also introduced Ivan Illich and Deschooling Society. In a country that is obsessed with formal education, and is currently undergoing probably the most rapid expansion of higher education that the world has ever seen, this might be seen as quite subversive.

However, I was not on my pulpit preaching; rather, I think his texts give us a lot of food for thought. And the fact that they were written in the 1970s reinforces my belief that the really difficult part of online learning, especially collaborative learning, is the “software”, i.e. the social practices and structures, rather than the “hardware”, i.e. the technology.

I got a fair number of good questions from the students after the talk. The talk went on for a bit over an hour, and we spent maybe an hour more discussing, after a small break. It’s great to have that much time available to really probe something in-depth.

In addition to Illich (whose books are available in full text, though sadly not in Chinese translation), I’ve been reading up on different sociological theories on the value of schooling lately for a paper I was co-authoring, and found much that is extremely relevant to open education (especially in its more radical intonations). I need to spend more time thinking through this later, and maybe it could become a paper in itself.

The entire talk (but not the subsequent discussion) is available below in audio, synced with slides. You can also directly download the audio (mp3, ogg).

Stian

Cantonese the Movie: Killer Tattoo Death Dragon Master Black Hand Massacre Misunderstanding

April 27th, 2009

Yes, that is the unlikely title of a short film that I collaborated in recording in Hong Kong two weeks ago. To find out why, we have to go way back in time. One day in Toronto, my friend told me about this crazy Norwegian lady who was teaching Cantonese in Hong Kong. We went on YouTube and found several interviews with Cecilie, even a documentary, and saw that she was indeed really fluent in Cantonese, and also quite feisty.

So when I was in HK for a few days this summer, on a whim I sent her an email and said, fellow Norwegian in town, would you care to meet up? And she got back to me saying, we’re shooting a movie, care to dress up like a gangster and act ridiculous? Of course, that’s one of my favorite things to do, so I was instantly in. The next two days, a bunch of gwailo spent hours practicing their Cantonese lines (I speak Mandarin, but not Cantonese; the others were better, but none as fluent as Cecilie), and running around in back alleys with big wigs and ketchup wounds, to the joy and befuddlement of the Hong Kong seniors sitting on benches and watching. The result is now available for your enjoyment:


Cantonese - The Movie, Episode 19: Killer Tattoo Death Dragon Master Black Hand Massacre Misunderstanding from Cecilie Gamst Berg on Vimeo.

(Can also be found on YouTube together with her previous movies.)

(Might contain coarse language, gratuitous violence, and very poor disguises; viewer discretion is advised.)

Stian

Open Education - lecture in Chinese at South China Normal University

April 27th, 2009

This summer, I am spending four months in China, and much of the time will be taken up with my research project (for my MA) on Chinese “OpenCourseWare” (I am gradually realizing that this is not an apt English translation of their program, but I need time to come up with a better one). In addition to my specific research plan, which includes interviewing people at three different universities, and someone at the Ministry of Education, I am very interested in meeting up with people knowledgeable about this area, and doing research on it.

Coming up from Hong Kong to Beijing, I spent a few days at South China Normal University, where my advisor in Toronto had introduced me to some of his colleagues. They have a very strong group there in the department for educational technology, and a centre whose name I really love: Future Education Research Centre (in Chinese). Not only were they some of the first to develop “OpenCourseWare” in China, but they also hold several research grants to research Chinese OpenCourseWare, one that compares it to OER in other countries, and one that looks at how to promote sustainability and reuse of OER.

In addition, the centre meets every Thursday night for a seminar, and that week, I was invited to give a presentation about my own research, and some of my initial thoughts about the Chinese OCW situation. This was my first time giving a formal presentation in Chinese, which was both exciting and daunting. I was very happy about the opportunity, because it is a skill I wish to develop: one of my dreams for the future is to be able to teach in China, in Chinese. I know I’m not there yet, but practice makes perfect.

I used a lot of the material from the presentation I gave at OISE with Jim Slotta, so the links page for that talk would be useful for this talk as well.

The slides for the talk are on Slideshare. You might want to watch them while listening, since it’s a very visual talk; sorry they are not synchronized. Also, the first few minutes of the video are partly missing, and the image has a lot of artifacts at first, but that clears up. Thank you very much to Jia Yimin, Zhao Jianhua and Jiao Jianli for inviting me, and to all the other professors and students for engaging with me in such a great way. And thank you for capturing the talk on video, and sharing the file with me!

In case the embeds don’t work, here are direct links to part 1 and part 2.

Part I:

Part II:

One of the things I say towards the end is that I think there is an unprecedented opportunity for the educational research community in China to make a contribution, both nationally and internationally, when it comes to open education research. I have in the past called for the use of more theory when researching OER, and part of the reason why that doesn’t happen is that many who attend open education conferences in North America are not from schools of education, and do not have backgrounds in relevant fields. They are doing incredibly important work, and creating great innovations, but I keep wishing that more people within the field of education would begin studying this area (which is also why I presented on OER at CIES, and at the OISE Dean’s Conference, this year).

However, in China there is a huge amount of research on this happening at schools of education around the country. There are entire research centers focusing on Chinese OpenCourseWare, and there’s a journal called Open Education (in Chinese). In addition, China has a very strong background in distance education, with its massive TV and Radio University system. That is not to say that all the research that is currently published is excellent, but I believe the potential is there. I hope more Chinese researchers will reach out, translate their articles into English, write English blogs… but I also hope that the international community will respect and support that: provide travel grants for Chinese researchers to go to conferences, show an interest in their research and findings, look at the massive amount of OER available in Chinese, and consider translating some of it into English and using it with their own students, etc.

Stian

Digitized books on Aceh - but are they accessible to Acehnese?

April 13th, 2009

From Klaus Graf, via Open Access News: the Royal Netherlands Institute of Southeast Asian and Caribbean Studies (KITLV) in Leiden has digitized more than 656 books in their collection about Aceh, in several languages (Indonesian, French, Dutch, etc.), and made them available on their website, Aceh Books. This is exciting news for me, since I have found it very hard to access Indonesian-language books after returning from Jakarta. I have often wished that there were some serious book scanning projects started in Indonesia, because the country has a history of some excellent fiction, and of course there are many important social science and other non-fiction books as well, that are crucial to understanding the history and culture of the country. However, these often exist in very few copies, so that even Indonesians in one part of Indonesia cannot access them, unless they travel thousands of kilometers to Jakarta or other centers.

However, I am a bit concerned about how accessible these books are to Indonesians themselves. This matters in part because many libraries in Aceh were destroyed during the tsunami, so ideally people in Aceh would be able to access and read the books made available. However, currently they are only available as PDF scans; the two books I tried were 60-80 MB downloads each. Having worked in Jakarta, I know that even in the best of circumstances, download speeds are extremely slow. In Jakarta, we spent a lot of money on a dedicated satellite connection, but it was still not fast; in the regional office in Tangerang, we shared an ISDN connection or similar, and even downloading e-mail was painfully slow.

There are already good interfaces for presenting scanned books online - I’ve previously written about Open Library (1, 2, 3), and their interface, which only shows one page at a time, might be a good option (also, it would enable people to find the books more easily - rather than having to know about a specific page for Aceh-related books). Another possibility would be to upload them to a place like Scribd as well, which “streams” PDFs. I haven’t tested Scribd in a low-bandwidth environment, but I am assuming it would be far preferable to having to download 60MB. Of course, ideally, the books would be OCRed, and the text corrected by volunteers (or perhaps KITLV could pay some students in Aceh - Indonesian salaries are low), so that one could instead distribute text files weighing perhaps half a megabyte. (These can also be put on e-book readers, which would be important for me, if I wanted to read one of these in the future).
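To make the bandwidth argument concrete, here is a rough transfer-time calculation. The 128 kbit/s link speed (two bonded 64 kbit/s ISDN channels) is an assumption, since the actual speed of the shared office connection is not known, and the function name `download_minutes` is made up for the example. Protocol overhead and sharing the line with colleagues are ignored, so real downloads would be slower.

```python
def download_minutes(size_mb, link_kbit_per_s):
    """Idealised transfer time in minutes, ignoring overhead and line sharing."""
    size_kbit = size_mb * 8 * 1000  # decimal megabytes to kilobits
    return size_kbit / link_kbit_per_s / 60


ASSUMED_LINK_KBIT_PER_S = 128  # assumption: two bonded ISDN B channels

for label, size_mb in [("60 MB scan", 60), ("80 MB scan", 80), ("0.5 MB text", 0.5)]:
    minutes = download_minutes(size_mb, ASSUMED_LINK_KBIT_PER_S)
    print(f"{label}: {minutes:.1f} minutes")
```

On those assumptions a single scanned book takes more than an hour even with the line to yourself, while a half-megabyte text file arrives in about half a minute.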

It would also be nice if KITLV could specify the copyright conditions of these books - some of them are old enough to be public domain, but some seem newer - does KITLV have special agreements? Am I allowed to upload them to Scribd, or do other things with them?

All in all, a great beginning, but hopefully these are questions they will think through - in collaboration with people on the ground in Aceh, who know far more than I do about what they need! I also keep hoping that some rich Indonesian (there are certainly enough of them!) will want their name immortalized through funding a large-scale book scanning project of the Indonesian heritage.

Stian (in Hong Kong, going to Guangzhou and South China Normal University today)

Language representation among DOAJ Open Access journals

April 5th, 2009

I am writing a paper about encouraging undergraduate students to conduct research in their mother tongues / other non-English languages that they know. One key element is of course availability — University of Toronto students are lucky to have access to a large repository of foreign online journals, but this might not be the case everywhere, so Open Access can play an important role. Curious about this, and frustrated about the difficulty in finding good numbers about the size of the “academosphere” in different languages (how many journals are published in Spanish? In Farsi?), I decided to have a look at the Directory of Open Access Journals data.

Unfortunately, they don’t provide an option to search based on language, but luckily they allow you to download their entire database of journal metadata as a comma-separated file. They have one field for journal language, but often there are several languages listed, so simply sorting them and counting in OpenOffice.org is not good enough. I whipped up a quick Ruby script, reusing a few lines from my previous script to count the most frequent search-words used with my online Chinese-English dictionary, and got the following list:

English 3309
Spanish 871
Portuguese 472
French 338
German 202
Italian 114
Turkish 60
Croatian 46
Russian 45
Catalan 45
Japanese 26
Polish 24
Chinese 19
Romanian 14
Norwegian 13
Swedish 13
Czech 13
Serbian 11
Persian 10

In total there were 4010 journals listed, but note that journals that have articles in several languages are double-counted, so a journal with articles in French and English would be counted as one journal for French, and one for English. (There are a total of 74 languages represented, the full list is here).
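The counting approach described above (split the language field, then count each journal once per language) might look something like this. It is a Python sketch, not the Ruby script that produced the list; the sample rows, the column name `Language` and the separators used inside the field are assumptions, so they should be checked against the real DOAJ download.

```python
import csv
import io
import re
from collections import Counter

# Tiny made-up sample standing in for the downloaded metadata file.
# Column names and the in-field separators are assumptions.
SAMPLE_CSV = """\
Title,Language
Journal A,English
Journal B,"English, French"
Journal C,Spanish; Portuguese
Journal D,english
"""


def count_languages(csv_file, column="Language"):
    """Count journals per language; multilingual journals count once per language."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        field = row.get(column) or ""
        # A set, so a journal that lists a language twice is still counted once.
        languages = {
            part.strip().title()
            for part in re.split(r"[,;/]", field)
            if part.strip()
        }
        counts.update(languages)
    return counts


counts = count_languages(io.StringIO(SAMPLE_CSV))
for language, n in counts.most_common():
    print(language, n)
```

With the sample rows, English is counted three times and the other languages once each, so the per-language counts add up to more than the four journals, which is the same double-counting effect noted above.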

I am assuming (hoping) that there are more than 19 open access journals in Chinese, for example, but on the other hand, there might not be a strong incentive to be listed on an English-language only website, which does not even allow for searching/sorting by language. The data would have been better if we could have looked at the distribution at an article level, because some of the journals which list several languages are overwhelmingly published in only one of them; however, only a portion of the journals have article data, and the article metadata does not contain a field for language (unlike journal metadata). I wonder if it would be feasible to run the titles through a language recognition library, but that has to wait for another rainy day.
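As a very rough idea of what guessing a language from a title could involve, the sketch below scores a title against short lists of common function words. This is a toy heuristic standing in for a real language recognition library, not a recommendation; the word lists and the name `guess_language` are made up for the example, and short titles without any function words will simply return no guess.

```python
import re

# Very small hand-picked function-word lists: an assumption for illustration,
# far too short for real use.
STOPWORDS = {
    "English": {"the", "of", "and", "in", "for", "on"},
    "Spanish": {"el", "la", "de", "y", "en", "los"},
    "French": {"le", "la", "de", "et", "des", "les"},
    "German": {"der", "die", "und", "das", "für", "von"},
}


def guess_language(title):
    """Return the language whose function words occur most often, or None."""
    words = re.findall(r"\w+", title.lower())
    scores = {
        language: sum(word in stopwords for word in words)
        for language, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties go to the first language listed
    return best if scores[best] > 0 else None


for title in [
    "The Journal of Open Access Studies",
    "Revista de Estudios de la Educación y los Medios",
    "Zeitschrift für Soziologie und Politik",
]:
    print(title, "->", guess_language(title))
```

Closely related languages share many function words ("de" and "la" count for both Spanish and French here), which is one reason a proper library trained on character n-grams would be a better fit for article titles.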

Stian

Public domain books in many languages

March 20th, 2009

I’ve long been fascinated by book scanning projects, and have written before about Open Library and Universal Library, as well as Google Books. However, as neat as these projects are, we shouldn’t forget sites like Project Gutenberg, which have been around for much longer. Project Gutenberg relies on volunteers to scan, OCR and proof-read texts that have fallen into the public domain, with the goal of creating high quality ASCII text versions of the books. This is different from the large scale book-scanning projects mentioned earlier, which rely on institutional investment, and mainly focus on making the scanned images available. Sometimes, you can also get the OCRed text, but it has not been proof-read or quality controlled.

While it’s very neat to see pictures of old book pages, especially ones with elaborate illustrations, ASCII (or increasingly Unicode) text has many advantages. It can be searched and text-mined, copied and pasted into any format, and displayed on any kind of device, whether it’s a text-to-speech device, an ebook reader, a cell phone, or what have you. There are also several websites (like manybooks.net) which repurpose these text files, for example by formatting them nicely, converting them to different ebook-reader formats, etc.

I hadn’t visited Project Gutenberg in years, but a friend of mine showed me that they have kept adding books, and in fact radically expanded their international offering. All of these languages have more than 50 titles available: Chinese, Dutch, English, Esperanto, Finnish, French, German, Italian, Latin, Portuguese, Spanish, Swedish and Tagalog, and there are many others that are represented with a few books.

In addition, my friend showed me a few other sites that were in the same vein. lib.ru has a huge collection of literature in Russian, in HTML format. You’ll find your Dostoyevsky and Gogol, but also Russian translations of Chinese classics, as well as Soviet-era science fiction! And in Japanese, Aozora has a huge collection of public domain works. And China is amazing at making material available, not caring too much about copyright: there is a plethora of sites where you can easily view the full text of any modern novel in HTML, and books.sina.com is an example.

These resources are extremely useful, not least for me as a language learner (and I can’t even speak Japanese, but perhaps one day). It means that I can download a Chinese book, and put it in Wenlin, or use an electronic mouse-over dictionary for the Russian texts. I’d love to hear about interesting book collections in other languages.

And then you have LibriVox - a collective of people who record public domain books, and release the audio books as public domain. I’ve long been impressed by their work, and listened to a number of their books, and my friend pointed out that they’ve now got more foreign content than ever. Their multilingual poetry collections are especially interesting.

Stian
