“Why I Deserve an OpenEd 2008 Scholarship.”

August 25th, 2008

Since I kind of “began” my journey in open education through David Wiley’s course, attending COSL’s Open Education Conference 2008 in Utah is almost like coming full circle for me. Since I also think it’s extremely important to get more students involved in this movement, and conversation, I was very excited to see that the Hewlett Foundation had provided for four student scholarships, to be determined by an essay. And now I have just been told that I am one of the winners, which is a real honor. After spending my own money attending the conference in Dalian, and iCommons last year, it feels like a wonderful recognition. I was encouraged by a friend to post my essay online, so here is what I submitted. I am hoping to see a lot of you in Logan, and I am also excited to meet the three others that won the scholarships.

A few days ago, I was visiting the university library of the Banaras Hindu University (BHU) in Varanasi, India, one of the most well known universities in India. Although it is housed in a grand building, I was struck by the extremely bad condition of the books, and the almost complete lack of any recent literature. In the section on libraries, the most recent book was from 1980, and books about library automatization from 1965 were prominently displayed. While seeing this, I tried to imagine studying or conducting research in such an atmosphere.

The various open movements carry the promise of broadening access to learning and research to much larger parts of the world, and to enable the students at BHU not only to receive a top-class education, but also later to conduct cutting-edge research, and to share their research with the rest of the world. Open access to research is crucially important for researchers, but also for students. Open educational resources introduce students to a field, and at higher levels, provide the contextual and pedagogical glue between pieces of research. New models of open and collaborative learning and teaching, such as the “Wiley wikis”, provide models of the process through which open learning can happen.

Although a participant in the North-American and worldwide English-speaking community of educators and thinkers, I come to the field with a number of perspectives. My mother tongue is Norwegian, spoken by only five million people, and this, together with my childhood experiences with Esperanto, inculcated in me an understanding of inequality in linguistic issues, and a deep passion for supporting national and regional languages. My educational background is in development studies, and spending more than 1.5 years in China, 1 year in Indonesia and several months each in Mexico, Russia and India, as well as speaking eight languages fluently, has allowed me to begin to see things from a multitude of national and cultural perspectives.

Having a background as a technology tinkerer, I am fascinated by the different new platforms offered for online teaching and learning. Having been active in political and social groups, I am extremely curious about ways of online cooperation and equitable governance that we can develop (especially across language barriers). I am also participating in a long-ranging conversation about “the future of the university”, or the peer2peer university - which includes thinking about factors that support informal learning, as well as accreditation, and rethinking the structure of “courses” or “degrees”. I got involved in this conversation at iCommons in 2007, and it resurfaced at the Open Learning conference in Dalian, and will be discussed both at the iCommons 2008 and the Open Education conference in Utah (if our proposal is accepted).

Two of the most interesting research (and advocacy) topics for me are how open educational resources are being used - whether by self-learners or by educators who adapt them - and the issue of translation and cultural adaptation. I bring these two topics together in my MA thesis proposal, where I plan to do a qualitative study of the adaptation and use of MIT OCW materials in Chinese university courses. China is perhaps the country after the USA that has most aggressively both adapted foreign OCW and begun producing its own (Chinese Quality OCW, CQOCW), supported by organizations like CORE. Statistics show that several hundred Chinese courses are currently based partially on translated MIT OCW materials, but we know very little about what part of the material they use, how they adapt it, and what the pedagogical and non-pedagogical outcomes of this are.

Participating in the Dalian Open Learning conference was a wonderful experience for me, meeting many of the people active in the open learning community in person. At the same time, it made me sad to see the great divide between the Chinese researchers (almost 50% of the participants), and the rest. Both because of linguistic and cultural differences, there was sadly far too little exchange and mutual learning going on - and the non-Chinese participants probably lost out the most.

The different open movements are ideally leading to a “flatter” world, which can highlight the best quality material no matter where in the world it is produced. Yet so far, the spread of materials has been very one-sided, with material being produced in the US and translated into other languages. The Indian IITs, which some say are on par with or better than MIT, have begun publishing lectures on YouTube, and English-speaking students can now choose if they want physics explained by an MIT professor, or one from IIT. But I have yet to hear a single US university or organization announce that they will begin translating Chinese courses (of which there are over a thousand available online currently) into English.

I expect to learn a huge amount from the conference, and come home with my mind swirling with new ideas. I hope to make new connections - people from around the world with whom I can collaborate, and from whom I can learn, long after the conference has ended. Hopefully I will hear about great initiatives that I can emulate through the Open & Free @ University of Toronto network, which I will try to revitalize this fall. But I also hope to inject my own perspectives into the debates - perspectives of multilingualism, of developing countries, of end-users, and of undergraduate students - having just graduated from an undergraduate degree myself. And of course, I will spread what I learn - through my blog (syndicated at oerblogs.org), lectures and activities at the University of Toronto School of Education, and in other ways.

Thank you for considering my application.
Stian Haklev

Talk at IIPA in Delhi on open research, OER and open learning in developing countries (slidecast)

August 14th, 2008

I was lucky enough to be invited to give a talk at the Indian Institute of Public Administration in New Delhi, a research institute that does consulting jobs for the Indian government and also trains senior civil servants. I spoke to a group of perhaps 25 librarians and professors, trying to give a “whirlwind” tour of the field of open research and open learning, both in general and in terms of its usefulness for developing countries. It seemed to be well received, and I had several requests afterward for more information. I recorded the lecture in Audacity on my MacBook (quality is mostly good, except for a few times when I turn away from my laptop), and synched it with the slides on SlideShare (synch is mostly good except when the slides change too fast, and SlideShare can’t keep up).

Feel free to have a look and let me know what you think. Below I have also included all the links from the presentation, mainly for the benefit of those who attended.

Thanks again to IIPA and Dr. Munshi for inviting me.

Correction: Hindawi is based in Egypt, not in India. Apologies for this. However, there are still a lot of open access journals being published in India.

Update: Here is a direct link to the MP3 recording, in case you have a slow connection or want to listen offline (if you press play below, it will play the sound and sync the slides).

Links:

Opensource Fellowships and localization at SARAI day II (plus Delhi Linux User Group meeting)

August 11th, 2008

I already wrote a long post about the first day of the FLOSS Opensource and localization fellowships presentation at SARAI, and here is more from the second day of presentations.

I also wanted to mention that at the end of the first day, we watched a very interesting video of a lecture given in the US in the 1980s. An American scholar of Sanskrit began by talking about the Mahabharata - the epic poem of Hinduism and Indian culture, twice the length of the Bible - and how he was trying to do advanced analysis on it to ascertain which parts had been orally delivered, and which parts had been authored as a written text. For this, it would help to have the text available on computer, but trying to type in the whole thing was quite impossible.

Luckily his son was a computer engineer, and working together, they had figured out a way to do OCR - text recognition - of the scanned pages of the text written in the Devanagari script (the same as used for Hindi). The computers and machines shown were so primitive that it is amazing to think of what they were able to achieve. The son was also very good at explaining the thinking process behind designing the OCR program, and although I could never have replicated it, it was very interesting to understand the principles behind it. It was also humbling to think that they had designed such a program in the 1980s, and yet today, we are still struggling to produce good OCR for many Indian scripts (and we think we are working at the “cutting edge” of technology…).

Speech Recognition in Hindi

Sachin Joshi described to us the process of creating a speech recognition system in Hindi. He used Sphinx, an open source system developed at Carnegie Mellon that enables you to construct a voice recognition system in any language. He mentioned that Indian languages were difficult, because they had such a wide spread of dialects and accents. He therefore focused on Northern Hindi, and used a bit over 30 volunteer readers, in different age groups, males and females, to read for about half an hour each. They gathered a large corpus, and then used software to pick out the most “efficient” sentences, those that cover the most common sounds in the shortest amount of time.
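The talk did not go into how the “efficient” sentences were picked, but one common way to approach this kind of problem is a greedy coverage search. The sketch below is only an illustration under that assumption: the function names are hypothetical, and distinct letters stand in for a real phoneme inventory.

```python
def sound_units(sentence):
    """Stand-in for a phoneme inventory: the distinct letters in a sentence."""
    return {ch for ch in sentence.lower() if ch.isalpha()}


def select_sentences(corpus, max_sentences):
    """Greedily pick sentences that add the most new units per character read."""
    covered = set()
    chosen = []
    remaining = list(corpus)
    while remaining and len(chosen) < max_sentences:
        # Score each candidate by new units gained relative to its length.
        best = max(
            remaining,
            key=lambda s: len(sound_units(s) - covered) / max(len(s), 1),
        )
        gain = sound_units(best) - covered
        if not gain:
            break  # nothing left in the corpus adds a new unit
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered


if __name__ == "__main__":
    picked, units = select_sentences(["aaa", "abc", "abcd efgh", "xyz"], 10)
    print(picked, len(units))
```

A real system would score phonemes or phoneme sequences from a pronunciation dictionary, and would probably weight them by frequency, but the shape of the loop would be similar.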

He then demonstrated to us, using a small script which would record an amount of speech, process it, and then output it in Hindi. When he spoke, using some of the example sentences, it worked fairly well, but when some of the volunteers tried with other sentences, it was not as impressive. However, it is early in development, and it can be improved quite a bit. Already it was impressive, and many of the participants were very excited about the prospects.

What is also exciting is that the project provides the acoustic model - which is the hardest to create - as an API for any other project to build upon. They can take it as a black-box, and add a linguistic model (which contains information about what a person can be expected to say), and build all kinds of applications on top of that, from directory lookup, to train reservation systems, to computer interfaces for illiterate people.

There was also a discussion about how to find good corpora in Hindi that are open source, and that reflect modern spoken Hindi.

Urdu localization

Sawood Alam told us about his project to localize Gnome 2.2 into Urdu. He discussed some of the terms that he had chosen (for me it was amusing hearing several words that were similar to Indonesian, for example terjemah, translate, which is identical in Indonesian). He also discussed several challenges with Urdu language computing and localization, such as input methods and different fonts. After a good amount of time practicing, he reached 40 strings per hour in translation speed. He said it was more difficult to do localization in Urdu, because, unlike for the other Indic languages, there had not been a government committee that had defined a technical terminology. However, the people in the audience were adamant that he was indeed lucky to be able to start from scratch, since the terminology resulting from the government committees was in any case too archaic and foreign to users.

Finally, some problems with mixing right-to-left and left-to-right input were discussed, for example when inputting placeholders like %s into the text strings, where the Urdu text goes from right-to-left but the %s goes from left-to-right. Apparently this is also a problem in providing domain names in the Urdu script, as we were told by a person from the Indian Internet Exchange.
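To see why a placeholder like %s is awkward inside right-to-left text, it helps to look at the Unicode bidirectional class of each character. The sketch below uses Python's standard unicodedata module; the sample string and the helper names are illustrative assumptions, and wrapping the placeholder in directional isolates is just one possible mitigation, not something that was presented at the meeting.

```python
import unicodedata

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def bidi_classes(text):
    """Return (character, bidirectional class) pairs for a string."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def isolate_placeholders(template, placeholder="%s"):
    """Wrap each placeholder in directional isolates (one possible mitigation)."""
    return template.replace(placeholder, f"{LRI}{placeholder}{PDI}")


if __name__ == "__main__":
    # Illustrative string: an Arabic-script word followed by a placeholder.
    sample = "\u0641\u0627\u0626\u0644 %s"
    for ch, cls in bidi_classes(sample):
        print(repr(ch), cls)
    print(repr(isolate_placeholders(sample)))
```

The Arabic-script letters report class AL (right-to-left), the letter s reports L (left-to-right), and the percent sign reports ET, a weak class whose direction depends on its neighbours. That mix is what makes the rendered order of the placeholder hard to predict.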

Meeting of Delhi Linux User Group

At the end of the day, there was a meeting of the Delhi Linux User Group. Gora opened it, talking a bit about upcoming elections, and hoping that they would attract more young and new faces. Then a guy from OSSCamp.in came on to explain their concept. They organize “un-conference” style gatherings about twice a year, where anyone can come to give a presentation, and participants learn from each other. They pride themselves on reducing the distance between professionals and amateurs, and on providing a relaxed forum where people can interact. They often have between 50 and 70 participants, mostly students, even from high schools. They also held a mobile camp (about mobile applications, hardware etc) in Mumbai, which was a huge success.

There were many questions from the audience, and some suggestions. Overall I thought the audience was a bit too critical of their concept, and not appreciative enough of the need for many different kinds of events for different people. Indeed, I would have loved to have attended the one that happened at the end of June - had I known about it. This led to a short discussion about the feasibility of having a portal of community events in the FOSS world.

All in all, it was a wonderful gathering. I learnt a huge amount about localization, and the specific challenges that different Indic languages and scripts present, and also about good work being done. I also met some very interesting people that I hope to stay in touch with. I wish the presentations had been advertised better, because I think it might have been interesting to more than the few people who came (mostly people who were presenting themselves). But hopefully through these writeups, at least people will know what happened - and what is going on. If you want more information about any of these projects, or you want to apply for a Sarai FLOSS fellowship (they will be having more), then contact gora at sarai dot net.

Stian

Opensource Fellowships and localization into Indic languages at SARAI

August 11th, 2008

Background

I somehow came across the homepage of Sarai about two years ago, while living in Indonesia. I was extremely impressed with what I saw from their webpages and mailing lists - a vibrant community space/collective that was interested in many of the same things as me; urban issues in developing countries, open source, open culture, national and regional languages etc. They publish a ton of great writings, both in English and Hindi, mostly available from their website, and I downloaded several Sarai Readers and enjoyed reading their thoughts on urban development in Delhi, piracy culture, etc.

I also subscribed to their newsletter, and every month I would get an email telling me about the films they were showing and the talks they were featuring, and I would think it was such a pity that all this interesting stuff was going on in Delhi, and I was stuck in Toronto… So when I was finally going to India this summer, for the first time, I knew I wanted to fit in a visit to Delhi, and to Sarai.

On Friday, having spent two days in Delhi being exhausted from avoiding all the touts in Connaught Place, and walking around in the humidity, I suddenly received an invitation from Sarai to a seminar where all their Open Source Fellows would be presenting their projects. Sarai has been using some of their own money, and also some money from the Rajiv Gandhi Foundation, to give fellowships to groups around India for projects involving open source and localization into Indian languages. I was extremely lucky to be able to attend this very interesting gathering, and I learnt a lot from the people there. Here are some of the projects that were presented.

Introduction

Gora Mohanty introduced Sarai’s involvement in FLOSS. Originally the organization mainly focused on urban studies, but they needed tools to publish in Indian languages, and at first their involvement was more to “scratch an itch”. They currently have 40 open fellowships, and 10 specific FLOSS fellowships. Based on the experiences from this year, they will try to provide more support to the fellows, and also promote more interaction between the different fellows, in future years. A related question is what collaboration technology to use, since for example IRC is very convenient for some, but scares others away. (At the end of day one, I suggested having all the fellows blog, and then aggregating all the blogs in a planet somewhere). One experience they have gained is that technical people are often not the best people to do localization work.

In general, the projects have had a very high success rate, partly explained by the fact that the people who receive the fellowships are often older and more established than for example the Google Summer of Code participants, and have a track record of delivering.

Lately it has become much easier to get funding for doing FLOSS projects in India, since the concept is becoming quite widely known, although perhaps still not so well understood by funding agencies. There is also more and more commercial activity in for example Hindi localization, and many of the projects Sarai funds are more on the “edges”, involving lesser spoken languages, etc.

An example was given of a workshop in Kashmir University, where 100 students showed up. Gora usually starts his presentations by asking how many have used computers before, and then goes on to ask how many have used Windows, etc. However, in this crowd only two people had ever used computers at all before. Yet they came to the meeting and were eager to get involved!

KDE 4.2 localization

R. Shrivastava has been working on localization of KDE 4.2. He also showed off how the KDE applications work well running natively in Windows. We discussed a bit how to come up with good terminology in Hindi, since there is a balance between, at one extreme, simply taking all the English words and spelling them in Devanagari (which is what most cell phone ads in India seem to do), and at the other extreme, using obscure terms for downloading or computer that some government agency has come up with, but which are very foreign to the users. He explained that they had tried to strike a balance, using a language that felt natural.

We also discussed what kind of tools could be used, especially to access translation memories from other projects, and other languages. Many of the Northern Indian languages are very similar, such as Hindi, Urdu, Punjabi, Marathi, Bhojpuri and Nepali (although with some exceptions written in different scripts). If one is doing a Hindi translation and is stuck on a term, it might thus be useful to be able to quickly look up how they translated it in Punjabi, etc. I am not sure if KBabel for example has this functionality today - it would be useful for other language groups as well, such as Norwegian, Swedish and Danish, or Finnish and Estonian.
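The cross-language lookup described above could be sketched roughly as follows. This is a hedged illustration, not a description of what KBabel or any other tool actually does: the function name is hypothetical, the translation memories are plain in-memory dictionaries keyed by language code, and the sample entries are illustrative only.

```python
def lookup_related(term, memories, languages):
    """Return existing translations of a source term in related languages.

    memories maps a language code to a dict of source string -> translation.
    Languages without an entry for the term are simply skipped.
    """
    return {
        lang: memories[lang][term]
        for lang in languages
        if term in memories.get(lang, {})
    }


if __name__ == "__main__":
    # Illustrative entries only, not taken from any real project catalog.
    memories = {
        "nb": {"Download": "Last ned", "File": "Fil"},
        "sv": {"Download": "Ladda ner"},
        "da": {"File": "Fil"},
    }
    print(lookup_related("Download", memories, ["nb", "sv", "da"]))
```

In practice the dictionaries would be loaded from each project's translation catalogs, and the translator would see the suggestions from sibling languages next to the string being worked on.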

Publishing in Indian languages using TeX

Dr. C. S. Yogananda is a professor of mathematics, and has often helped arrange the math olympics. TeX is a publishing package that separates content from display, and is especially often used in the sciences and math, because it has powerful capabilities to display mathematical formulas, etc.

He described how earlier versions of TeX available for Indian languages required a pre-processor, but that he had developed a version that did not, and was thus much easier to use. He has already developed a version in Kannada, and believes that a one week workshop with participants from different language groups would be enough to produce TeX packages for all the Indian languages using the same framework.

He also discussed localization, and his own belief that mere translation was not enough. He took as example GCompris, a package of games for children, and talked about how localization implied changing some games that Indian kids were not familiar with, updating pictures to reflect Indian realities, changing maps so that they were more relevant, etc.

He also talked about early Indian typewriters, stating that if they had been designed from scratch, only inspired by the Western models, instead of taking Western models and keeping the same number of keys, etc, just exchanging them for Indic language letters, people might have been much more comfortable typing in Indic languages. (He gave an example from a Supreme Court judge who said that before the typewriter, all court deliberations had been in Kannada, but after the advent of the typewriter, it had been so difficult to type in Kannada that they had switched to English). Even today there are apparently some issues in the Unicode space for Kannada which also make Kannada computing difficult.

They have been working on Kannada OCR, which is currently 95% finished. Instead of using an existing framework, they started from scratch. Hopefully this will be finished in another 6 months. Finally he showed examples of a Kannada-English dictionary that had been produced using their system, with thousands of pages, and all the indexing etc, using the advanced functionality in TeX. As far as I understood, this dictionary will later be released openly on the web, after a two year exclusivity agreement with a publishing company has lapsed.

One thing that I found peculiar is that the entire input in the TeX source files (which are later processed and turned into PDFs or other output formats) is written not using a Kannada font, but in Latin letters - “transcribed”. He insisted that the system for input was logical, and that they were able to input at high speed using this system, but I thought to myself, what if India had invented the computer, and somebody had forced me to input my latest Norwegian poetry, or novel, using Norwegian transcribed into the Devanagari alphabet? This concept is still strange to me.

Oriya lexicon

Dr. N. M. Pattnaik started out with a fascinating history of dictionaries in Oriya. The first dictionary dates back to the 17th Century, and was written for poets. As such, the words were alphabetized based on the last letters, not the first (to improve rhyming), and the meanings of each word were given in a poem. In the 18th Century, missionaries started producing dictionaries and grammars to aid them in their work, but these dictionaries were organized subject wise. In 1916 the first etymological dictionary in Oriya appeared. Then, between 1930 and 1940 a gigantic dictionary of 7 volumes and 10,000 pages was produced. This dictionary contained 185,000 head words, with translations in English, Hindi and Bengali. Unfortunately, only 200 copies were sold, and most of the other copies were destroyed due to rights disputes with the publisher and the heir.
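Alphabetizing by the last letters rather than the first is easy to reproduce today: sort each word by its reversed spelling. A minimal sketch, using English words purely for illustration (the function name is hypothetical):

```python
def rhyme_order(words):
    """Sort words by their endings, comparing letters from the last to the first."""
    return sorted(words, key=lambda word: word[::-1])


if __name__ == "__main__":
    for word in rhyme_order(["nation", "cat", "motion", "hat", "station"]):
        print(word)
```

Words that share an ending land next to each other, which is the property a poet looking for rhymes would want.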

This amazing dictionary, which is of course an incredibly important part of the linguistic and cultural history, not just of Orissa, but all of India, has been scanned and made available through Pattnaik’s organization. They did it using very simple equipment - a digital camera on a wooden stand, and a huge amount of manual editing and post-processing. The resulting 600 MBs have not yet been put on the internet, but I received a copy, and I will post it to archive.org as soon as I am back in Canada in a few weeks (with good broadband). Sneakernet across the world.

This dictionary represented the peak of Oriya dictionary making, and in the small dictionaries published today, one cannot even find modern words like nuclear or electron. There were also glossaries produced by government committees, but these consisted of scientists that never used Oriya in their own work, and the resulting terms were often unnatural. In addition, the committees were based on subject field, so a given word, used in many subjects, might be translated differently in every committee.

His organization mainly works on making science fun for kids, and believes that this has to be done in their own language. However, scientists are often not very good in local languages (since most of their education and work happens in English), and so they need good lists of scientific terminology in Oriya and English.

Dr. Pattnaik’s organization generated a database of 20,000 Oriya-Oriya popular words, through the help of science writers who have long experience in popularizing technology and science in the Oriya language. They also produced an English-Oriya dictionary which currently has 6,000 words, and they hope it will reach 15,000 words soon. They will also add explanations in Oriya of terms, and reverse the database to generate an Oriya-English database as well. All this is available in StarDict format, which means that it can be easily used in applications for Mac, Linux, and Windows. As well, they have contributed word lists to aspell, to improve spell-checking for Oriya on Linux.
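Reversing a bilingual word list, as described above, amounts to inverting a mapping in which one headword can have several glosses. The sketch below is an assumption about how that could be done, not the organization's actual tooling; the function name is hypothetical and the Oriya side uses placeholder tokens rather than real words.

```python
from collections import defaultdict


def reverse_dictionary(forward):
    """Invert a headword -> list of glosses mapping into gloss -> list of headwords."""
    reverse = defaultdict(list)
    for headword, glosses in forward.items():
        for gloss in glosses:
            reverse[gloss].append(headword)
    # Sort both the glosses and the headwords under each gloss for stable output.
    return {gloss: sorted(heads) for gloss, heads in sorted(reverse.items())}


if __name__ == "__main__":
    # Placeholder tokens stand in for real Oriya terms.
    english_to_oriya = {
        "atom": ["oriya_term_1"],
        "particle": ["oriya_term_1", "oriya_term_2"],
    }
    print(reverse_dictionary(english_to_oriya))
```

Automatic inversion only gives a starting point: the reversed entries would still need editing, since a good gloss in one direction is not always a good headword in the other.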

Assamese localization of GNOME

This was presented by Gora, since the fellow A. Phukan could not be present. Phukan works with RedHat, which has been doing a lot of localization work, and lately working on an interface for submitting translations through the web (similar to LaunchPad, but perhaps more open source).

Assamese is closely related to Bengali and Oriya, and is spoken in Assam. C-DAC has already done valuable work localizing software. However, they don’t work closely with the community, and thus use too formal words that are unnatural to users, and when they are done, they just hand off the results and leave - whereas software localization is something that has to happen continually, as a process.

In addition to localization work, Phukan has also created an online dictionary of Assamese, based on user contributions.

Marathi and Urdu User Guide to Open Office

This was also presented by Gora. Sarvangin Vikas Sansthan uses “reversed rewards” to get localization done. They post a number of possible jobs they can do on their website, together with an extremely reasonable price, and they wait for people to fund them. In this project, they translated the 380+ page user guide to OpenOffice into Marathi and Urdu. One thing I noticed from the screenshots was that they were based on non-localized versions of OO.org (i.e. with English menus, etc).

The organization also does a lot of training in OpenOffice and other OSS software in schools in Maharashtra.

I will write about day 2 in a separate post.

Stian

Indian reactions to the Beijing Olympics Opening Ceremony

August 11th, 2008

I recently left Varanasi for New Delhi, and I am staying in the crazy and dirty backpacker ghetto called Pahar Ganj, conveniently located behind the main train station. Luckily, my hotel has a TV (with some 80 channels), and thus I was able to catch the live broadcast of the opening ceremony of the Beijing Olympics. I am usually not a huge olympics, or sports fan, but given my long connection to China, and how much I have been hearing about the olympics from my Chinese friends - ever since I was there first in 2001 - I was quite interested in how it would turn out.

In addition, watching it on Indian TV gave me an opportunity to listen to the Indian commentators and see how the olympics, and China, were perceived here. In fact, I kept switching back and forth between three different English language channels (all Indian) - one was CNN-IBN, one was Times Now I think, and there was a third one, perhaps NDTV.

Prelude

In the run up to the Olympics, much of the newspaper coverage has been focused on the fact that India’s contingent is quite small, and that India cannot be expected to win many medals. In fact, I think they have only won 16 golds in all the Olympics together. There was also a row about who would get to accompany the athletes, with some having their parents going with them, and others not even getting a space for their coach (reminded me a bit of the negotiations in Chak De).

One of those “can only happen in India” moments was when the weightlifter Monika Devi was expelled for testing positive on a dope test. Devi is from Manipur, a state tucked away in the far northeast on the border with Myanmar, with a people that looks more East-Asian than South-Asian, and many there decided that this had all been some kind of conspiracy from “mainland India” to exclude Manipur from the Olympics. Subsequently a bandh - general strike - was declared for 24 hours in the entire state, effigies were burnt, demonstrations ensued in which at least five people were hurt, etc. In the end Monika was cleared to go, but this was too late to have her enter the competition (I suppose the fact that she was subsequently cleared only deepens the Manipuri belief in a grand conspiracy).

In general, Indians seem far too eager to block railway lines, call for general strikes, and burn effigies. Another example from the last few days was when Kannada groups blocked all the railway lines going to Tamil Nadu. The reason was that Kannada and Telugu were about to be awarded “classical language” status by the Indian government (which would mean more support for language preservation and development). Apparently the Kannada groups believed that Tamil Nadu (which posits itself as home of the original classical Dravidian language) was trying to block this appointment, and thus shut down traffic. But back to the Olympics…

Extremely positive

Generally, the commentators were extremely positive about the opening ceremony. One of the commentators in particular, I think on Times Now, was a talking machine. He seemed to be trying to break some kind of record, as he gushed out superlatives: the mother of all ceremonies, amazing, spectacular, cannot believe my own eyes… All the channels also spent a lot of energy on the Indian team when they entered, re-running for 20 minutes the one minute footage of the ragtag team entering, and Sonia Gandhi waving to them. The focus, however, was quite different. Our exuberant commentator was beside himself with appreciation, talking about how excited the Indians looked, the patriotism that formerly beamed out of their eyes, and the smile on Sonia Gandhi’s lips that had not been seen for so long.

The other channel, I believe CNN IBN, was however horrified by the fact that while the men had been wearing nice traditional Indian garb, only one of the women wore a sari, while the others wore training jackets and pants. In addition, about half the Indian team had digital cameras and were busy filming the audience while walking out, prompting some comments that they looked like a bunch of tourists (in all fairness, most of the other delegations did the same thing).

This “dress disaster” prompted much discussion, and in the newspaper it was later revealed that the girls had come straight from practice and not had time to change (one wonders if they hadn’t been told about the Opening Ceremony). However, evil tongues suggest that the saris were not delivered in time, or that the athletes were unhappy about the color and quality of the fabric. We shall never know.

Could India do it?

One of the topics raised frequently by all the commentators, who were awed by the organization and infrastructure in Beijing, was “could India host the Olympics?”. Many frankly said no, stating that India couldn’t put on a tenth of this Opening Ceremony, and discussing the infrastructure in Delhi as extremely poor. However, the country will host the Commonwealth Games in 2010, and this was seen as an important stepping stone. Apparently about 50 officials from India were in Beijing trying to learn from the organization of the games.

Stian

Thanks to ..· ✈Katherina ➳·.. for the first two pictures.


Participating in Facilitating Online Communities 08

August 4th, 2008

I came across the new “Wiley wiki” course facilitated by Leigh Blackall called “Facilitating Online Communities” a while back, and blogged about it. At the time, I was considering participating in it, but was worried about both time commitments and lack of internet access. During the next month I will be travelling through India and China, with very intermittent access to the internet, and after that I will be beginning my new MA in Toronto, which will undoubtedly take much of my time.

However, I have decided to try. I might not spend as much time as I did on David Wiley’s course last year, where I often spent a whole day each week doing the readings and contributing my blog reflections, but I will try to keep up, hopefully learning a lot from the readings, from Leigh, and from all the participants (who seem to be a wonderfully diverse and interesting group!). The topic is certainly near and dear both to my own interests, and to my future research on open learning, the transformation of higher education, OERs, etc.

I am looking forward to someone creating an OPML of all the participants’ blogs, so that I may easily add them all to my Google Reader. The wonderful Google Gears now allows me to read blogs offline, and I often take advantage of this when travelling, so if I can just find a wifi cafe once in a while during my travels, I should be able to keep up with participants’ blogs in the privacy of my guesthouse.
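For anyone who wants to put such a file together, here is a minimal sketch in Python of how an OPML subscription list could be generated. The function name, the course title string and the example.org feed addresses are all placeholders of my own, not the real participant list, so treat it as an illustration rather than a finished tool.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Return an OPML 2.0 document (as a string) listing the given feeds.

    `feeds` maps a blog name to its feed URL. Both are supplied by the caller;
    nothing here is fetched or validated.
    """
    root = ET.Element("opml", version="2.0")
    head = ET.SubElement(root, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(root, "body")
    for name, url in feeds.items():
        # One <outline> element per blog; feed readers look for type and xmlUrl.
        ET.SubElement(body, "outline", type="rss", text=name, xmlUrl=url)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical feeds, for illustration only.
    example_feeds = {
        "Participant A": "https://example.org/a/feed",
        "Participant B": "https://example.org/b/feed",
    }
    print(build_opml("Course participants (example)", example_feeds))
```

The resulting text can be saved as a .opml file and imported into a feed reader that supports OPML import.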

Stian


OpenLibrary and Universal Library, guys work together!

August 3rd, 2008

The OpenLibrary
I wrote about the OpenLibrary previously, and since then they have only gotten better. They have added a lot of books, but more importantly, their website has turned into a real portal, where you can access all of their scanned books (over 234,000). This is only part of their quest, which is also to provide open information about all the books ever published in the world, through an innovative new database/wiki that they have developed. However, it is still very easy to type in a search string, and ask for only scanned books to be returned. They are then viewable in the flip-book format, which is still by far the best interface to scanned books that I have ever seen on the web. It makes reading online quite enjoyable (especially for old, beautifully decorated books), and the only thing missing is a zoom feature, which they have stated is coming.

Universal Library
At a conference in Shanghai, I was introduced to the amazing Universal Library project, which I cannot believe I had not heard of earlier. Run by Carnegie Mellon University, in collaboration with the Chinese and Indian governments, they have already scanned some 1.5 million books. What is exciting is not just the sheer numbers, but the incredible linguistic variety. Over one million books are available in Chinese, and thousands in Urdu, Hindi, Arabic, Telugu, Tamil and many others. They have book scanning centers around the world, and have apparently developed some very advanced new technology, both for book scanning and for OCR in different scripts, etc.

Technical issues
Given all this, I was extremely enthusiastic to try out the project. However, it turns out that all their files are stored either as DjVu (mostly the Chinese contributions) or as TIFF files (everything else). Both of these require special viewers to be installed, and after spending a lot of time trying to follow different instructions and downloading different files, I was finally able to display the Chinese books in Firefox on my MacBook, but I have still not been able to view the TIFF files. And even if I am able to display the Chinese pages, the solution is still very far from as user-friendly and appealing as the OpenLibrary flip-book solution.

It does say on the Universal Library site that they will eventually also make their books available as PDFs etc, but my immediate thought would be, why don’t they publish them through the OpenLibrary? They already have the infrastructure and the technical solution. Indeed I don’t know the different reasons that led to this not being an integrated part of the Open Content Alliance in the first place, but if it were possible to integrate all these sources into the OpenLibrary - which is already aiming to be a truly international multilingual solution (even the interface can be translated into many languages) - I think that would be a wonderful solution. (Apparently Universal Library is also part of the Open Content Alliance which runs the Open Library).

The amazement of diving into the collections
Either way, I hope these technical issues are solved, because this is an incredibly revolutionary project. Already a million books are available in Chinese, both out of copyright and ones they have been able to negotiate rights for… I could spend days doing all kinds of weird searches, finding Chinese books written a hundred and fifty years ago about Norwegian folk tales, the cultural system in China, or Esperanto. This is also providing me with a renewed motivation to learn classical Chinese (wenyan), because of course most of the books are written in both traditional characters and classical Chinese. And the availability of such a treasure trove of books in Hindi and Urdu is a huge incentive to my current efforts to learn Hindi!

Vital projects for humanity
I think these efforts are some of the most important going on right now, and they deserve a lot more financial support than they are getting! I would love to see these scanning efforts also expanding to other countries and languages, and especially working hard to try to secure copyrights or permissions for even newer books. One of the only books on the history of libraries in Indonesia, “Perpustakaan Indonesia dari zaman ke zaman” from 1966, is obligatory reading for anyone wanting to do research on the history of libraries and literacy in Indonesia, but it is not available online, and in the US it exists only in a few tattered copies that are sent back and forth between research libraries on inter-library loan (I got mine from a US institution on ILL). There is no way you can convince me that putting that online would constitute anything immoral!

What’s more, many of these collections are incredibly vulnerable and are disappearing - I have myself seen the results of high humidity and low maintenance budgets in Indian university libraries, with old books falling apart. Not to mention archives of millions of manuscripts written on palm leaves, etc. We cannot afford to lose this heritage!

Keep us up to date!

One thing that I find lacking from both projects is a good project blog that is kept up to date. As a big supporter, I would like to know how their work is going, what their current bottle-neck is, what they are working on. Not in annual reports, but in daily or weekly reports. Most of the “news” on the Universal Library site is not dated, and the statistics of scanned books are a year old. If people knew more about what was going on, they would be in a much better position to offer support, both direct and indirect.

Stian


Very exciting new initiative to organize research around open education

August 1st, 2008

I came across an initiative that made me very excited last night. Linked from a number of blog posts, the Open University of the UK, long a pioneer in the field of open education in both meanings of the word, and Carnegie Mellon University, which has some extremely innovative open learning modules, are partnering to propose a very ambitious research project to “develop infrastructure, community and activity to help share research findings on the design and use of OERs”. It aims to “be a global network of OER producers and users as participatory researchers who share designs, methodologies and evidence for evaluating the effectiveness of OERs in increasing and equitably distributing knowledge.”

I really liked one of their analogies, that of the cancer research portal: “In more established fields such as cancer research (or we might take the Human Genome Project) there is a consensus map of the structure of the field, the major research questions, and the different sub-communities and associated methodologies. It is possible to place oneself on the map, and to coordinate effort in a well understood way. What is the OER research map? What is the OER design process? What does it mean to validate an OER? What are the central challenges that all agree on? The OPLRN seeks to create a structured ‘place’ where questions such as these can be debated, and hopefully, enabling more effective coordination of action around issues and OERs of common interest.”

I can identify with this because I have felt the same lack in the field of development and libraries. I wrote my BA thesis (I will post about it here very soon, when the translation is done) about community libraries in Indonesia, and throughout I struggled with orienting myself in a field that didn’t really exist. What were the typologies of community libraries? Methodologies for investigating them? Theories about their function in different contexts? Lists of case studies already conducted, commonly shared variables, suggestions for criteria for success? I was trying to do a gap-analysis to see where my research could fit in - but indeed I could not even find the entire field of research.

As I go into what will hopefully be a career of creating, advocating for, and researching ways of open learning and teaching, but also more specifically my MA research project, which will probably center on reuse and adaptation of Western educational resources by Chinese universities, such a resource will be invaluable. It will mean that not only will my own research be considerably strengthened, being able to draw upon all the studies that have already been conducted, use shared frames of understanding, and theories that others have found useful, but when it is finished, it will not just be another MA thesis in the library, but it will (ideally) make a small but significant contribution to a dynamic body of scholarship.

The ideal is to “build an epistemic community — a reflective community of practice dedicated specifically to advancing understanding not only of its field, but of what can/should count as ‘knowledge’ in the field. Educational technology, and OERs specifically, are young, interdisciplinary design fields lacking widely adopted design methods, patterns, or evaluation criteria. The infrastructure must therefore foster appropriate forms of discourse and memory: structures for sharing, indexing, recovering and debating the community’s collective intelligence on the relative merits of different OER design and evaluation approaches. Nor can the infrastructure fossilize as soon as the project’s startup funding ends: it must be a sustainable social and conceptual network that can evolve through the contributions of many people.”

In addition to the research projects, they also plan to develop several tools to further the actual processes of academic discourse online.

“An epistemic community is interested in claims and supporting evidence, but also in counter-claims and differing interpretations of the same evidence. While many projects are engaged in building collective intelligence, few know how to deal well with contested knowledge — other than by enabling comments, threaded fora, blogs and wikis.

[…] The software deliverable from this project will enable learners, educators, researchers, analysts and other decision makers to ask questions such as, What evidence is there that this OER is effective?, but equally important, Does the community agree with the claims made? What counter-examples are there? In addition, as a socio-technical infrastructure, it will be architected to support diverse forms of evidence, with an architecture of participation that sets a low threshold for contribution, and community structures to help those who want, to move from the periphery to become more active, central players.”

I am also proud to see someone from OISE prominently involved: Professor Marlene Scardamalia.

I look forward to following their progress, wish the project the best of luck, and hope to perhaps get involved at some stage.

Stian


Size of Wikipedia in different Indic languages

July 26th, 2008

Preparing for a talk I am hoping to give in Delhi, I decided to look up the sizes of the Wikipedias in the different Indic languages. There are several ways of assessing the size of a language Wikipedia, none perfect. The most common one is to look at the number of articles, and although this gives a benchmark figure, it can be very misleading. This is especially because of robots that run around creating thousands of new almost empty articles, for example of cities in the US with only census data. There is nothing wrong with this per se, but it can completely distort statistics, and it’s also damaging when people single-mindedly focus on reaching some goal in terms of number of articles, without focusing on making them longer, and of higher quality.

Because of my long (although not very successful) involvement with creating solutions for viewing Wikipedia offline, I have spent a lot of time looking at the packages that Wikipedia compiles for download. For the first time in over a year, they successfully completed a full dump of all languages, available here. These packages have one HTML file representing each article (as well as user pages, discussions, images etc), in an extremely (typically at least 10x) compressed form. Looking at this size in itself is unlikely to tell us anything useful (except for whether it can fit on for example a DVD), but comparing it either to previous dumps, or to other language Wikipedias, can give a good measure of relative size.

One problem is that they have made a slight change in the packaging procedure - whereas earlier all the files were simply packaged with 7Zip, currently they are first tarred and then the tar file is packaged with 7Zip. The reason for the change was to speed up making the dumps. I have not done any experiments to see if this has any impact on package size, and because of this, historical comparisons are not easily conducted.

However, for an upcoming (possible) talk in Delhi, I decided to look at the relative size of different Indic language Wikipedias. I remember being shocked, when starting to study Hindi, that the page about Norway in the Hindi Wikipedia was only two sentences long! (Sadly, about a year later, this is still the case. Hopefully my Hindi will soon be good enough to be able to add content). Certainly a lot of Indians, both in India and abroad, contribute to Wikipedia, but they choose to contribute to the English language version.

So below are all the languages spoken in India (I might have missed some, but I don’t think so), and the size in megabytes, of their compressed dump files. I also checked how their page about Norway looked in each language (facilitated by my redirect script). Certainly the absence or presence of a good article about Norway is not a great measure of a good encyclopaedia, but most up-and-coming language versions try to get certain basic articles in place as early as possible, and these certainly include articles about all the countries in the world.

Kannada and Urdu were the only two that contained decent sized articles about Norway. Kannada is near the top in size, so this is not surprising, but Urdu is interesting. Can it be connected to the fact that Pakistanis are the biggest group of non-Western immigrants in Norway? This has not affected Norway’s treatment in the Panjabi version (most Pakistanis in Norway are from the Punjab region), perhaps because the Panjabi version is written in Gurmukhi, a script similar to Devanagari and used in India, and not in Shahmukhi, similar to the Urdu script?

That Telugu is the largest is not so surprising, given that a library researcher at the Banaras Hindu University told me that Telugu currently produces the most literature in India. However, their wiki had no page on Norway at all (most of the others have at least a place-holder page), and even when searching for broad topics like “Philosophy”, I found nothing. Perhaps they have written very good articles about a subset of topics?

It’s also curious that Bishnupriya Manipuri, a language spoken in Manipur that I had not even heard of before, comes so high up - higher than Hindi. Perhaps there is some technical reason, or perhaps they are just a small community that is actively contributing articles. Certainly there are not many millions of Esperanto speakers in the world, but because of a small and very dedicated community, they manage to produce an encyclopaedia of an impressive size.

Here is the list, with a few other languages included for comparison.

Size in MB of compressed files of language Wikipedias, as of June 2008

English 14,000
German 3,200
Chinese 626
Norwegian 361

Esperanto 185
Telugu 52
Kannada 46
Bangla 34
Tamil 34
Bishnupriya Manipuri 30
Hindi 30
Malayalam 29
Marathi 23
Nepali 21
Urdu 16
Kannada 10
Sanskrit 3.2
Gujarati 3.1
Sindhi 2.6
Bhojpuri 2
Tibetan 1.9
Panjabi 1.7
Kashmiri 1.3
Oriya 1.3
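Since the raw number of megabytes says little on its own, the comparison is easier to read as ratios. Here is a small sketch in Python that does the arithmetic for a handful of rows copied from the list above; the function name and the choice of Hindi as the baseline are my own, and the figures are only as good as the list they come from.

```python
# Sizes in MB of the compressed dumps, copied from a few rows of the list above.
# Kannada appears twice in the list (46 and 10); this sketch uses 46, which
# matches the remark in the text that Kannada is near the top in size.
DUMP_SIZES_MB = {
    "English": 14000,
    "German": 3200,
    "Chinese": 626,
    "Norwegian": 361,
    "Esperanto": 185,
    "Telugu": 52,
    "Kannada": 46,
    "Hindi": 30,
    "Urdu": 16,
}


def relative_sizes(sizes, baseline):
    """Return each language's dump size as a multiple of the baseline language."""
    base = sizes[baseline]
    return {lang: size / base for lang, size in sizes.items()}


if __name__ == "__main__":
    ratios = relative_sizes(DUMP_SIZES_MB, "Hindi")
    # Print the largest dumps first.
    for lang, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
        print(f"{lang:<10} {ratio:8.1f} x Hindi")
```

The same function could be pointed at next year's figures to see which languages have grown relative to the others, keeping in mind the packaging change mentioned earlier.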

Hopefully, if I repeat this exercise in a year, I will be able to see a lot of progress… I have plans to promote editing Wikipedia in your own language among students at the University of Toronto, who are from all over the world and have access to very good infrastructure, and to promote the idea of professors including Wikipedia in their assignments. The Spanish literature project is an example of articles one can only (so far) dream of seeing in Hindi and other Indic languages.

Stian


Is your course schedule full yet? Some great offerings (Wiley wikis)

July 20th, 2008

When I took Wiley’s course Intro to Open Education, it was something quite new for me, and I learnt as much from the design (both the things done well and the things that could be improved) as I learnt from the contents. Turns out his course has inspired a following, who have kept learning both from his example, and from each other, experimenting with different ways of conducting classes online, often blending it with for-fee credit students.

I first came across Teemu’s course on Composing free and open educational resources. This was conducted on Wikiversity, which was also interesting to me, because I had been considering Wikiversity a repository of course materials, rather than a site to conduct teaching and learning (which I would think would require different functionality, etc). I was excited to see this, and even wrote to Teemu asking that the lessons from conducting such a course be written up.

Otago Polytechnic has conducted the two courses Designing for flexible learning practice and Evaluation of e-learning for best practices on WikiEducator, and in about a week the new course Facilitating online communities run by Leigh Blackall will begin. This looks very interesting, and I am considering joining, even though it would be very difficult since I am spending the next six weeks in India and China. I might follow the course and the discussions, without being an active participant.

Finally, Stephen Downes and George Siemens are offering what is going to be a “megacourse” on Connectivism and Connective Knowledge. Although there is no detailed course plan available, they have attracted a lot of attention, and over 1000 people have already signed up to be informed when the course starts (of course, not all of those will participate actively). This means that the course will function quite differently from a smaller course with 30-50 participants where everyone - even though it takes a lot of time - can still read everyone else’s blog posts.

What is exciting is that a whole ecosystem of services and projects seems to be growing up around this course. It is being translated into Chinese, Spanish and Portuguese (we’ll have to see how this will happen once the course actually starts, but it’s tremendously interesting to see someone even trying to facilitate such a course for multilingual audiences). They also have a Google group that is active with discussions, and there are proposals for local learning communities, interactive chats, etc. I will be following this course very closely once it starts, perhaps participating actively, to see what new ways of interacting develop.

In addition to the sheer number of new courses that are explicitly orienting themselves in a “Wiley” tradition, and learning from each other, to the extent that Leigh Blackall coined the term “Wiley wikis”, there is also an emerging conversation taking place about what we can learn from these different attempts, how we should structure future courses - in terms of instructor load, participation of students, managing paying and non-paying students, trying to provide credit to external students through different institutional arrangements, etc.

Leigh Blackall, Teemu Leinonen and Bronwyn Hegarty conducted a conversation about this topic, which is available for download. Leigh Blackall and Sarah Stewart also discussed preparing a paper for a conference on their experience, and I love how they have documented all their discussions online - including three blog entries (1, 2 and 3), an audio recording of a meeting, and the script of the presentation.

I am very encouraged by this willingness, not only to experiment, but also to reflect, connect with others, discuss, and share the discussion and open it to everyone. It’s a discussion I am certainly hoping to participate in, as a student, teacher, researcher and advocate.

Stian
