Language representation among DOAJ Open Access journals

I am writing a paper about encouraging undergraduate students to conduct research in their mother tongues or other non-English languages that they know. One key element is, of course, availability. University of Toronto students are lucky to have access to a large repository of foreign online journals, but this might not be the case everywhere, so Open Access can play an important role. Curious about this, and frustrated by how difficult it is to find good numbers on the size of the “academosphere” in different languages (how many journals are published in Spanish? In Farsi?), I decided to have a look at the Directory of Open Access Journals (DOAJ) data.

Unfortunately, the DOAJ doesn’t provide an option to search by language, but luckily it lets you download its entire database of journal metadata as a comma-separated file. There is one field for journal language, but it often lists several languages, so simply sorting and counting the rows in OpenOffice.org is not good enough. I whipped up a quick Ruby script, reusing a few lines from my previous script that counts the most frequent search words used with my online Chinese-English dictionary, and got the following list:

English 3309
Spanish 871
Portuguese 472
French 338
German 202
Italian 114
Turkish 60
Croatian 46
Russian 45
Catalan 45
Japanese 26
Polish 24
Chinese 19
Romanian 14
Norwegian 13
Swedish 13
Czech 13
Serbian 11
Persian 10

In total there were 4010 journals listed. Note that journals with articles in several languages are counted once per language, so a journal with articles in French and English is counted as one journal for French and one for English, which is why the figures above add up to more than 4010. (A total of 74 languages are represented; the full list is here.)
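
For anyone who wants to repeat the count, here is a minimal sketch of the approach described above. It is not the original script. It assumes the language column is called "Language" and that multiple languages are separated by commas inside one quoted field; both are guesses that should be checked against the header of the actual DOAJ download. The sample rows are made up.

```ruby
require "csv"

# Tally how many journals list each language. A journal that lists
# several languages is counted once for each of them.
# ASSUMPTION: the column is named "Language" and separates multiple
# languages with commas; adjust both to match the real file.
def count_languages(csv_text, column: "Language")
  counts = Hash.new(0)
  CSV.parse(csv_text, headers: true).each do |row|
    field = row[column]
    next if field.nil?
    # Split the field, trim whitespace, drop empties and duplicates.
    languages = field.split(",").map(&:strip).reject(&:empty?).uniq
    languages.each { |language| counts[language] += 1 }
  end
  # Most frequent first; ties are broken alphabetically.
  counts.sort_by { |language, n| [-n, language] }
end

# Made-up sample rows, not real DOAJ data.
sample = <<~CSV
  Title,Language
  Journal A,English
  Journal B,"French, English"
  Journal C,Spanish
  Journal D,"Spanish, Portuguese, English"
CSV

count_languages(sample).each { |language, n| puts "#{language} #{n}" }
```

On the sample this prints English 3, Spanish 2, French 1 and Portuguese 1. To run it on a downloaded file, pass the file contents instead, for example `count_languages(File.read("doaj.csv"))`, where `doaj.csv` is a placeholder name.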

I am assuming (hoping) that there are more than 19 open access journals in Chinese, for example. On the other hand, there might not be a strong incentive to be listed on an English-language-only website that does not even allow searching or sorting by language. The data would have been better if we could have looked at the distribution at the article level, because some of the journals that list several languages publish overwhelmingly in only one of them. However, only a portion of the journals have article data, and the article metadata, unlike the journal metadata, does not contain a field for language. I wonder if it would be feasible to run the titles through a language recognition library, but that has to wait for another rainy day.
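
To give a rough idea of what guessing the language of a title could look like, here is a toy sketch that counts hits against small stopword lists. It is a stand-in for a real language recognition library, not a replacement for one. The word lists are hand-picked for illustration, and the method only makes sense for languages written with spaces between words, so it would not help with Chinese or Japanese titles.

```ruby
# Tiny, hand-picked stopword lists (illustrative ASSUMPTION only; a
# real detector would use far more data than this).
def stopword_lists
  {
    "English" => %w[the of and in for with],
    "Spanish" => %w[el la los las de en y para],
    "French"  => %w[le la les des du de et pour dans]
  }
end

# Guess the language of a title by counting stopword hits.
# Returns nil when nothing matches or the best score is tied.
def guess_language(title)
  words = title.downcase.scan(/\p{L}+/)
  scores = stopword_lists.transform_values do |list|
    words.count { |word| list.include?(word) }
  end
  best_language, best_score = scores.max_by { |_, score| score }
  return nil if best_score.zero?
  return nil if scores.values.count(best_score) > 1
  best_language
end

# Made-up example titles.
puts guess_language("The history of medicine in the Andes")
puts guess_language("Revista de estudios sobre la salud en los Andes")
puts guess_language("Revue des sciences et de la santé")
```

The three made-up titles come out as English, Spanish and French. Short titles with few function words will often return nil, which is one reason a proper library would be the better tool for real data.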

Stian

3 Responses to “Language representation among DOAJ Open Access journals”

  1. Ask PLoS Medicine: What can medical journals do to address language barriers? « Speaking of Medicine
    July 31st, 2009 @ 4:19 am

    [...] by the range of languages listed in the Directory of Open Access Journals – according to a blog by Stian, on April 5th 2009 there were 871 journals in Spanish, 472 in Portuguese, 388 in French, 202 in [...]

