Fun with terminal and Vim macros

April 20, 2015, [MD]

I always enjoy finding new ways of automating boring tasks, and today I had a few different small tasks that I was able to solve using some terminal-fu, and Vim macros. I really enjoyed how well it came together, so I thought I'd put together a short screencast to demonstrate.

I had generated a number of discussion forum graphs for different courses. The processed data is in a json file in a directory named after the course. My task was to put an identical index.html file into each directory, create an index of the courses, with links, and modify all the landing pages to reflect the names of the courses. Not a hugely complicated task, but something you could run into day to day, and which would be quite cumbersome to do manually.

The video should be quite self-explanatory, but I've included some "show-notes" below.

As far as I know, ls doesn't have a built-in way to list only directories, so I found this snippet somewhere and added it to my .zshrc:

alias ld="ls -lht | grep '^d'"

awk is a very powerful string manipulation tool, but in this case we're just using it to get the 9th column of each line (awk '{ print $9 }')

OpenEdX and LTI: Pedagogical scripts and SSO

February 21, 2015, [MD]

I came up with a neat way to embed external web tools into EdX courses using an LTI interstitial, which also allows for rich pedagogical scripting and group formation. Illustrated with a brief screencast.

Last year, I experimented with implementing pedagogical scripts spanning different collaborative Web 2.0 tools, like Etherpad and Confluence wiki, in two hybrid university courses.

Coordinating collaboration and group formation in a class of 85 students was already challenging, but currently we are planning a MOOC built on the same principles. Before we begin to design the pedagogical scripts and the flow of the course, a key question was how much we could possible implement technically given the MOOC platform. Happily, it turns out that it will be easier and more seamless than I had feared.

We chose to go with EdX (University of Toronto has agreements with both EdX and Coursera), which is based on an open-source platform. Open source can mean many different things, there are platforms that are mainly developed privately in a company, and released to the public as periodic "code dumps", and where installation is very complex, and the product is unlikely to be used by anyone else. I didn't know much about the EdX code, but the fact that the platform is already used by a very impressive number of other installations, such as China's XuetangX and France's Université Numérique was already a very positive sign.

Random German Stuff that Matters - great content auf Deutsch

January 24, 2015, [MD]

Why introduce German content in English?

One issue that I have not heard much discussion about, is the challenge of finding interesting material in foreign languages. In your own culture and language, you have gotten to know authors through school, through popular media, recommendations from friends, etc. But when you begin exploring a foreign language, it's often like starting from scratch again. I remember walking around in the libraries in Italy, having no idea where to start, or what kinds of authors I might enjoy.

This is made worse by the fact that I am typically much slower at reading in a foreign language, and particularly in "skimming". This means that even though I can really enjoy sitting down with a nice novel, and slowly working through it, actually keeping up with a bunch of blogs, skimming the newspaper every day, or even doing a Google-search and flitting from one page to the next, becomes much more difficult.

Some things are easier these days - it's typically easier to get access to media over the Internet, whether it be movies, podcasts, or ebooks (although this differs a lot between different languages, it is still difficult finding Indonesian movies or TV-shows online, and even Norwegian e-books are hard to come by online), but navigating is still difficult. When I began learning Russian again, I was amazed at the number of very high-quality English blogs that talk about Russian modern literature (places like Lizok's bookshelf, XIX век and Languagehat), which really inspired me to continue working on my Russian, to be able to read the books they so enthusiastically discussed.

Starting data analysis/wrangling with R: Things I wish I'd been told

October 15, 2014, [MD]

R is a very powerful open source environment for data analysis, statistics and graphing, with thousands of packages available. After my previous blog post about likert-scales and metadata in R, a few of my colleagues mentioned that they were learning R through a Coursera course on data analysis. I have been working quite intensively with R for the last half year, and thought I'd try to document and share a few tricks, and things I wish I'd have known when I started out.

I don't pretend to be a statistics whiz – I don't have a strong background in math, and much of my training in statistics was of the social science "click here, then here in SPSS" kind, using flowcharts to divine which tests to run, given the kinds of variables you wanted to compare. I'm eager to learn more, but the fact is that running complex statistical functions in R is typically quite easy. The difficult part is acquiring data, cleaning it up, combining different data sources, and preparing it for analysis (they say 90% of a data scientist's job is data wrangling). Of course, knowing which tests to run, and how to analyze the results is also a challenge, but that is general statistical knowledge that applies to all statistics packages.

A pedagogical script for idea convergence through tagging Etherpad content

October 4, 2014, [MD]

I describe a script aimed at supporting student idea convergence through tagging Etherpad content, and discuss how it went when I implemented it in a class


In an earlier blog post I introduced the idea of pedagogical scripting, as well as implementing scripts in computer code. I discussed my desire to make ideas more "moveable", and support deeper work on ideas, and talked about the idea of using tags to support this. Finally, I introduced the tool tag-extract, which I developed to work on a literature review.


I am currently teaching a course on Knowledge and Communication for Development (earlier open source syllabi) at the University of Toronto at Scarborough. The course applies theoretical constructs from development studies to understanding the role of technology, the internet, and knowledge in international development processes, and as implemented in specific development projects.

I began the course with two fairly tech-centric classes, because I believe that having an intuition about how the Internet works, is important for subsequent discussions. I've also realized in previous years that even the "digital generation" often has very little understanding of what happens when you for example send a Facebook message from one computer to another.

Supporting idea convergence through pedagogical scripts and Etherpad APIs, an introduction

October 4, 2014, [MD]

We can script Etherpad to push discussion prompts out to many small groups, and then pull the information back. Using tags, we can extract information, and as a community organize the emerging folksonomy

This blog post brings together two long-standing interests of mine. The first is how to script web tools to support small-group collaborative learning, and the other is how to support reorganization of ideas across groups.

Two meanings of the word "scripting"

There is an interesting intersection between two quite different meanings of the word scripting in the context of my work. In the CSCL literature, scripts refer to sequences of activities that support groups of students in carrying out a collaborative learning activity. They can be very simple and generic, like the well-known jigsaw script, or very content-specific. There is active on-going research on scripting, including how external scripts interfer with internal scripts, and the dangers of over-scripting.

From the technology side, we talk about writing scripts as a light-weight form of programming, to ask the computer to do something for us. Scripting a program, or a web site, means that we can interact with it automatically, for example asking Google Docs to set up 20 documents for us. There is obviously a connection between developing educational scripts and actually "implementing" them in technology. Often, specialized software has been written to enable specific scripts, but there has been some attempts at enabling more generic implementations of learning designs into software.

Easy interoperability between Ruby and Python scripts with JSON

September 23, 2014, [MD]

I recently needed to call a Ruby script from Python to do some data processing. I was generating some Etherpad-scripts in Python, and needed to restructure tags (using tag-extract) with a Ruby script. The complication was that this script does not just return a simple string or number, but a somewhat complex data structure, that I needed to process further in the Python script.

Luckily, searching online I came across the idea to use JSON as the interchange format, which worked swimmingly. Given that all the information was in text format, and the data structure was not that complex (just some lists and dictionaries), JSON could cope well with the complexity, and was easier to debug, since it's a text format. If I had had other requirements, like binary data, I would have had to investigate other data formats.

The Ruby script already had several different output options, controlled through command-line switches. I added the option to output through JSON, and the code required to do the actual output was simply:

puts JSON.generate(tags)

I also reconfigured the script to accept input from standard-in:

if ARGV.size == 0
  a =
  a = try {[1]) }
  unless a
    puts "Could not read input file"

This meant that I could simply pipe the information from the Python script through the Ruby script, writing to standard-out and reading from standard-in, not having to create any temporary files at all.

Creating Anki cards for Russian Coursera MOOC with stemming and frequency lists

April 17, 2014, [MD]

In which I automatically generate Anki review cards for vocabulary based on subtitles from a Russian Coursera MOOC

Learning Russian, a 10-year project

I have been working on my Russian on and off for many years, I'm at the level where I don't feel the need for textbooks, but my understanding is not quite good enough for authentic media yet. I've experimented with readings novels in parallel (English and Russian side by side, or Swedish and Russian, like on the picture), and I've been listening to a great podcast for Russian learners.

Language learning with MOOCs

MOOCs can be a great resource for language learning, whether intentionally or not. At the University of Toronto, we found that more than 60% of learners across all MOOCs spoke English as a second language, no doubt some of them are not just viewing the foreign language of the MOOC as a barrier, but are also hoping to improve their English. To my delight, the availability of non-English language MOOCs has been growing steadily. For example, Coursera has courses in Chinese, French, Spanish, Russian, Turkish, German, Hebrew and Arabic. There are also MOOC providers focused on specific linguistic areas, for example China's XueTangX, France's Université Numérique, and Germany's iversity.

Learning idiomatic Haskell with

March 11, 2014, [MD]

I got my introduction to functional programming through Clojure, but lately I've been really fascinated by Haskell. I gave up on it a few times, because it can seem quite impenetrable, and very different from anything that I'm used to. But there is something about the elegance, and powerful ideas that keeps me coming back. I've read a bunch of tutorials and papers, followed blog posts, and experimented a bit with ghci and IHaskell, but I never really got off the starting block with writing my own Haskell.

How to get practice?

The projects I'm currently working on in Python are too complex and urgent for me to try to implement them in Haskell at this point, and because Haskell is so different, I found myself stymied doing even simple things, even though I'd just finished reading a complex CS paper about monads. I needed some structured tasks that were not too hard, and that gave me good feedback on how to improve.

Listening to the Functional Geekery podcast, I heard about, which provides exercises for many popular languages. There are several similar websites in existence, perhaps the most well known is Project Euler, however that focuses too much on CS/math-type problems, and is perhaps better for learning algorithms than a particular language. The way works, is that you install it as a command line application. The first time you run it, it downloads the first exercise for a number of languages.

Parsing massive clicklogs, an approach to parallel Python

March 10, 2014, [MD]

I am currently working on analyzing MOOC clicklog data for a research project. The clicklogs themselves are huge text files (from 3GB to 20GB in size), where each line represents one "event" (a mouse click, pausing a video, submitting a quiz, etc). These events are represented as a (sometimes nested) JSON structure, which doesn't really have a clear schema.


Our goal was to run frequent-sequence analyses over these clicklogs, but to do that, we needed to process them in two rounds. Initially, we walk throug the log-file, and convert each line (JSON blob) into a row in a Pandas table, which we store in a HDF5 file using pytables (see also). We convert the JSON key-values to columns, extract information from the URL (for example /view/quiz?quiz_id=3 results in the column action receiving the value /view/quiz, and the column quiz_id, the value 3. We also do a bit of cleaning up of values, throw out some of the columns that are not useful, etc.

Speeding it up

We use Python for the processing, and even with a JSON parser written in C, this process is really slow. An obvious way of speeding it up would be to parallelize the code, taking advantage of the eight-cores on the server, rather than only maxing out a single core. I did not have much experience with parallel programming in general, or in Python, so this was a learning experience for me. I by no means consider myself an expert, and I might have missed some obvious things, but I still thought what we came up with might be useful to others.