Tweets from the Coursera Partners' Conference

April 6, 2013, [MD]

As I've done many times before (most recently at Beyond the PDF 2 in Amsterdam), I've archived my tweets from the recent Coursera conference, cleaning them up just a little (I took out most retweets, but kept a few). See also my impressions from the conference.

Before the conference

  • Will be in Philadelphia for #CourseraConfAtPenn Fri+Sat. Anybody want to meet to talk about #oa, #btpdf2, learning, #OER?
  • The Coursera Partners' Conference gets underway tomorrow, April 5th, 2013. We're stoked! #CourseraConfAtPenn
  • If your hashtag is 18 characters, you’re doing it wrong. Looking at you, #CourseraConfAtPenn.
  • @Akibaedx Great to see that edX and Coursera are hanging out :) Very interested in research on flipped (PhD stud from UofT)

First day

  • Great name tags, able to see name and affiliation without leaning in. Makes connecting online names to faces much easier
  • @derekbruff @rschwartz418 It does, but it's the easiest thing to work with. And students can mix and match.

Silfen forum

  • going to overflow room to see panel on MOOCs. Very meta. Will there be computer grading of questions?
  • Interested in talking to anyone researching MOOCs/flipped classrooms from persp. of education/learning sciences/CSCL
  • @icpetrie MOOCS are a) revolutionising education, b) more of the old, c) just the top of the iceberg, d) hype :)
  • Can MOOCs support non-high-performing learners? Old q; OCWs, Hewlett Foundn, P2PU etc have struggled w/ it for a long time
  • MOOCs are at the AltaVista stage, Google hasn't even come out. Friedman is good at catchy phrases
  • Friedman: MOOCs instead of F16s in Egypt
  • @dankonecky I would love to see MOOCs/OER from institutions in Egypt. Course on history of Arab region, by a local uni
  • @hong_chau Make it happen, let's adapt some self-organisation from the original cMOOCs! :)
  • Given that we are so eager to reach low-performing students, will Coursera partner with any community colleges?
  • #CourseraConfAtPenn hash tag length is killing me. Er, I mean challenging me to be concise.
  • Watching Coursera lectures at 2x speed has spoilt me. Not about concentration length, but cognitive bandwidth #tooslow
  • Very US-centric MOOC discussion, weird given the very international audience at #courseraconfatpenn
  • @veletsianos UofT has partnered with both (but no EdX courses live yet). My own research more on flipped. But also doing institutional rsrch
  • All the people on the panel talking about the wonder of four-year degrees need to read this
  • Nostalgia for brick-and-mortar: Beyond row three, it's all distance education anyway
  • Would love to hear EU perspective on MOOCs - how does it look different in the European context? /c @gideonshimshon
  • @gideonshimshon Nice, wish they had released open data sets, not just PDFs :)
  • Discussion abt rest of world similar to OCW-be generous and share. But awesome stuff happening in the rest of the world
  • @cvhorii Global perspective a 1-way cultural flow? Can global courses help our international students?
  • @icpetrie When will Harvard step up and translate a Chinese course into English? Why only one-way?
  • @dankonecky Are there learning scientists working seriously on this, or is it all CS/ML people? #wanttobepartofit
  • +100 “@rschwartz418: Shouldn't we all have watched lectures, panels, & keynotes before #CourseraConfAtPenn and spent time interactively?”
  • @rschwartz418 We need flipped conferences, not just flipped classrooms!
  • @derekbruff @rschwartz418 Agree that platforms don't support. Talked abt this at #p2pu, supporting learning journeys
  • @D_Carchidi @cristofolo @bederson I think faculty roles will change significantly, for the better.
  • @amywoodgate @rschwartz418 Maybe next conf should be more like a cMOOC, less like an xMOOC? :)
  • @bederson Great, very useful link, thanks a lot. Backchannel more rewarding than panel right now
  • The big loser of the conference so far: AltaVista. #CourseraConfAtPenn
  • @taevans Yeah compare cMOOC/xMOOC, part of open internet or closed garden? Course ends vs community continues
  • @bederson My problem is that there is too little research really problematising the f2f teaching/learning in h.ed.
  • And thanks to all the Twitter-panelists as well! ;)

Keynotes

  • Getting ready for #Courseraconfatpenn keynote. Good wifi and power - let's see how much twitter activity
  • @derekbruff Interesting. Seems like all the focus (not only there) is on purely online though.
  • @derekbruff A lot of prescriptive stuff, I need theories to fuel my research.
  • Coursera, has it really only been one year? Amazing.
  • 3.1 mln students, 333 courses, 2,330+ instructors (Koller)
  • Koller proud: 30 of top 60 universities worldwide, #1/#2 in 14 countries
  • Largest course: Think again, how to reason and argue 180k.
  • Made $222k on signature track in 11 weeks
  • Money will be shared back w/unis, Koller with cheques. 1000 applications accepted for financial aid.
  • Students signing up for signature track to signal commitment to finishing course, higher completion rates
  • 40% of Coursera users in dev'l world (3% in Africa), Khan acad 80% in US (surprising)
  • Looking forward to the first Coursera course in Hindi, or Kiswahili!
  • @amywoodgate Sustainable if people pay I guess, and can easily be outsourced to low-cost countries.
  • Retention funnel: enrolled 31k-56k, submitted assignment 900-6.5k, earned SoA 1-3k. Similar numbers at UofT.
  • Interesting that Coursera is doing A/B testing - students want to see people's faces
  • @amywoodgate I don't think paying people in low-cost countries lower (but fair) salaries is exploitation, but agree with you on ideal
  • New tools for smaller groups coming
  • Mobile app coming over the next year
  • App platform sounds interesting, physics simulator, note taking tool, etc
  • @EvansCowley Maybe it should be toggleable :)
  • @EvansCowley You'd think but people tend to really enjoy lectures. I think short demos, 2x speed, pause able, with screen can work well
  • Wonder if there's a mismatch between attendees and keynotes… Keynotes talk about system issues, but attendees are mostly instructional designers/online learning people carrying out mandates from provosts centrally?
  • Looking forward to first panel on participation in MOOCs, getting concrete
  • xMOOCs? Never lonely in a cMOOC, just lost. RT @hong_chau: Werbach: #MOOCs r lonely. Students desperately want community. #CourseraConfAtPenn
  • @hong_chau One course I took, people posted hundreds of intros first week. Afterwards, almost no activity...
  • Discussion about "personality cult" around lecturers, and how attitudes of first-gen "pioneers" transferred to the 2nd gen
  • /We/ may want community but Edinburgh data suggested only 10% actually wanted to be part of a community #CourseraConfAtPenn
  • +100 “@hong_chau: Advice for @coursera and the next #CourseraConfAtPenn... Snacks during break?”

Afternoon panels

  • Excited abt next panel on flipped teaching. Doing rsrch on that myself.
  • @taevans Would also be useful with wiki to collect notes/artefacts/papers etc. MOOC-style! :)
  • Great panel on flipped classrooms, all notes (incl questions) … Let's continue the conversation!
  • @preset Cool, awesome notes. Added here
  • Created open Google Doc to track notes and other artefacts from #courseraconfatpenn please add!
  • P2PU's "Mechanical MOOC" automatically assigns students to groups of 10, and creates group mailing lists
  • Blog of Pamela Fox who is talking right now

Day 2

  • Good morning #courseraconfatpenn, great breakfast but a bit claustrophobic poster session :)
  • Fostering inter school collaboration - retitled: "News from the front lines"
  • Dutch speakers dominant in the room. Goedemorgen! Ik spreek een beetje Nederlands (Good morning! I speak a little Dutch) :)
  • Dutch researchers are #2 in the world in individual productivity, after the Swiss.
  • Interesting idea from Leiden: Use MOOCs for student mobility (important in European context)
  • Call for empirical collaborative research on MOOCs, great - hope this happens
  • RT Next time, need focus groups for topic areas, for instructors to meet and share ideas.
  • @coxvaxie Wonder how many people here are faculty, as opposed to instructional designers etc. How to invite more?
  • MIT faculty was told by administration: Not allowed to give guest-lecture in Coursera, we're on #EdX only.
  • Distinguishing between cooperation and collaboration - collab implies interdependence
  • @Afilreis "The quizzes are silly" @DaphneKoller shoots back "They are not" everyone laughs
  • Shoutout to #lak13 learning analytics conference starting in 2 days in Leuven
  • Dillenbourg suggests universities organising exams for each other's MOOCs, undercutting Pearson
  • Duke has one FT staff member assigned to a course while it's running... 40 hrs/wk dedicated to that course.
  • Dillenbourg: plagiarism and cheating, we still need proctored exams to give credit
  • Panel on learning from data looks popular, room filling up fast. Looking forward to Dillenbourg and others
  • Dillenbourg: We're academics, we love data, put data in our coffee, brush our teeth with data
  • China not well represented. Language issues, or internet access? Subtitled OCW courses massively popular there.
  • Dillenbourg is doing amazing research on MOOCs, two open postdocs and 2 PhDs.
  • Notes from session on data analytics and MOOCs #lak13
  • How I got all the photos integrated with my wiki notes from sessions
  • Would love to see foreign-language MOOCs used in language classes in N-Am #ich-will-lernen #apprendiamo
  • Thank Daphne&Andrew for confusing us so much over past yr - in the most positive way. Yes, making our lives interesting

See also my notes on flipped classrooms, and learning analytics.


Impressions from the first Coursera conference

April 6, 2013, [MD]

This weekend, I attended the inaugural Coursera Partners' Conference at UPenn in Philadelphia. I attended both as a PhD student with an interest in MOOCs, open learning, and flipped classrooms, and as an institutional researcher for Open.UToronto, supporting internal evaluation of the UofT Coursera MOOCs.

Overlap with existing groups

I was delighted to see a bit of overlap with two other groups that I am part of. Gary Matkin and Larry Cooperman from UCI, José Escamilla from Tecnológico de Monterrey and Sukon Kanchanaraksa from Johns Hopkins School of Public Health are all old friends from the OpenCourseWare Consortium, and it was great to catch up with them. Pierre Dillenbourg from EPFL is a well-known researcher in computer-supported collaborative learning, and I am really happy to see others from my academic field getting involved with research on MOOCs.

Otherwise, institutional support personnel (from provosts and deans to instructional designers and offices of teaching and learning) seemed to be heavily represented, with only a few of the professors who have actually taught Coursera courses present. There were also a few institutions represented that are still considering joining Coursera.

Sessions

The sessions ranged from the initial sold-out Silfen forum with people like Thomas Friedman and Martha Kanter, to much more specific panels on flipped classrooms, learning analytics, inter-school collaboration, etc. (Full program)

I took extensive notes from two sessions, one on flipped classrooms and one on learning from data. Both were really interesting, and I will probably be in touch with some of the people on both panels to follow up. (It was also a great opportunity to use my iPhone/Researchr integration.)

It was also fun to meet the very young, smart and energetic Coursera team. It's truly impressive what they have managed to create in a year (launched April 2012), and I'm very excited to see what they, and others (edX, Udacity, etc.), will come up with. Hopefully in the future we will have a conference where people can meet across "divisions", from all these different platforms. There are initiatives at educational research meetings, such as MOOCshop, but regular faculty, instructional designers and others are not likely to attend those.


Switching from WordPress to nanoc, jumping in feet first

April 2, 2013, [MD]

I've thought about switching to a static site generator (SSG) for a while. I had spent some time playing with Jekyll, the poster child of SSGs, and learning about the different options, but I always put it on the back burner.

There are many reasons for switching, and for doing it now. My current blog layout has been the same since 2005, and although I've enjoyed it, it's starting to feel old. I've also spent a lot of time writing on my wiki, which runs on localhost and is synced online, with a number of neat tools to help me author more effectively, and these days I find it so much quicker to write something on my wiki, than on my blog.

I don't know if I've really been writing less on my blog lately, or if it just feels like it - it's always been up and down. In the graph below, each point represents a blog post, the y-axis shows length (log-scale), and the x-axis time. You can see some pretty big gaps where I didn't write anything (like in 2007). I also seem to write fewer very short blog posts, which is probably because I began using Twitter and later Google Plus for quick links, and used the blog more for longer texts.

(The graph above was generated with R and knitr, which is another reason I'd like to switch to an SSG, although I haven't quite integrated knitr into my workflow yet.)

Recently, I found my PhD wiki overrun with spam accounts. Because the canonical version is stored offline, I was able to simply do another sync, and the wiki looked just as nice. I've also had WordPress hacked two or three times, and recovering was not so easy: once I lost all my posts, and had to recover them using Google Cache. Knowing that I have my entire post history on my hard drive, in "future-proof" Markdown, is a great feeling.

Nanoc is written in Ruby, and makes it very easy to configure the site, design compilation rules and write filters (similar to plugins in WordPress), etc. Out of the box, it requires a fair amount of setup, but I've used my previous WordPress setup for almost 9 years, so spending a day or two to get it right will hopefully be worth it. I had a look at a nanoc blog skeleton, but not only did it need tweaking to work properly with the latest version of nanoc, there was also too much magic going on. In the end, Dave Clark's guide to building a blog with nanoc got me started, and by making all the changes myself, I understood much better how the system works.

The transition is by no means "finished": there are many older blog posts that need cleaning up, and the layout (based on Mark Reid's site) is nice but needs polishing. But I decided to eat my own dogfood and put it up. Most posts should already work, at the same URLs, and I'll slowly fix the rest.

PS: I used linkify extensively for this blog post, and I love it


Time tracker one week on, new features

March 29, 2013, [MD]

Last week I wrote about how I began resurrecting a three-year-old time tracker project with R graphs. I then added a timeline, and began thinking about other kinds of data that I could track.

It's now been almost two weeks since I began playing with this script, and the first thing I can say is, I'm using it. Both because the graphs generated are compelling, and because I've been able to add some useful functionality, it's become a part of my workflow, and I've been quite rigorous about logging my time, without ever feeling that it was intrusive or annoying.

Automatically log on computer sleep or wakeup

One key feature I added was the ability to detect when my computer goes to sleep (when I close the lid) and when it wakes up, using Sleepwatcher. The tracker ends the current activity on sleep, and automatically starts "surfing" when the computer wakes up. I have to tell it if I am doing something more productive, but the little popup saying "surfing" also reminds me to do so. And being able to quickly close the lid when my wife says dinner is ready, without worrying about the time tracker running and some category getting a bunch of extra hours added to it, is also a relief. I also changed the format of the data files to use readable timestamps, like this:

2013-03-29 09:05:35 -0400,PhD offline
2013-03-29 09:42:35 -0400,surfing
2013-03-29 09:46:14 -0400,PhD offline

This makes it easier to manually edit (something I've almost never needed).
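To make the format concrete, here is a minimal Ruby sketch of writing and reading such lines. It is not the tracker's actual code: the names `log_line`, `durations` and `TIME_FORMAT` are hypothetical, and it assumes each activity lasts until the next entry begins.

```ruby
require "time"

# Hypothetical helpers (not the tracker's actual code) for the log format
# above: one "YYYY-MM-DD HH:MM:SS +ZZZZ,activity" entry per line, where
# each activity lasts until the next entry begins.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S %z"

# Build one log line, e.g. when a wake-up hook logs "surfing".
def log_line(time, activity)
  "#{time.strftime(TIME_FORMAT)},#{activity}"
end

# Sum the seconds spent on each activity. The last entry is still
# running, so it has no end time and is not counted.
def durations(lines)
  rows = lines.reject { |line| line.strip.empty? }
  entries = rows.map do |line|
    stamp, activity = line.chomp.split(",", 2)
    [Time.strptime(stamp, TIME_FORMAT), activity]
  end
  totals = Hash.new(0)
  entries.each_cons(2) do |(start, activity), (finish, _next_activity)|
    totals[activity] += (finish - start).to_i
  end
  totals
end
```

Fed the three example lines above, `durations` attributes 2220 seconds (37 minutes) to "PhD offline" and 219 seconds to "surfing"; the final "PhD offline" entry is still open and so not counted.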

Automatically turn off the Internet, based on activity

To help me focus, I've also added a feature which turns off the Internet when an activity containing the word offline is entered (for example PhD offline). I just use ipfw for that:

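# Toggle all network traffic with the ipfw firewall (must run as root).
def internet(status) # true = on, false = off
  if status # enable
    # Flush all rules (including any other custom ones), leaving only
    # ipfw's default rule, which allows traffic on OS X
    `ipfw -q flush`
  else
    # Add a rule that drops every packet
    `ipfw -q add deny all from any to any`
  end
end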

Of course, it would be easy for me to override this manually, but unlike apps like Freedom, it doesn't aim to block me from the Internet for a certain period of time, but only for however long I'm using the offline activity. I can switch back to surfing at any time, but I have to do so consciously, rather than mindlessly Alt+Tabbing to a browser and pulling up Reddit whenever my brain encounters something difficult.
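The trigger itself is a one-line decision. In this hypothetical sketch, matching the whole word "offline" (case-sensitive) is my assumption, and `toggle` stands in for the `internet(status)` function above so the logic can be tried without root privileges.

```ruby
# Hypothetical sketch of the "offline" trigger: decide whether the
# Internet should be on from the name of the activity just entered.
# Matching the whole word "offline" (case-sensitive) is an assumption.
def internet_allowed?(activity)
  !activity.split.include?("offline")
end

# Called on every activity switch. `toggle` stands in for the
# internet(status) function above, so the logic can be tried without
# touching the firewall.
def switch_activity(activity, toggle)
  allowed = internet_allowed?(activity)
  toggle.call(allowed)
  allowed
end
```

In the real script, passing `method(:internet)` as `toggle` would block traffic on "PhD offline" and restore it as soon as "surfing" is entered.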

Auto-completing activity chooser

I found that I often had to peek at the list of keyboard shortcuts before switching to an activity, and sometimes needed activities that were not on the list, so I added a command on Ctrl+Alt+Cmd+Enter which pulls up a text-entry box that autocompletes on all previously entered activities. I was curious whether I would end up preferring this or the direct keyboard shortcuts. After a few days, I'm using both. I remember a few keyboard shortcuts very well, like 9 for surfing, 6 for hacking, 7 for tasks, and 0 for rest (although I usually just close the laptop), whereas for specific projects it's quicker to pull up the window and type the first few letters: I type *la*, *laurie* pops up, and I hit Enter to select it.
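The completion logic behind such a text box can be very small. This is a hypothetical sketch, not the actual dialog code: it offers the distinct past activities that start with the typed prefix, and ranking them by how often they were used is my assumption.

```ruby
# Hypothetical sketch of prefix autocompletion over previously entered
# activities: distinct past activities starting with the typed prefix
# (case-insensitive), most frequently used first, ties alphabetical.
def complete(prefix, history)
  counts = Hash.new(0)
  history.each { |activity| counts[activity] += 1 }
  matches = counts.keys.select { |a| a.downcase.start_with?(prefix.downcase) }
  matches.sort_by { |a| [-counts[a], a] }
end
```

With a history containing "laurie" twice and "latex" once, typing "la" would offer "laurie" first.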

Adding week view and traffic light

I've added a cumulative view over the last 7 days, and also a rudimentary "traffic light" (thanks to StackOverflow): green if I spend more than 4 hours per day on my PhD, yellow if more than 2 hours, and red if less than 2 hours. Looking at the graph below, you can see that I have not been spending enough time on my PhD lately, and hopefully this "nudge" will help me improve on that!
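The traffic-light rule can be sketched as below. Two details are assumptions, since the text only says "more than": the thresholds are applied to the average hours per day over the window, and an average of exactly 4 or exactly 2 hours falls into the lower colour.

```ruby
# Hypothetical sketch of the traffic-light rule, applied to the average
# PhD hours per day over the window (7 days by default). Treating exactly
# 4 or exactly 2 hours as the lower colour is an assumption.
def traffic_light(total_phd_hours, days = 7)
  per_day = total_phd_hours.to_f / days
  if per_day > 4
    :green
  elsif per_day > 2
    :yellow
  else
    :red
  end
end
```

For example, 21 PhD hours in the last week averages 3 hours a day, which gives `:yellow`.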

Future

The popup view is already getting very overcrowded, so if I want to do any further analyses (which I surely do), I will have to start writing up a knitr report, to be viewed in a browser. (My friend Bodong suggested a Shiny app, which I might also look into.) I think I will need to begin storing the data in a SQLite database, instead of in flat text files, to enable easier integration with other data sources, both automatic ones (Chrome history, Fitocracy API, PDFs read) and self-logged variables. I also need to think about how much processing I want to keep in Ruby, and how much to do in R.

I've got two 10-hour bus trips coming up next week, on the way to and from the Coursera conference, so maybe I'll get some more work done on it then. In the meantime, I'm going to focus on my PhD, and try to turn those traffic lights green!


Link-helper for Markdown, using Google Chrome history and other sources

March 28, 2013, [MD]

I've been writing more blog posts than usual lately, because of the Beyond the PDF 2 conference as well as some hacks I've been working on, and I realized (again) how much of my time is taken up with finding and inserting links. I often have quite a lot of links in my blog posts: the graph shows a bunch of posts with only two or three links, but most have more than 10, and a few have 40, 60 or more (these two are the winners, with 78 links each).

Typically what I do is Cmd+T for a new tab, type in a Google query, select the new page, Cmd+L and Cmd+C to copy the URL, go back to the blog editor, insert the link, etc. It's quite quick, but for 10, 20 or 40 links, it takes a significant amount of time, and it also disrupts the writing.

When you use the UI to add a link in WordPress, it automatically suggests other blog posts, either by recency or by a search term. This is great, but it only looks at blog posts: what if I want to link to my YouTube video, or a wiki page? I'm also trying to (slowly) move away from WordPress onto a static site generator, probably nanoc, which relies on editing Markdown.

As I was looking into accessing the Google Chrome history for some quantified self experiments, I realized that almost all the pages I link to are either pages I've recently accessed in Google Chrome, or my own pages, either from my blog, my wiki, or my YouTube, Vimeo or Slideshare channels. What if I could quickly search those sources, and have the resulting link inserted in Markdown format? See the result in the short (2:30 min) screencast below:

The script is triggered by Keyboard Maestro, grabs the currently selected text, looks it up in a number of data sources (some, like the YouTube, Vimeo and Slideshare channels, are cached using the relevant APIs; others, like the Google Chrome history and my wiki pages, are queried live), and presents the choices using Pashua. If I make a choice, it then formats the link appropriately for the application I am using, and in Google Chrome, for the URL of the tab I am on (wiki markup for my wiki, Markdown on GitHub, etc.).
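The last step, picking a link syntax for the current context, might look something like this hypothetical sketch. The function names, the `:wiki`/`:markdown`/`:plain` targets, and the rule that a `http://localhost` tab means the wiki are my assumptions; the wiki branch assumes DokuWiki-style `[[url|title]]` links.

```ruby
# Hypothetical sketch: choose a link syntax from the frontmost
# application and, in Chrome, the URL of the current tab. The rules
# below are illustrative assumptions, not the script's actual logic.
def target_for(app, tab_url = nil)
  return :markdown unless app == "Google Chrome" # e.g. a Markdown editor
  url = tab_url.to_s
  if url.start_with?("http://localhost")
    :wiki      # assumed: the locally running wiki
  elsif url.include?("github.com")
    :markdown  # GitHub comments and READMEs
  else
    :plain
  end
end

# Render a chosen title/URL pair in the target syntax. The :wiki form
# assumes DokuWiki-style [[url|title]] links.
def format_link(title, url, target)
  case target
  when :markdown then "[#{title}](#{url})"
  when :wiki then "[[#{url}|#{title}]]"
  else url
  end
end
```

Keeping the choice of target separate from the rendering means a new destination only needs one more branch in each function.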

The source is on GitHub, and it should be fairly easy to get running, especially if you only want Google Chrome history. There might be some individual quirks in how I access my wiki pages for example, but feel free to contact me if you have questions.

Stian

PS: The other thing I spend a lot of time doing is selecting, resizing, uploading and inserting pictures, and that's another thing I hope to simplify when I move to writing my blog posts in Markdown.


Unique publication IDs in open scholar search

March 24, 2013, [MD]

At Beyond the PDF 2, I gave a Vision talk about "An open alternative to Google Scholar", and since then a group of us have begun discussing how we can make this happen. There isn't yet a fixed place for this discussion to take place, but we can use the broader hashtag #scholrev (see Peter Murray-Rust's blog post) to coordinate.

Researchr and "Scrobblr"

Many of my thoughts on this topic came out of my work with Researchr, and my wish to have a system with an open API, which would let me integrate search, metadata lookup, etc., with Researchr. I also wanted unique IDs for publications to be able to link my notes about an article with notes somebody else took about the same article. Together with Ryan Muller, we began work on "Scrobblr", a social hub for reading. The idea was that it could work like Scrobbler for music, where what you listen to is automatically submitted, and shared with your friends. In the same way, the papers you read are automatically submitted, and other people in your group can see what you are reading, and automatically import your citations (screencast demo).

Although we never got there, we thought a lot about how this could be expanded to a much larger social hub - sharing bibliography lists from different Researchr users, auto-suggesting "you've been reading many of these papers lately, you should get in touch with this other student, who is reading a lot of similar stuff", automatic PDF hash-based lookups (screencast demo), etc. I began writing up design ideas in a document that was never finished, but many of these are relevant to the current ideas about an Open Scholar Search, so I'll post some of them here.
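One way a hash-based PDF lookup could work is sketched below. This is an illustration under my own assumptions (SHA-1 of the raw file bytes as the key, a plain hash as the index), not what Scrobblr implemented.

```ruby
require "digest/sha1"

# Hypothetical sketch of a hash-based PDF lookup: a PDF is identified by
# the SHA-1 digest of its bytes, so byte-identical copies on different
# machines resolve to the same record. The index contents are made up.
def pdf_fingerprint(bytes)
  Digest::SHA1.hexdigest(bytes)
end

# Returns the citekey stored for this exact file, or nil if unknown.
def lookup_pdf(bytes, index)
  index[pdf_fingerprint(bytes)]
end
```

In practice the bytes would come from something like `File.binread(path)`. The weakness of this design is that any change to the file (annotations, a different download of the same paper) produces a different hash, so it could only complement metadata-based lookups.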

Unique IDs for publications

There are a lot of reasons why we'd want unique IDs for publications: making citation lists unambiguous and easier to parse, enabling rich citations in non-traditional media (wikis, blogs), etc. Right now CrossRef DOIs are the closest we come, and they already show how difficult it is to push for the use of such identifiers (ORCID will face a similar challenge). It would be great if it were possible to build on the work CrossRef has done, and I've recently become aware of how many interesting innovations the team is coming out with (slides, blog).

However, there are a few barriers. The first is economic: as far as I can see, it costs a minimum of $330 for a publisher to participate. This might not seem like much, but I know of very few independent OA journals that have DOIs. (There might also be large technical implementation costs; I don't know.) Worse, however, is that only the publisher can submit metadata. This means that we have to rely on publishers to submit correct metadata (which is often, but not always, the case). It also means that we will never get metadata or identifiers for publishers who don't participate or who no longer exist, or for scholarly material that wasn't published as journal articles (we might want to cite films, archive items, etc., and have unique identifiers for them as well).

Below I discuss how the identifier might be formatted (from my Scrobblr notes). This is also related to who can assign an identifier. In the case of CrossRef DOIs, identifiers are assigned by publishers, who get their own "name spaces" (similar to ISBNs, DNS names or IP addresses). In the case of ORCID, which has shared its deliberations about identifiers, numbers are assigned centrally, and are simply arbitrary numbers with a specific format. This will probably end up being the case for articles in an open scholar search as well, but below I play with the idea of using something more semantic. After all, it's a lot easier to give a hat tip to @houshuang than to http://orcid.org/0000-0002-2632-8448, even though both are equally unique. And it is a fascinating idea to be able to write [@scardamalia2006knowledge] in any blog or wiki, and have it work...

Unique IDs

(From "Ideas for Scrobblr":)

Each publication should be assigned a unique ID (UID). This is inspired by the integration of many different applications that is enabled by the concept of a citekey in BibTeX. APIs should enable users to submit UID and receive metadata for any publication (whether in JSON or BibTeX, whether strictly citation info or also social info about tags, other users, links etc). There should also be a number of ways to determine a publication’s UID through various lookups.
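As an illustration of "submit a UID, receive metadata in JSON or BibTeX", here is a hypothetical sketch of how a single stored record could be rendered in either format. The function name, the record layout and the example values are made up; a real API would also carry the social information mentioned above.

```ruby
require "json"

# Hypothetical sketch: render one stored metadata record for a UID as
# JSON or as a BibTeX entry keyed by the UID. The record layout (string
# keys, an optional "type" field) is an assumption for illustration.
def render_metadata(uid, record, format)
  case format
  when :json
    JSON.generate(record.merge("uid" => uid))
  when :bibtex
    type = record.fetch("type", "article")
    fields = record.reject { |key, _| key == "type" }
                   .map { |key, value| "  #{key} = {#{value}}" }
    "@#{type}{#{uid},\n#{fields.join(",\n")}\n}"
  else
    raise ArgumentError, "unknown format: #{format}"
  end
end
```

Because the UID doubles as the BibTeX citekey, the entry could be pasted straight into a LaTeX bibliography.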

Format

There are (roughly) two choices for the format of a UID. The first would be a randomly generated (or sequential) ID with no semantic meaning, whether with numbers or letters etc. The second would be the citekey format which Researchr currently uses. The advantage of this is that it is familiar to users (of LaTeX, Researchr, etc.), and immediately conveys some minimal information about a citation. Through use, certain frequent citations might even be recalled actively or passively. Certainly, it is much easier to reorder three publications cited in a blog post using citekeys ("I’ll put scardamalia2006knowledge first, and then mention johnson2000corruption") than using random IDs ("See for example 3093049304955 and 88585").

However, there are a few challenges with using the citekey format. The first is generation and the second is collisions. Although the general principle is well understood (last name of first author + year + first word of title) there are a number of permutations, for example

  • I prefer manually changing van2006knowledge to vanderwende2006knowledge
  • what to do with punctuation, is it peter2006knowledge or peter2006knowledge-integration
  • it often makes sense to include the first word with more than n (=3?) letters, etc.

As a result, citekeys generated by Researchr or other tools (such as Google Scholar) may differ from those generated by Scrobblr. Some of these rules we can just define arbitrarily, but we might want a decent algorithm to solve the first point above, perhaps joining the words of the last name without spaces.
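One possible set of rules can be sketched in a few lines. Everything here is an assumption for illustration: all words of the first author's last name are joined, the title is split on any non-letter (so "Knowledge-integration" yields "knowledge"), and the first title word longer than 3 letters is used. Non-ASCII letters are simply dropped, which a real implementation would need to transliterate instead.

```ruby
# Hypothetical citekey generator under assumed rules: joined last name
# of the first author + year + first title word longer than MIN_WORD
# letters. Non-ASCII letters are dropped here (a known simplification).
MIN_WORD = 3

def citekey(last_name, year, title)
  name = last_name.downcase.gsub(/[^a-z]/, "")
  words = title.downcase.scan(/[a-z]+/) # punctuation splits words
  word = words.find { |w| w.length > MIN_WORD } || words.first.to_s
  "#{name}#{year}#{word}"
end
```

Under these rules, "van der Wende" (2006) with a title starting "Knowledge integration" becomes vanderwende2006knowledge, and a title beginning "A new model" skips the short words and uses "model".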

Given that we can thus generate nice citekeys from submitted metadata (much of which won’t even have a citekey, or will have a citekey in a totally different format), we encounter the problem that the citekey in the database might differ from the citekey in the user’s local system. One approach would be to use Researchr or other plugins to “harmonize” these (i.e. automatically modify citekeys on the user’s end); this would have to be done early in the import process, because everything locally is tied to the citekey (PDF name, wiki pages). (Of course, in the future Scrobblr will be the first place we go to download papers in our fields anyway, so theoretically we won’t even have this problem :)) Or we could just accept that there will be a discrepancy here.

The second problem, however, will be collisions. It is likely that several papers will generate the same citekey, so again we’ll need a way of resolving this. A simple way would be to add “b” to the year or something like that; not very elegant, since it will look kind of “random” when viewed outside of its context. Another approach would be to go back and give both articles longer citekeys to avoid the collision (perhaps using the first two words of the title); however, given that a citekey, once assigned, should be permanent, this is impossible.
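The "add b to the year" rule could be sketched as follows. The placement of the letter directly after the year, the b to z range and the function name are my assumptions; existing keys are never changed, only the newcomer gets a suffix.

```ruby
# Hypothetical sketch of collision handling: keep the base key if it is
# free, otherwise insert b, c, ... after the year. Keys already assigned
# are never modified; only the new publication gets a suffix.
def unique_citekey(name, year, word, taken)
  base = "#{name}#{year}#{word}"
  return base unless taken.include?(base)
  ("b".."z").each do |suffix|
    candidate = "#{name}#{year}#{suffix}#{word}"
    return candidate unless taken.include?(candidate)
  end
  raise "too many collisions for #{base}"
end
```

If smith1999model is already taken, a second matching paper would become smith1999bmodel, and a third smith1999cmodel.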

Given that we can solve all of these things, the final concern is user confusion between local citekeys and Scrobblr citekeys, given that they look so similar. One way to mitigate this in practice would be to come up with a notation for linking to citekeys that specifies that they are Scrobblr citekeys. Currently we are using [@citekey] for citations, but this choice is purely arbitrary; it could easily be something else. It would however be great if it were something easy to type, easy on the eyes, and still fairly unambiguous. Since this notation is rarely used on the web today, it would, for example, be easy to write a plugin that scanned a blog post for it and recognized citations.
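The core of such a plugin is a regular expression. In this hypothetical sketch, the allowed citekey characters and the `base_url` that citations are linked to are assumptions; a real plugin would point at whatever lookup service ends up hosting the records.

```ruby
# Hypothetical sketch of a blog plugin's core: find [@citekey] markers
# and turn them into Markdown links. Allowed characters and base_url
# are assumptions for illustration.
CITATION = /\[@([a-z0-9-]+)\]/

# Distinct citekeys in order of first appearance.
def extract_citekeys(text)
  text.scan(CITATION).flatten.uniq
end

# Replace each marker with a link to base_url + citekey.
def link_citations(text, base_url)
  text.gsub(CITATION) { "[#{$1}](#{base_url}#{$1})" }
end
```

Run over a post citing [@scardamalia2006knowledge] twice and [@johnson2000corruption] once, `extract_citekeys` would list the two keys once each, which is also what a plugin would need to build a reference list at the end of the post.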


Tweets from Beyond the PDF 2

March 22, 2013, [MD]

I had an amazing time at Beyond the PDF 2 in Amsterdam (March 19-20). I met so many of the people whose blogs I've been following, did a demo of my open academic workflow, gave a 3-minute pitch for "Why we need an open alternative to Google Scholar", for which I won a shared second place, and came back filled with ideas and connections. I hope to write a series of blog posts highlighting some of the things I saw and heard at the conference, but I thought I'd start by posting an archive of my tweets.

I've done this a few times before, from Critical Point of View in Bangalore, Learning Analytics Conference (and pre-conference) in Banff, and OAI6 in Geneva, and even created a tutorial on how to generate this list using TextMate. This time, I grabbed the archive from Bodong's Twitter analytics app, and used R to select only my tweets. I cleaned them up a bit, removed some purely housekeeping ones, and added a bunch of links for context.

This archive will also be useful for myself in reminding me what we talked about, things to follow up on, etc.

Before the conference

  • Amsterdam beer and talking semantic publications with @pixievondust and @jschneider, great opening to #btpdf2! Looking forward to tmrw

Day 1

Before lunch

  • RT @kaythaney: .@pgroth kicks off with a challenge: what would you do with $1K today to make research communication better (note ...
  • Twitter ecosystem slowly being suffocated, and I find myself using it less and less, but for conferences still where it's at.
  • Kathleen Fitzpatrick opening. Notes from her talk "Peer-to-Peer Review and Networked Scholarly Communication"
  • @jeroenbosman Are any of these really suited for today? Why so much focus on italics etc when we can use URLs, DOIs etc? (genuine q)
  • Fitzpatrick's mom when publishers couldn't publish her monograph bec. of commercial concerns. "They were expecting to make money?"
  • "You could just publish your entire monograph online, with reviewer comments - but I know that's not realistic"... @kfitz: "Why not?"
  • Anyone researching the future of scholarly meetings? Been to some innovative confs, but would love to see research, documentation
  • "learning at conferences" should be a field of study (want 2 see learning scientists use learning theories to rsrch rsrch)
  • @rmounce Remember that "theoretically" many may read the monograph at a library. Many journals with only 400 subscribers too.
  • @petermurrayrust What's your definition of citizen hackers?
  • Give grad students researching in other countries grants to help translate thesis #1k
  • Re 400 monographs sold, my MA thesis was downloaded 1600 times, so I guess I'm doing OK :)
  • Would be great if you could do some data viz/analysis of tweets, @bodongchen like
  • #MOOCs and - Indig Educ@OISE
    • 22,000 students using OA papers in their learning. Powerful argument for #OA
  • @rubp Cool, would love to come. I've also been playing with open scholarly workflow - lit review etc
  • Wish name tags were bigger - know so many people here but not faces...
  • .@edsu Giving a vision talk "Why we need an open alternative to Google Scholar" tomorrow. Happy to discuss!
  • .@AubreyMcFato @edsu @doajplus Yeah, one of a number of projects that it would be great to build on - so much unexplored potential
  • @edsu Journals should ping a server w/bib* metadata, like blogs. Standardized micro data. Importing existing databases. CommonCrawl
  • Love idea of reproducible research, see these cool showcases of IPYnbs and R+knitr
  • .@jschneider @aubreymcfato @edsu @doajplus Breakout on OpenScholarSearch - let's make it happen, when/where?
  • Comparing diff acad search sites w my name: BASE, GScholar, MS Academic Search. GScholar wins...
  • Wrote up a bunch of unfinished detailed ideas for a social reading hub, much applies to #OpenScholarSearch
  • @neuro_cloud Not sure if Reddit is the best tool for this, but doesn't hurt to try.
  • Funny disembodied conference, saw two old friends tweeting with, excited to meet them in person - neither is here :)
  • For people who see #RDALaunch hashtag and want to know what it's about, more info
  • @axfelix There will be a lot of new innovation in RSS readers in the next 6 months... Is GSch similarly stifling innovation? #goodenuf
  • @Protohedgehog They also funded 33 students to attend, pretty amazing. Let them try to keep up as we try to out-innovate them. :)
  • .@axfelix GScholar has no API, and there will not be one - pretty crucial.
  • @axfelix Is that an argument for or against? I also want to interact with acad search, don't have lots of ppl to de-engineer...
  • @axfelix I think (hope) there are ppl attending who do believe in overthrowing global capitalism (or GSch)... we can build sth better!
  • @Dreusicke I like being able to download data and display/process how I want. Only on the web like DRM - could disappear tomorrow.
  • Backchannel has been great, but I'm worried that it will fade after lunch as everyone starts running out of juice... #no-power-plugs
  • @axfelix Makes me think of all the congratulatory tech news stories about use of Linux in China. Reality is almost Win-monoculture
  • @juancommander glob south, I'd love to see issue of language raised as well, maybe more central to socsci/humanities than hard sci?
  • @axfelix I think bandwidth and lack of ubiquitous/always-on access is much bigger problem (and maybe access to devices/mobile)

After lunch

  • Journal of open research software sounds great.
  • Online journals and faster horses
  • RT @jschneider: Brian Hole: "Articles are so '60's" -- with image of Philosophical Transactions 1665
  • RT @juancommander: @jasonpriem doesn't mind the little time because he talks twice as fast, he gets double the words per minute
  • @juancommander looking forward to my 3 min vision talk tmrw #talk-fast
  • I'm so used to watching Coursera lectures at 2x speed so Jason was like slowmo for me ;) #going-going-gone
  • RT @CameronNeylon: 11-16 million hours spent each year on reviewing papers that get rejected just in WoS..that's around 1000 years ...
  • Kaveh on copy editing "we read it many times so you only need to read it once"
  • Kaveh: one very complex page of journal w math can take 2000 lines of XML
  • RT @rmounce: The blog post should be the 'Version of Record'. Dump XML, make HTML-5 the base VoR standard @kaveh1000
  • Interesting point by Murray-Rust regarding the need for typesetters: with good incentives, people can do it themselves
  • This panel is the highlight so far, so many neat projects! How can people get involved, contribute?
  • I'll be graduating in 2 years and looking for a job. A job board for people who care abt sci2.0 and openness?
  • Funny, in a session where presenters can't do live demos, a presenter suggests a Beyond the PowerPoint conference ;)
  • @juancommander how about independent open access journals w no budgets?
  • @juancommander many journals say they'll waive them for those who can't pay. Not sure how it works in practice
  • @rmounce we're doing institutional research on that at utoronto. And I think many other institutions are as well
  • Peter Murray-Rust talks about the "scholarly poor" - non-academics, SMEs, etc. Making the case for research funding.
  • @kelli_barr not just access to Western rsrch, when doing rsrch on Indo libr, easier to find Western than loc rsrch, not digital
  • @kaythaney @neuro_cloud which includes language
  • RT @kaythaney: I love that "we all know we're moving towards Open Access" is an understood, throwaway comment in this crowd. #oa
  • Anyone going out for beer and more food after this? Or everyone jetlagged and tired? :)

Day 2

Before lunch

  • Academics reading almost twice as many articles per year as in 1977. Huge increase with e-access, beginning to level off. Tenopir (report)
  • RT @anitawaard: 80% increase in nr of article readings; 30 % descrease in time per article read - unsustainable trend! Tenopir
  • I must've shown my scholarly wiki to a lot of people yesterday, given how many pages I had to remove this AM
  • Insightful, how do we address this? RT @mfenner The Price of Innovation - my Thoughts for Beyond the PDF
  • RT @petermurrayrust: Almost no mention of #scholarlypoor - OUTSIDE academia. Usually called "consumers". T ...
  • RT @petermurrayrust:. Anyone present want to challenge the system and create a bottom up OPEN infrastructure for #scholarly comm ...
  • Excited about "Making it happen" session - although shouldn't that be the two day conference, rather than 1.5 h? :)
  • @jschneider @tac_niso Also wonder about other submission formats - anyone accept papers in Markdown?
  • Haven't talked much about authoring tools/workflows here. Anyone excited about scholarly Markdown? @mfenner
  • Wish we had more time for breakout sessions (more sessions). Several things I want to talk to people about
  • Reuse of scholarly slides at #btpdf2, very cutting edge :)
  • RT @CameronNeylon: I want to lock all the developer groups in a room until we agree a path to an interoperating ecosystem. Then ...
  • @CameronNeylon Is there an empirical study of how often locking people in rooms leads to solutions? It seems like a common idea! :)
  • 3 min vision talks are the last session today. Maybe it should be the first, so we could spend day planning/implementing? #btpdf3
  • @CameronNeylon So we just need 500 people to tweet with #1k hashtag. #we-can-do-it
  • @kerim @ilya @mfenner @criticmarkup Very neat, how can we push this forward? I'd like to be involved. #lets-make-it-happen
  • @InfoFuturesNYU @datadryad Eating dogfood, brilliant. Data about data sharing by scientists, openly available.
  • Hackfest to create tools/workflows/documentation on using Scholarly Markdown+Git for academic authoring/collab #1k @CameronNeylon
  • @TAC_NISO @ianmulvany @cameronneylon And all ideas look more brilliant when sketched on napkins! Need napkin.js #coffe-stain.js
  • Would have loved to see @xieyihui from knitr and @fperez_org from IPython Notebook at this conf, executable docs->beyond PDF
  • Graeme Hirst: Post-modernists are high-verbiage, zero logic. Vs people who only talk in LaTeX-math mode. Great talk! @graemehirst
  • @petermurrayrust @okfn Would have loved to join OKFN conf/hackfest. Always tricky to get funding. (I'm ironically here because of Elsevier)
  • RT @petermurrayrust: We shouldn't spend so much time talking, we should be doing and creating
  • @pgroth @petermurrayrust People travel across the world anyway, have an extra day of "doing"... just need a location with wifi+power
  • RT @anitawaard: .@erwinverb Tools
  • @cgueret There was a call to design it, don't know how far they got. CriticMarkup interesting for collab
  • @utopiadocs How I embed Skim PDF rdr in my workflow-love highlight, export clips, and AppleScript support.
  • Five min flash talks - nice warm up for 3 min vision talks later today. Great with pitches, call for action
  • @anitawaard is amazing as always - great example of how lab research is actually documented in US labs
  • RT @jschneider Sensemaking involves collage-based manipulation of electronic, born-digital materials, printed and annotated on paper. @anitadewaard
  • Graft tools closely on scientists' daily practice. (Why "anthropological studies" of researchers important, Tenopir etc) @anitawaard
  • Love the X-Files theme when speakers go over their time
  • Love the work @swcarpentry do teaching sw devel practices to scientists. Who're teaching them new research workflows/tools/publ? #1k
  • @researchremix Will need more than #1k, but brilliant idea. YC and HN are big inspirations #incubator-for-schol-comm-startups
  • RT @researchremix: #1k A YC/techstars incubator for scholarly communication startups. Mentorship, leg up in biz, marketing, fun ...
  • @researchremix Does it also require more work on funding models? Can we pitch VCs on funding projects with open source/OA etc?
  • PKP XML tool looks awesome, was shocked when I first realised OJS didn't have document pipeline, very happy to see this! @axfelix
  • Out-of-the-box NLM-compatible styles are huge, and PKP doing this is huge.
  • @phillord You can generate HTML, PDF etc from NLM XML. Semantic vs presentation format.
  • Great that @rmounce is talking about PDF metadata, I've been frustrated over this for a long time! Let's go for #low-hanging fruit!
  • Everyone should read this: Why can't I manage academic papers like MP3s? #pdfmetadata
  • Yes, I want publications tagged with whether they are OA, license... Only on the web, but after I downloaded, how do I know? #rmounce
  • talk on #ORCID, excited about potential, looking forward to seeing them more in the wild
  • "There is no problem if you just give me more money and use my tool" ... so true :)
  • @pixievondust What does use constitute? I have one, and I'd love to tag my pubs etc, but not sure of journals who support that etc.
  • The revolution will not be peer-reviewed #scholrev
  • RT @conjugateprior: What is Word doing in #Btpdf2 workflows? Like a family gathering where everyone tries to ignore your psychopathic un ...
  • RT @petermurrayrust: ~25 revolutionaries met at lunch . Will coordinate under hashtag #scholrev (tag seems to be fairly free)
  • RT @maurice_: #scholrev find the low hanging fruit AND put up an inspiring vision far beyond the PDF
  • @rmounce @gbilder Nice, seen some impressive stuff from CrossRef lately - need to look more into! #pdfmetadata
  • RT @openscience: Lovely: @bodongchen's #Shiny app for tweet analytics
  • Most active tweeters: @houshuang, @pgroth, @rmounce and @kaythaney
  • RT @GullyAPCBurns: how could knowledge engineering researchers work with publishers to be more effectively as a community to inn ...
  • "Universities are big bags of researchers who have a shared need for car parking"!
  • Crowdsourcing vision talk on Alternative to Google Scholar: talking notes here, help me edit #scholrevo
  • @houshuang 28 concurrent viewers in Vision talk document, I love you all, you're crazy! :) #scholrevo (leave names and I'll cite u)
  • Lots of great ideas at Vision talk "Open alternative to Google Scholar" doc, thanks all! #scholrev
  • RT @neuro_cloud: @battagliaem I would like to see more breakout/work groups. So many ideas need to translate to works! #BTPDF2 #BTPDF3 ...
  • How about recognition for activists - many work hard on deep ideas, but difficult to publish them, #should-be-working-on-my-phd
  • @ianmulvany PS: loved blog post about Encode. Used it in 2 presentations about OA
  • @ambrouk I had the idea of a "fair trade" logo for publications 5 yrs ago, includes #OA, and translation

After lunch

  • @mfenner wish it was more extensible (Pandoc), I'd like the citations to have links to my wiki - plugins, filters without learning Haskell
  • RT @dshotton: To escape ISI, open your citation data! See Open Letter to Publishers
  • Giving public talk about Open Science at KU Leuven tomorrow (https://plus.google.com/113732143584807227124/posts/82FiF4gr2U5) #oa
  • The Google Doc from my talk will transition to a strategy/planning document, add name&ideas if interested
  • Just gave talk abt open scholar search, what a rush. The conversation continues
  • @mfenner it's hard to be subtle in 3 mins but I'm actually all abt coordination as opposed to reinvention
  • @mfenner again, let's do it. I can't do even a fraction by myself, eager to contribute to projects that can.
  • @phillord my ideal is citations just list of unique identifiers. User/app can render any way they want.
  • @lukask @phillord any journal accepting markdown+btex file? publishing these alongside HTML etc? Maybe modify Jekyll for jrnl only
  • @mfenner @lukask @phillord happy to. Know of any bibjson ruby libraries? otherwise we should make it
  • @axfelix does it accept markdown as well? Is this released in OJS or when coming? Sounds awesome and overdue #ojs
  • Kaveh: publishers, give us XML. Yes! Publishers: format is not your business. Want to read in my house style. Yes!
  • Ah, the perils of live demos on others' computers, very brave!
  • Very impressive by Kaveh! Why we need d/l-able files, control to readers not platforms.
  • Kaveh: It's not for sale, but you can get it at a very good price.
  • Publishers whatever, how can I get my PhD thesis into XML w/I ppl in Kerala?
  • RT @memartone: Feel free to propose that we all use ORCID ID as part of the FORCE11 pledge. If the issue has been settled, let ...
  • RT @axfelix: I think it's quite telling that @kaveh1000 is killing it at with -- what's this? -- a PDF generation pipeline. beca ...
  • RT @mfenner: Popular vote for best idea goes to Carol Goble, @houshuang and @kaveh1000 They also get my vote, cool ideas, and w ...
  • @rguha not sure I agree. A lot of my friends happy editing wikis, short step to Markdown, huge step to latex.
  • Looking forward to videos becoming available from - stream is down. Some things so good I want to see again / share.
  • RT @TAC_NISO: Transcribed version (with draft intro from me) of Amsterdam Manifesto for Data Citation. Thoughts please ...

Stian


More thoughts on quantified self, tracking and visualizing

March 18, 2013, [MD]

Yesterday, I wrote about my tiny timetracker script: I resurrected some three-year-old code, cleaned it up a bit and added a simple R graph of my day. The script makes it very easy to track intention (i.e., I am the one saying what I am working on; it doesn't try to infer it from my activity), and over time the log files should prove interesting.

R graphs

I started wondering about other ways of representing the data with R graphs. Right now, it just shows a simple bar graph with the cumulative amount of time spent on each category per day. It would be easy enough to make similar graphs per week, month, etc., and also to correlate other measures that I track per day (temperature, time I got up, mood, etc.) with the cumulative activity in each category for each day (e.g., on days when I got up early, did I get more hours of PhD reading done?).
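As a rough sketch of the per-day correlation idea, the Python snippet below (Python rather than R, purely for illustration) computes a Pearson correlation between the hour I got up and the hours of PhD reading on the same day. The numbers are made up, not data from my logs.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up example: hour I got up, and hours of PhD reading, for five days.
wake_up_hour = [6.5, 7.0, 8.5, 9.0, 7.5]
phd_hours = [3.0, 2.0, 1.5, 0.5, 2.0]
print(round(pearson(wake_up_hour, phd_hours), 2))  # -0.94 for these made-up numbers
```

With only a handful of days, a coefficient like this says very little, and even over months it would show an association rather than a cause.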

However, the log files don't only contain information about how many hours I spent each day doing different categories, they also contain information about when I start and stop different activities. So I might be able to find correlations like "I tend to get more done on my PhD on days when that's the first thing I do", etc. To begin with, I tried to find a way to graph the day's time use as a timeline.
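Here is a minimal Python sketch of the "first thing I do" comparison. It assumes a hypothetical log format where each entry records the time I switched to an activity, with "off" marking when I stopped tracking; the actual log files may look different, and the two days of data are invented.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical switch log: each entry means "at this time I switched to this activity".
# "off" is an assumed marker for "stopped tracking"; the real log format may differ.
LOG = [
    ("2013-03-18 08:00", "PhD"), ("2013-03-18 10:30", "email"),
    ("2013-03-18 11:00", "off"),
    ("2013-03-19 08:30", "email"), ("2013-03-19 09:30", "PhD"),
    ("2013-03-19 10:30", "off"),
]


def daily_summary(log):
    """Return {date: (first_activity, {activity: hours})} from a switch log."""
    events = sorted((datetime.strptime(ts, "%Y-%m-%d %H:%M"), act) for ts, act in log)
    summary = {}
    # Each activity lasts until the next switch; the final entry has no end and is ignored.
    for (start, act), (end, _) in zip(events, events[1:]):
        if act == "off" or start.date() != end.date():
            continue  # untracked time, or a segment crossing midnight (skipped for simplicity)
        _first, hours = summary.setdefault(start.date(), (act, defaultdict(float)))
        hours[act] += (end - start).total_seconds() / 3600
    return summary


def mean_hours(summary, activity, came_first):
    """Average hours of `activity` on days where it was (or wasn't) the first activity."""
    values = [hours.get(activity, 0.0)
              for first, hours in summary.values()
              if (first == activity) == came_first]
    return sum(values) / len(values) if values else None


days = daily_summary(LOG)
print(mean_hours(days, "PhD", True), mean_hours(days, "PhD", False))  # 2.5 1.0
```

Two invented days obviously prove nothing; the point is only that the start/stop times already in the logs are enough to ask this kind of question once enough days have accumulated.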

Categories

There are still some challenges with the script. The first is how I log categories. Right now I have 10 slots (0-9), but since I log the full text rather than the number, I can change the categories in settings.rb without risking "overwriting" earlier logs. However, I realized that I wanted to log at different levels of granularity. For example, I might want to know how much time I'm spending preparing for a presentation in a few days, but I'd also like to know how much time I spend each month preparing for presentations, or even on "schoolwork" in total.

I could attach categories to the projects in settings.rb, of course; that would be easy. I would have to decide whether I wanted the categories to be exclusive or not. If they are exclusive, I can add them all up and get the total amount of time spent. If I want overlapping categories (preparing a presentation is both school work and authoring, whereas writing a blog post is authoring, but not school work), I'll be able to look at time use in different categories, but can't compare them against each other (plotting authoring vs school work wouldn't make sense, since the time spent preparing the presentation would be counted in both). I guess an expense tracking system that lets you tag your expenses with different categories has the same problem.
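A small Python sketch makes the double-counting problem concrete. The project-to-category mapping stands in for what could be stored in settings.rb, but the names, the structure and the hours are all made up for illustration.

```python
# Hypothetical mapping from projects to (possibly overlapping) categories.
PROJECT_CATEGORIES = {
    "presentation": {"school work", "authoring"},
    "blog post": {"authoring"},
    "PhD reading": {"school work"},
}

# Made-up hours tracked per project.
hours_by_project = {"presentation": 2.0, "blog post": 1.0, "PhD reading": 3.0}


def hours_by_category(hours, categories):
    """Add each project's hours to every category the project belongs to."""
    totals = {}
    for project, h in hours.items():
        for cat in categories[project]:
            totals[cat] = totals.get(cat, 0.0) + h
    return totals


totals = hours_by_category(hours_by_project, PROJECT_CATEGORIES)
print(sorted(totals.items()))  # [('authoring', 3.0), ('school work', 5.0)]
print(sum(totals.values()), "vs", sum(hours_by_project.values()))  # 8.0 vs 6.0
```

The category totals add up to 8 hours although only 6 were tracked, because the 2 hours of presentation work are counted under both categories. With exclusive categories, the two sums would always match.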

One problem is that I don't quite know how to store or represent this information effectively in R. I had the same problem when I imported Google Analytics data together with metadata about all of my blog posts. My blog posts usually have several categories attached to them. Initially, this is just a text field with the categories listed like "oa,publishing,china". How would I represent this in a data structure in R, so that I could see whether certain tags were more popular than others, for example? Would I have to duplicate the post, so that I had three entries for the page, one for each tag? Or turn the tags into binary variables, so that each row would have a column for every tag I've ever used, with a 1 if the tag applies and a 0 if not? (And is there a function to remap the data like this?)
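Both layouts from the question above can be sketched in a few lines. This is Python rather than R, and the posts and page views are made up. The first structure duplicates each post once per tag (often called "long" format); the second adds one 0/1 column per tag.

```python
# Made-up rows: (post title, comma-separated tags, page views).
posts = [
    ("Post A", "oa,publishing,china", 120),
    ("Post B", "oa", 80),
    ("Post C", "publishing", 40),
]

# Option 1 ("long" format): one row per (post, tag), duplicating the post's other data.
long_rows = [
    (title, tag.strip(), views)
    for title, tags, views in posts
    for tag in tags.split(",")
    if tag.strip()
]

# Option 2 ("wide" format): one 0/1 column for every tag ever used.
all_tags = sorted({tag for _, tag, _ in long_rows})
wide_rows = []
for title, tags, views in posts:
    used = {t.strip() for t in tags.split(",")}
    row = {"title": title, "views": views}
    row.update({tag: int(tag in used) for tag in all_tags})
    wide_rows.append(row)

# The long format makes per-tag totals a simple group-by.
views_per_tag = {}
for _, tag, views in long_rows:
    views_per_tag[tag] = views_per_tag.get(tag, 0) + views

print(views_per_tag)  # {'oa': 200, 'publishing': 160, 'china': 120}
```

On the remapping question: in Python, pandas has `Series.str.get_dummies(sep=",")`, which turns a column of comma-separated tags into exactly this kind of 0/1 columns.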

Other data

I also thought about other sources of data that I could track either explicitly or automatically. Some of these would be interesting to track and visualize by themselves, others would be interesting mainly as related variables. I could for example easily track all the scholarly PDFs that I read, by taking note of when clippings are exported to Researchr (I could log both the number of PDFs read, and the number of pages in each PDF). I could also look at the length of the high-level notes that I write about different articles.

It would be quite interesting to wear a FitBit or something similar 24/7, and get detailed information about when you fall asleep, when you wake up, how you move around etc. Short of that, I could at least use Fitocracy's API: if I could query the number of points added per day, that might be a useful proxy for exercise. (If I am diligent about turning on Runkeeper when biking, I could also extract the number of kilometers biked every day.)

There are some things that I do digitally that would be easy and very interesting to track, but which offer no way to get at the data. I spend hours every day reading on my Kindle, and it would be very interesting to export the number of pages read per day, the time I've spent, reading speed (seconds per page), etc. But the Kindle does not collect this data (or at least, it won't share it with me).

Entering manually

There is also data other than time-use that I might have to enter manually. I thought about creating a very unobtrusive interface, triggered with a global keyboard shortcut, which would let me type in a variable (with autocomplete), let me tab to an entry field, let me type in the value, and press enter to store (with a time stamp). This could be everything from weight, to bed time, books read, or anything else. (One could even imagine a window that pops up at random times asking about your mood, whether you are feeling tired or energetic etc - but that might quickly become annoying). First draft of interface:

Reports

Right now I create a few graphs with ggplot2 by running an R script through Rscript, which spits out a PDF, and then I display that PDF with Pashua. When I have more data and graphs, I plan to create a knitr template (Markdown + R code), maybe even using a templating system, and then run knitr from the command line (through Rscript?) to generate an HTML page, which I can then open in the browser.

Anyway, that's how far I got in my pondering.

PS: This blog entry took exactly 37 minutes to write, most of which I did on the plane, which is the early pink blob you see on the timeline, then my battery ran out, I arrived, spent some time finding my AirBnB host, etc, and then the timeline resumes :)


Unobtrusive time tracker, visualizing time spent with Ruby and R

March 16, 2013, [MD]

About three years ago, I read some articles about the quantified self and how the simple act of observing something can lead to change (often in a positive direction). I've been interested in productivity tools and theories for a long time (it's a constant struggle for academics), and I thought of different ways of measuring how I spend my time. I tried a few automatic tools which look at which applications are open, which websites you visit, etc., but found that the data they generated were not that helpful. If I am on Google Scholar, am I doing research for my PhD, working on a paid research project, or just following a random thought?

So I needed something that took my intention into consideration, but did so in a really easy and unobtrusive way. I had a pretty good idea of how the tool I wanted would look - something that would sit in the menubar, and where I could change which activity I was working on with only a global shortcut. I looked around, but couldn't find any tools that really fit the description, so I began building my own. I wrote some really simple Ruby scripts to log time codes to a text file, triggered with a global shortcut program, and used Growl to provide some feedback.

I wrote up the whole thing on my blog, posted the code on GitHub, but actually didn't end up using the system very much (the fate of many productivity tracking systems, I'm sure). Three years later, I've spent a lot of time working on my open academic workflow, and I've also begun experimenting with R for data analysis and visualization. I am also involved in a number of different paid projects, so tracking my time is not just for self-insight, but would also be very useful for billing, etc.

I opened the code that I hadn't touched in three years, updated it a tiny bit (I use Keyboard Maestro now, instead of FreeHotKeys), and then experimented with adding a graph. It took some time to get R to play nicely with Ruby. I began with rinruby, which lets you run R commands through Ruby. However, this popped up a Quartz window every time I used ggplot to render a graph (even if I never displayed the graph, but sent it straight to a PDF).

Then I tried to run an R script through R CMD BATCH, which worked, but took almost 10 seconds to execute. I later found out that this is an old way of doing things, and that Rscript is the new way. That worked perfectly, and it executes and renders the PDF in 0.8 seconds. I then use Pashua, which I use extensively in my open academic workflow, to display a dialogue with the graph and some extra information.

Currently, it just shows a simple bar graph of activity during the current day, but as I collect more information over multiple days, the data could be visualized in many interesting ways. I know not only how much time I spend on a certain activity each day, but also when I spend the time (and in how large chunks, how often I'm interrupted or start surfing etc). This could be visualized as time-series, and I could even experiment with correlations with other factors, whether external ones (the daily temperature?) or internal, if I track other factors (when I go to bed, what I eat etc).
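To sketch how chunk lengths and interruptions could be pulled out of such a log, here is a minimal Python example. The log format (the time of each switch, with "off" for stopping) and the morning's data are assumptions made up for illustration, and may differ from what the Ruby script actually writes.

```python
from datetime import datetime
from statistics import median

# Hypothetical switch log for one morning: (time, activity switched to).
# "off" is an assumed marker for "stopped tracking".
LOG = [
    ("09:00", "PhD"), ("09:40", "surfing"), ("09:50", "PhD"),
    ("10:30", "email"), ("10:45", "PhD"), ("11:00", "off"),
]


def chunks(log):
    """Turn consecutive switches into (activity, minutes) chunks, skipping 'off' time."""
    times = [(datetime.strptime(t, "%H:%M"), act) for t, act in log]
    return [(act, (end - start).total_seconds() / 60)
            for (start, act), (end, _) in zip(times, times[1:])
            if act != "off"]


def chunk_stats(log):
    """Per activity: number of chunks and median chunk length in minutes."""
    by_activity = {}
    for act, minutes in chunks(log):
        by_activity.setdefault(act, []).append(minutes)
    return {act: (len(m), median(m)) for act, m in by_activity.items()}


print(chunk_stats(LOG))  # {'PhD': (3, 40.0), 'surfing': (1, 10.0), 'email': (1, 15.0)}
```

In this made-up morning, PhD work was split into three chunks, i.e. interrupted twice, with a median chunk of 40 minutes.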

Only time will tell if I keep using the system, but perhaps the possibility of using and visualizing the data will be enough incentive to keep tracking. It will also be very interesting to see how much time I actually spend on various activities. For example, I need to give a presentation at Beyond the PDF 2 in Amsterdam in a few days: exactly how much time will it take me to prepare?

Stian

 


Open Access at IgniteAlberta

February 23, 2013, [MD]

I was asked at the last minute to fill in for Nick Shockey of The Right to Research Coalition to give a talk about Open Access at IgniteAlberta in Edmonton. I spoke at a session about Open Access and OER, shared with Cable Green from Creative Commons who participated remotely.

The conference was put together by the three student associations in Alberta to bring together student leaders, faculty, administrators and people from the province to discuss the future of Alberta's higher education system. The sessions were a mix of large plenary presentations and smaller break-outs where everyone was seated around small round tables, and where much of the time was spent discussing in groups and then summarizing back to the larger group. I really liked this way of organizing, and learnt a lot from the professors and students that I sat next to.

I also realized how little I know about the higher education context outside of Ontario. In Toronto, it's easy to assume that Alberta has lots of money and few problems, but I heard about provincial cuts, and also the challenge of low high school and post-secondary completion rates.

I've given many talks about Open Access, and when I am asked to speak, I usually remix slides from older slide decks, but it's always a challenge to organize them in a way that will make sense to the audience, fit within the timeframe, etc. This time, I only had about 20 minutes, and I wanted to convey both a basic understanding of what Open Access is, and some of the excitement that I feel for it. I came up with a basic framework of "what, why, how", and I think I was able to cover a lot of basics in the 20+ minutes, together with some neat examples and updated news, such as

(slides)

Stian