April 6, 2013, [MD]
As many times before (most recently, Beyond the PDF 2 in Amsterdam), I've archived my tweets from the recent Coursera conference, cleaning them up just a little bit (I took out most retweets, but included a few). See also my impressions from the conference.
Before the conference
- Will be in Philadelphia for #CourseraConfAtPenn Fri+Sat. Anybody wants to meet to talk about #oa, #btpdf2, learning, #OER?
- The Coursera Partners' Conference gets underway tomorrow, April 5th, 2013. We're stoked! #CourseraConfAtPenn
- If you’re hashtag is 18 characters, you’re doing it wrong. Looking at you, #CourseraConfAtPenn.
- @Akibaedx Great to see that edX and Coursera are hanging out :) Very interested in research on flipped (PhD stud from UofT)
First day
- Great name tags, able to see name and affiliation without leaning in. Makes connecting online names to faces much easier
- @derekbruff @rschwartz418 It does, but it's the easiest thing to work with. And students can mix and match.
Silfen forum
- going to overflow room to see panel on MOOCs. Very meta. Will there be computer grading of questions?
- Interested in talking to anyone researching MOOCs/flipped classrooms from persp. of education/learning sciences/CSCL
- @icpetrie MOOCS are a) revolutionising education, b) more of the old, c) just the top of the iceberg, d) hype :)
- Can MOOCs support non-high performing learners? Old q, OCWs, Hewlett Foundn, P2PU etc have struggled w/ a long time
- MOOCs are at the Altavista stage, Google hasn't even come out. Friedman good at catchy phrases
- Friedman: MOOCs instead of F16s in Egypt
- @dankonecky I would love to see MOOCs/OER from institutions in Egypt. Course on history of Arab region, by a local uni
- @hong_chau Make it happen, let's adapt some self-organisation from the original cMOOCs! :)
- Given that we are so eager to reach low-performing students, will Coursera partner with any community colleges?
- #CourseraConfAtPenn hash tag length is killing me. Er, I mean challenging me to be concise.
- Watching Coursera lectures at 2x speed has spoilt me. Not about concentration length, but cognitive bandwidth #tooslow
- Very US-centric MOOC discussion, weird given the very international audience at #courseraconfatpenn
- @veletsianos UofT has partnered with both (but no EdX courses live yet). My own research more on flipped. But also doing institutional rsrch
- All the people on the panel talking about the wonder of four-year degrees need to read this
- Nostalgy for brick-and-mortar: Beyond row three, it's all distance education anyway
- Would love to hear EU perspective on MOOCs - how does it look different in the European context? /c @gideonshimshon
- @gideonshimshon Nice, wish they had released open data sets, not just PDFs :)
- Discussion abt rest of world similar to OCW-be generous and share. But awesome stuff happening in the rest of the world
- @cvhorii Global perspective 1-way cultural flow? Can glob courses help our international students …
- @icpetrie When will Harvard step u and translate a Chinese course into English? Why only one-way?
- @dankonecky Are there learning scientists working seriously on this, or is it all CS/ML people? #wanttobepartofit
- +100 “@rschwartz418: Shouldn't we all have watched lectures, panels, & keynotes before #CourseraConfAtPenn and spent time interactively?”
- @rschwartz418 We need flipped conferences, not just flipped classrooms!
- @derekbruff @rschwartz418 Agree that platforms don't support. Talked abt this at #p2pu, supporting learning journeys
- @D_Carchidi @cristofolo @bederson I think faculty roles will change significantly, for the better.
- @amywoodgate @rschwartz418 Maybe next conf should be more like a cMOOC, less like an xMOOC? :)
- @bederson Great, very useful link, thanks a lot. Backchannel more rewarding than panel right now
- The big loser of the conference so far: AltaVista. #CourseraConfAtPenn
- @taevans Yeah compare cMOOC/xMOOC, part of open internet or closed garden? Course ends vs community continues
- @bederson My problem is that there is too little research really problematising the f2f teaching/learning in h.ed.
- And thanks to all the Twitter-panelists as well! ;)
Keynotes
- Getting ready for #Courseraconfatpenn keynote. Good wifi and power - let's see how much twitter activity
- @derekbruff Interesting. Seems like all the focus (not only there) is on purely online though.
- @derekbruff A lot of prescriptive stuff, I need theories to fuel my research.
- Coursera, has it really only been one year? Amazing.
- 3.1 mlm students, 333 courses, 2330+ instructors Koller,
- Koller proud: 30 of top 60 universities worldwide, #1/#2 in 14 countries
- Largest course: Think again, how to reason and argue 180k.
- Made $222k on signature track in 11 weeks
- Money will be shared back w/unis, Koller with cheques. 1000 applications accepted for financial aid.
- Students signing up for signature track to signal commitment to finishing course, higher completion rates
- 40% of Coursera users in dev'l world (3% in Africa), Khan acad 80% in US (surprising)
- Looking forward to first Coursera course in Hindi, or ki-Swahili!
- @amywoodgate Sustainable if people pay I guess, and can easily be outsourced to low-cost countries.
- Retention funnel: enrolled 31k-56k, submitted assignment 900-6.5k, earned SoA 1-3k. Similar numbers at UofT.
- Interesting that Coursera is doing A/B testing - students want to see people's faces
- @amywoodgate I don't think paying people in low-cost countries lower (but fair) salaries is exploitation, but agree with you on ideal
- New tools for smaller groups coming,
- Mobile app coming over the next year
- App platform sounds interesting, physics simulator, note taking tool, etc
- @EvansCowley Maybe it should be toggleable :)
- @EvansCowley You'd think but people tend to really enjoy lectures. I think short demos, 2x speed, pause able, with screen can work well
- Wonder if there's a mismatch between attendees and keynotes… Talking about system issues, but attendees are more instructional designer/online learning people who are carrying out mandates from provosts centrally?
- Looking forward to first panel on participation in MOOCs, getting concrete
- xMOOCs? Never lonely in a cMOOC-just lost.RT @hong_chau: Werbach: #MOOCs r lonely. Students desperately want community. #CourseraConfAtPenn
- @hong_chau One course I took, people posted hundreds of intros first week. Afterwards, almost no activity...
- Discussion about "personality cult" around lecturers, and how attitudes of first-gen "pioneers" transferred upon 2d gen
- /We/ may want community but Edinburgh data suggested only 10% actually wanted to be part of a community #CourseraConfAtPenn
- +100 “@hong_chau: Advice for @coursera and the next #CourseraConfAtPenn... Snacks during break?”
Afternoon panels
- Excited abt next panel on flipped teaching. Doing rsrch on that myself.
- @taevans Would also be useful with wiki to collect notes/artefacts/papers etc. MOOC-style! :)
- Great panel on flipped classrooms, all notes (incl questions) … Let's continue the conversation!
- @preset Cool, awesome notes. Added here
- Created open Google Doc to track notes and other artefacts from #courseraconfatpenn please add!
- P2PU's "Mechanical MOOC" automatically assigns students to groups of 10, and creates group mailing lists
- Blog of Pamela Fox who is talking right now
Day 2
- Good morning #courseraconfatpenn, great breakfast but a bit claustrophobic poster session :)
- Fostering inter school collaboration - retitled: "News from the front lines"
- Dutch-speakers dominante in the room. Goeden Morgen! ik spreek een beetje Nederlands :)
- Dutch researchers are #2 in the world in individual productivity, after the Swiss.
- Interesting idea from Leiden: Use MOOCs for student mobility (important in European concept)
- Call for empirical collaborative research on MOOCs, great - hope this happens
- RT Next time, need focus groups for topic areas, for instructors to meet and share ideas.
- @coxvaxie Wonder how many people here are faculty, as opposed to instructional designers etc. How to get invite more
- MIT faculty was told by administration: Not allowed to give guest-lecture in Coursera, we're on #EdX only.
- Distinguishing between cooperation and collaboration - collab implies interdependence
- @Afilreis "The quizzes are silly" @DaphneKoller shoots back "They are not" everyone laughs
- Shoutout to #lak13 learning analytics conference starting in 2 days in Leuven
- Dillenbourg suggests universities organising exams for each other's MOOCs, undercutting Pearson
- Duke has one FT staff member assigned to a course while it's running...40hrs/wk dedicated to that course. Dillenbourg: plagiarism and cheating, we still need proctored exams to give credit
- Panel on learning from data looks popular, room filling up fast. Looking forward to Dillenbourg and others
- Dillenbourg: We're academics, we love data, put data in our coffee, brush our teeth with data
- China not well represented. Language issues, or internet access? Subtitled OCW courses massively popular there.
- Dillenbourg is doing amazing research on MOOCs, two open postdocs and 2 PhDs.
- Notes from session on data analytics and MOOCs #lak13
- How I got all the photos integrated with my wiki notes from sessions
- Would love to see foreign-language MOOCs used in language classes in N-Am #ich-will-lernen #apprendiamo
- Thank Daphne&Andrew for confusing us so much over past yr - in the most positive way. Yes, making our lives interesting
See also my notes on flipped classrooms, and learning analytics.
April 6, 2013, [MD]
This weekend, I attended the inaugural Coursera Partners' Conference at UPenn in Philadelphia. I attended both as a PhD student with an interest in MOOCs, open learning, and flipped classrooms, and as an institutional researcher for Open.UToronto, supporting internal evaluation of the UofT Coursera MOOCs.
Overlap with existing groups
I was delighted to see a bit of overlap with two other groups that I am part of. Gary Matkin and Larry Cooperman from UCI, José Escamilla from Tecnológico de Monterrey and Sukon Kanchanaraksa from Johns Hopkins School of Public Health are all old friends from the OpenCourseWare Consortium, and it was great to catch up with them. Pierre Dillenbourg from EPFL is a well-known researcher in computer-supported collaborative learning, and I am really happy to see others from my academic field getting involved with research on MOOCs.
Otherwise, institutional support personnel (from provosts and deans to instructional designers and offices of teaching and learning) seemed to be heavily represented, with only some professors who have actually taught Coursera courses present. There were also a few institutions represented who are still considering joining Coursera.
Sessions

The sessions ranged from the initial sold-out Silfen forum with people like Thomas Friedman and Martha Kanter, to much more specific panels on flipped classrooms, learning analytics, inter-school collaboration, etc. (Full program)
I took extensive notes from two sessions, one on flipped classrooms and one on learning from data. Both were really interesting, and I will probably be in touch with some of the people on both panels to follow up. (It was also a great opportunity to use my iPhone/Researchr integration.)

It was also fun to meet the very young, smart and energetic Coursera team. It's truly impressive what they have managed to create in a year (launched April 2012), and I'm very excited to see what they, and others (EdX, Udacity, etc) will come up with. Hopefully in the future, we will have a conference where people can meet across "divisions", from all these different platforms. There are initiatives at educational research meetings, such as MOOCshop, but normal faculty and instructional designers etc are not likely to attend those.
April 2, 2013, [MD]
I've thought about switching to a static site generator for a while, spending some time playing with Jekyll, the poster-boy for SSGs, and learning about the different options, but I always put it on the backburner.
There are many reasons for switching, and for doing it now. My current blog layout has been the same since 2005, and although I've enjoyed it, it's starting to feel old. I've also spent a lot of time writing on my wiki, which runs on localhost and is synced online, with a number of neat tools to help me author more effectively, and these days I find it so much quicker to write something on my wiki, than on my blog.
I don't know if I've really been writing less on my blog lately, or if it just feels like it - it's always been up and down. In the graph below, each point represents a blog post, the y-axis shows length (log-scale), and the x-axis time. You can see some pretty big gaps where I didn't write anything (like in 2007). I also seem to write fewer very short blog posts, which is probably because I began using Twitter and later Google Plus for quick links, and used the blog more for longer texts.

(The graph above was generated with R and knitr, which is another reason why I'd like to switch to an SSG, although I haven't quite integrated knitr into my workflow yet.)
Recently, I found my PhD wiki overrun with spam accounts. Because the canonical version is stored offline, I was able to simply do another sync, and the wiki looked just as nice. I've also had WordPress hacked two or three times, and it was not so easy - once I lost all my posts, and had to recover them using Google Cache. Knowing that I have my entire post history on my harddrive, in "future-proof" Markdown, is a great feeling.
Nanoc is written in Ruby, and makes it very easy to configure, design compilation rules and filters (like plugins in WordPress), etc. Out of the box, it requires a fair amount of setup, but I've used my previous WordPress setup for almost 9 years, so spending a day or two to get it right will hopefully be worth it. I had a look at a nanoc blog skeleton, but not only did it need tweaking to work properly with the latest version of nanoc, there was also too much magic going on. In the end, Dave Clark's guide to building a blog with nanoc got me started, and by making all the changes myself, I understood much better how the system is working.
The transition is not by any means "finished": there are many older blog posts that need cleaning up, the layout (based on Mark Reid's site) is nice but needs polishing, etc. But I decided to eat my own dogfood and put it up. Already most posts should work, on the same URLs, and I'll slowly fix the rest.
PS: I used linkify extensively for this blog post, and I love it.
March 29, 2013, [MD]
Last week I wrote about how I began resurrecting a three-year-old time
tracker project with R graphs. I then added a timeline, and began
thinking about other kinds of data that I could track.
It's now been almost two weeks since I began playing with this script,
and the first thing I can say is, I'm using it. Both because the graphs
generated are compelling, and because I've been able to add some useful
functionality, it's become a part of my workflow, and I've been quite
rigorous about logging my time, without ever feeling that it was
intrusive or annoying.
Automatically log on computer sleep or wakeup
One key feature I added was the ability to detect when my computer goes
to sleep (by closing the lid) and when it wakes up, using Sleepwatcher.
It ends the current activity on sleep, and automatically begins
"surfing" when it wakes up. I have to tell it if I am doing something
more productive, but that little popup saying "surfing" also reminds me
to do so. And being able to quickly close the lid when my wife says
dinner is ready, without worrying about the time tracker running and
some category getting an extra bunch of hours added to it, is also a
relief. I also changed the format of the data files to nice time codes,
like this:
2013-03-29 09:05:35 -0400,PhD offline
2013-03-29 09:42:35 -0400,surfing
2013-03-29 09:46:14 -0400,PhD offline
This makes it easier to manually edit (something I've almost never
needed).
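As a rough illustration of why this format is convenient to work with, here is a minimal Ruby sketch that parses lines like the ones above and sums the seconds spent per activity. The method name `time_per_activity` is made up for this example and is not taken from the actual script. It also assumes that each entry lasts until the next one begins, so the final (still running) entry is not counted.

```ruby
require 'time'

# Hypothetical helper (not from the original script): sums seconds per
# activity, assuming each entry lasts until the next entry begins.
def time_per_activity(lines)
  entries = lines.map do |line|
    stamp, activity = line.strip.split(',', 2)
    [Time.parse(stamp), activity]
  end

  totals = Hash.new(0)
  # Walk over consecutive pairs: the start of the next entry ends the current one.
  entries.each_cons(2) do |(start, activity), (finish, _)|
    totals[activity] += (finish - start).round
  end
  totals
end

log = [
  '2013-03-29 09:05:35 -0400,PhD offline',
  '2013-03-29 09:42:35 -0400,surfing',
  '2013-03-29 09:46:14 -0400,PhD offline'
]
p time_per_activity(log)
```

Splitting on the first comma only (`split(',', 2)`) keeps activity names that themselves contain commas intact.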
Automatically turn off the Internet, based on activity
To provide more focus, I've also added a feature which turns off the
Internet if an activity containing the word offline is entered (for
example PhD offline). I just use ipfw for that:
# Toggle Internet access with the ipfw firewall (needs root privileges).
def internet(status) # true = on, false = off
  if status
    # enable: quietly flush all firewall rules
    `ipfw -q flush`
  else
    # disable: add a rule that denies all traffic
    `ipfw add deny all from any to any`
  end
end
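The rule that decides when to call this method can be sketched separately. The helper name `internet_allowed?` is hypothetical, and matching the word case-insensitively is an assumption made for this example.

```ruby
# Hypothetical helper: the Internet stays on unless the activity name
# contains the word "offline" (case-insensitive here, by assumption).
def internet_allowed?(activity)
  !activity.downcase.include?('offline')
end

puts internet_allowed?('PhD offline')
puts internet_allowed?('surfing')
```

The result could then be passed straight to the toggle, as in `internet(internet_allowed?(activity))`.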
Of course, it would be easy for me to override this manually, but unlike
apps like Freedom, it doesn't aim to block me
from the Internet for a certain period of time, but only for however
long I'm using the offline activity. I can switch back to surfing at any
time, but I have to do so consciously, rather than mindlessly
Alt+Tabbing to a browser and pulling up Reddit whenever my brain
encounters something difficult.
Auto-completing activity chooser

I found that I often had to peek at the keyboard shortcut list before
switching to an activity, and sometimes needed activities that were not
on the list, so I added a command on Ctrl+Alt+Cmd+Enter, which pulls up
a text-entry box that autocompletes on all previously entered
activities. I was curious whether I would end up preferring this, or the
direct keyboard shortcuts. After a few days, I'm using both - I remember
a few keyboard shortcuts very well, like 9 for surfing, 6 for hacking, 7
for tasks, and 0 for rest (although I usually just close the laptop),
whereas for specific projects, it's quicker to pull up the window, type
the first few letters, like *la*, so that *laurie* pops up, and hit
enter to select it.
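The matching behind such a chooser can be sketched in a few lines of Ruby. This is only an illustration: the method name `complete` is invented, and ranking the candidates by how often they were logged is an assumption, not something the script is known to do.

```ruby
# Hypothetical sketch: unique activities that start with the typed prefix,
# most frequently logged first (ties broken alphabetically).
def complete(prefix, history)
  counts = Hash.new(0)
  history.each { |name| counts[name] += 1 }

  counts.keys
        .select { |name| name.downcase.start_with?(prefix.downcase) }
        .sort_by { |name| [-counts[name], name] }
end

history = ['surfing', 'laurie', 'PhD offline', 'laurie', 'hacking', 'lab meeting']
p complete('la', history)
```

Because the candidates come from the log itself, any new activity typed once becomes available for completion the next time.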
Adding week view and traffic light
I've added a cumulative view over the last 7 days, and also a
rudimentary "traffic light" (thanks to
StackOverflow): green if I spend more than 4 hours per day on my PhD, yellow if more than 2
hours, and red if less than 2 hours. Looking at the graph below (click
on it to expand), you can see that I have not been spending enough time
on my PhD lately, and hopefully this "nudge" will help me improve on
that!
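The thresholds described above translate directly into code. A minimal Ruby sketch follows; the method name `traffic_light` is hypothetical, and since the text does not say what happens at exactly 2 hours, treating that case as red is an assumption.

```ruby
# Hypothetical helper mapping average PhD hours per day to a colour.
# Assumption: exactly 2 hours counts as red, exactly 4 hours as yellow.
def traffic_light(phd_hours_per_day)
  if phd_hours_per_day > 4
    :green
  elsif phd_hours_per_day > 2
    :yellow
  else
    :red
  end
end

[0.5, 3, 5.25].each { |hours| puts "#{hours}h -> #{traffic_light(hours)}" }
```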

Future
The popup view is already getting very overcrowded, so if I want to do
any further analyses (which I surely do), I will have to start writing
up a knitr report, to be viewed in a browser. (My friend Bodong
suggested a Shiny app, which I might also look into.) I think I will
need to begin storing the data in a sqlite database, instead of
in flat textfiles, to enable easier integration with other data sources,
both automatic ones (Chrome history, Fitocracy API, PDFs read), and
self-logged variables. I also need to think about how much processing I
want to keep in Ruby, and how much to do in R.
I've got two 10-hour bus trips coming up next week, on the way to and
from the Coursera conference, so maybe I'll get some more work done on
it then. In the meantime, I'm going to
focus on my PhD, and try to turn those traffic lights green!
March 28, 2013, [MD]

I've been writing more blog posts than usual lately, because of the
Beyond the PDF 2 conference, as well as some hacks I've been working on,
and I realized (again) how much of my time is taken up with finding and
inserting links. I often have quite a lot of links in my blog posts. The
graph shows a bunch of posts with only two or three links, but most have
more than 10, and a few have up to 40 or 60 links (these two are the
winners, with 78 links each).
Typically what I do is Cmd+T for a new tab, type in a Google query,
select the new page, Cmd+L and Cmd+C to copy the URL, back to the blog
editor, insert link, etc. It's quite quick, but for 10, 20 or 40 links,
it takes a significant amount of time, and also disrupts the writing.
When you use the UI to add a link in WordPress, it automatically
suggests other blog posts, either by recency or by a search term. This
is great, but it only looks at blog posts - what if I want to link to my
YouTube video, or a wiki page? I'm also trying to (slowly) move away
from WordPress onto a static site generator, probably nanoc, which
relies on editing Markdown.

As I was looking into accessing the Google Chrome history for some
quantified self experiments, I realized that almost all the pages I link
to are either pages I've recently accessed in Google Chrome, or my own
pages, either from my blog, my wiki, or my YouTube, Vimeo or Slideshare
channels. What if I could quickly search those sources, and have the
resulting link inserted in Markdown format? See the result in the short
(2:30min) screencast below:
The script is triggered by Keyboard Maestro, grabs the currently
selected text, and looks it up in a bunch of data sources (some, like
the YouTube, Vimeo and Slideshare channels, are cached using the
relevant APIs; others, like the Google Chrome history and my wiki pages,
are live), and presents the choices using Pashua. If I make a choice, it
then formats the link appropriately depending on which application I am
using and, in Google Chrome, the URL of the tab I am on (wiki markup for
my wiki, Markdown on GitHub, etc).
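The last step, formatting the chosen link for the current context, might look roughly like the Ruby sketch below. It is not taken from the linkify source: the method name, the target symbols, and the DokuWiki-style `[[url|text]]` wiki syntax are all assumptions made for illustration.

```ruby
# Hypothetical formatter: the target names and the wiki link syntax
# (DokuWiki-style [[url|text]]) are assumptions for this example.
def format_link(text, url, target)
  case target
  when :markdown then "[#{text}](#{url})"
  when :wiki     then "[[#{url}|#{text}]]"
  when :html     then %(<a href="#{url}">#{text}</a>)
  else url # unknown context: fall back to the bare URL
  end
end

puts format_link('my wiki', 'http://example.com/wiki', :markdown)
puts format_link('my wiki', 'http://example.com/wiki', :wiki)
```

Keeping the lookup and the formatting separate means a new editor or site only needs one more `when` branch.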
The source is on GitHub, and it
should be fairly easy to get running, especially if you only want Google
Chrome history. There might be some individual quirks in how I access my
wiki pages for example, but feel free to contact me if you have
questions.
Stian
PS: The other thing I spend a lot of time doing, is selecting,
resizing, uploading and inserting pictures, and that's another thing I
hope to simplify when I move to writing my blog posts in Markdown.
March 24, 2013, [MD]
At Beyond the PDF 2, I gave a Vision talk about "An open alternative to
Google Scholar", and since then a group of us have begun discussing how
we can make this happen. There isn't yet a fixed place for this
discussion to take place, but we can use the broader hashtag #scholrev
(see Peter Murray-Rust's blog post) to coordinate.
Researchr and "Scrobblr"
Many of my thoughts on this topic came out of my work with Researchr,
and my wish to have a system with an open API, which would let me
integrate search, metadata lookup, etc, with Researchr. I also wanted
unique IDs for publications to be able to link my notes about an
article with notes somebody else took about the same article. Together
with Ryan Muller, I began work on "Scrobblr", a social hub for reading.
The idea was that it could work like Scrobbler for music, where what you
listen to is automatically submitted, and shared with your friends. In
the same way, the papers you read are automatically submitted, and other
people in your group can see what you are reading, and automatically
import your citations (screencast demo).

Although we never got there, we thought a lot about how this could be
expanded to a much larger social hub - sharing bibliography lists from
different Researchr users, auto-suggesting "you've been reading many of
these papers lately, you should get in touch with this other student,
who is reading a lot of similar stuff", automatic PDF hash-based lookups
(screencast demo), etc. I began writing up design ideas in a document
that was never finished, but many of these are relevant to the current
ideas about an Open Scholar Search, so I'll post some of them here.
Unique IDs for publications
There are a lot of reasons why we'd want unique IDs for publications:
making citation lists unambiguous and easier to parse, enabling rich
citations in non-traditional media (wikis, blogs), etc. Right now
CrossRef DOIs are the closest we come, and they already show how
difficult it is to push for the usage of such identifiers (ORCID will
have a similar challenge).
It would be great if it were possible to build on the work CrossRef has
done, and I've recently become aware of how many interesting innovations
the team is coming out with (slides, blog).
However, there are a few barriers. The first is economic - as far as I
can see, it costs a minimum of $330 for a publisher to participate.
This might not seem like much, but I know of very few independent OA
journals that have DOIs. (There might also be large technical
implementation costs, I don't know). However, worse than this is that
only the publisher can submit metadata. This means that we have to rely
on them to submit correct metadata (and although that's often the case,
it's not always). It also means that we will never get
metadata/identifiers for publishers who don't participate, who don't
even exist anymore, or for scholarly material that wasn't published as
journal articles (we might want to cite video films, archive items etc,
and have unique identifiers for them as well).

Below I discuss how the identifier might be formatted (from Scrobblr
notes). This is also related to who can assign an identifier. In the
case of CrossRef DOIs, identifiers are assigned by publishers, who get
their own "name spaces" (similar to ISBN, DNS or IP numbers). In the
case of ORCID, who share their deliberations about identifiers, numbers
are assigned centrally, and are simply arbitrary numbers with a specific
formatting. This will probably end up being the case with articles in
open scholar search as well, but below I play with the idea of using
something more semantic. After all, it's a lot easier to give a hat tip
to @houshuang than to http://orcid.org/0000-0002-2632-8448, even though
both are equally unique. And it is a fascinating idea to be able to
write [@scardamalia2006knowledge] in any blog or wiki, and have it
work...
Unique IDs
*(From "Ideas for Scrobblr":)*
Each publication should be assigned a unique ID (UID). This is inspired
by the integration of many different applications that is enabled by the
concept of a citekey in BibTeX. APIs should enable users to submit UID
and receive metadata for any publication (whether in JSON or BibTeX,
whether strictly citation info or also social info about tags, other
users, links etc). There should also be a number of ways to determine a
publication’s UID through various lookups.
Format
There are (roughly) two choices for the format of a UID. The first would
be a randomly generated (or sequential) ID with no semantic meaning,
whether with numbers or letters etc. The second would be the citekey
format which researchr currently uses. The advantage with this is that
it is familiar to users (of LaTeX / researchr etc), and immediately
conveys some minimal information about a citation. Through use, certain
frequent citations might even be recalled actively or passively.
Certainly, it is much easier to reorder three publications cited in a
blog post using citekeys ("I’ll put the scardamalia2006knowledge first,
and then mention johnson2000corruption") than using random IDs ("See for
example 3093049, 304955 and 88585").
However, there are a few challenges with using the citekey format. The
first is generation and the second is collisions. Although the general
principle is well understood (last name of first author + year + first
word of title), there are a number of permutations, for example:
- I prefer manually changing van2006knowledge to
vanderwende2006knowledge
- what to do with punctuation: is it peter2006knowledge or
peter2006knowledge-integration?
- it often makes sense to include the first word with more than n
(=3?) letters, etc.
This results in the citekeys generated by Researchr, other tools (Google
Scholar) and Scrobblr being different. Some of these we can just define
arbitrarily, but we might want some decent algorithm to solve the first
point above - perhaps joining the words of the last name without spaces.
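To make the choices above concrete, here is one possible generator as a Ruby sketch. It is not the algorithm Researchr or Scrobblr uses: the method name, the decision to split the title on punctuation, and the default of n = 3 are assumptions, and the titles in the usage lines are invented.

```ruby
# Hypothetical citekey generator: last name of first author (words joined
# without spaces) + year + first title word with more than n letters.
def citekey(last_name, year, title, n: 3)
  author = last_name.downcase.gsub(/[^a-z]/, '')
  # Splitting on anything that is not a letter means that
  # "Knowledge-integration" contributes just "knowledge".
  words = title.downcase.scan(/[a-z]+/)
  word = words.find { |w| w.length > n } || words.first.to_s
  "#{author}#{year}#{word}"
end

puts citekey('van der Wende', 2006, 'Knowledge-integration in practice')
puts citekey('Peter', 2006, 'On the use of wikis')
```

Note that this simple version drops non-ASCII letters from names, so a production algorithm would need a transliteration step first.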

Given that we can thus generate nice citekeys from submitted metadata
(much of which won’t even have a citekey, or have a citekey in a totally
different format), we encounter the problem that the citekey in the
database might differ from the citekey in the user’s local system. One
approach would be to use Researchr or other plugins to “harmonize” these
(i.e., automatically modify citekeys on the user’s end) - this would
have to be done early in the import process, because everything locally
is tied to the citekey (PDF name, wiki pages). (Of course, in the future
Scrobblr will be the first place we go to download papers in our fields
anyway so theoretically we won’t even have this problem :)) Or we could
just accept that there will be a discrepancy here.
The second problem however will be collision. It is likely that there
will be cases of several papers generating the same citekey. Again we’ll
need a way of resolving this. A simple way would be to add “b” to the
year or something like that - not very elegant, since it will look kind
of “random” when viewing it outside of a context. Another approach could
have been to go back and give both articles a longer citekey to avoid
collision (perhaps the first two words of the title), however, given
that a citekey once assigned should be absolute, this is impossible.
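The "add a letter to the year" approach can be sketched as a small registry that never changes a key once it has been handed out. This is a hypothetical in-memory illustration in Ruby, with invented class and method names. A real service would need persistent storage and a rule for what happens after "z".

```ruby
# Hypothetical in-memory registry; a real service would persist this.
class CitekeyRegistry
  def initialize
    @taken = {}
  end

  # Returns a unique key for the publication. Keys assigned earlier are
  # never modified; later collisions get a suffix (b, c, ...) after the year.
  def assign(author, year, word, publication_id)
    suffixes = [''] + ('b'..'z').to_a
    suffixes.each do |suffix|
      key = "#{author}#{year}#{suffix}#{word}"
      next if @taken.key?(key)

      @taken[key] = publication_id
      return key
    end
    raise 'ran out of suffixes'
  end
end

registry = CitekeyRegistry.new
puts registry.assign('peter', 2006, 'knowledge', 1)
puts registry.assign('peter', 2006, 'knowledge', 2)
```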
Given that we can solve all of these things, the final concern is user
confusion about local citekeys and Scrobblr citekeys, given that they
look so similar. One way to mitigate this in practice would be to come
up with some notation for linking to citekeys which specified that they
were Scrobblr citekeys. Currently we are using [@citekey] for citations,
but this is purely arbitrary; it could easily be something else. It would
however be great if it was something both easy to type, easy on the
eyes, and still fairly unambiguous.
Since citekey is rarely used on the web today, it would for example be
easy to write a plugin that scanned a blog post for this notation and
recognized citations.
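Such a scanner could be as small as one regular expression. The Ruby sketch below is an illustration only: the assumed citekey shape (lowercase letters, a four-digit year, then more lowercase letters) and the names `CITATION` and `find_citations` are not from the Scrobblr notes.

```ruby
# Assumed citekey shape: lowercase letters, a four-digit year, then
# lowercase letters, e.g. scardamalia2006knowledge.
CITATION = /\[@([a-z]+\d{4}[a-z]+)\]/

# Returns the distinct citekeys found in a piece of text, in order.
def find_citations(text)
  text.scan(CITATION).flatten.uniq
end

post = 'See [@scardamalia2006knowledge] and [@johnson2000corruption], ' \
       'but not [@houshuang] or a plain [link].'
p find_citations(post)
```

Requiring the year in the middle is what keeps Twitter-style handles in brackets from being mistaken for citations.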
March 22, 2013, [MD]
I had an amazing time at Beyond the PDF 2 in Amsterdam (March 19-20). I
met so many of the people whose blogs I've been following, did a demo of
my open academic workflow, gave a 3 minute pitch for "Why we need an
open alternative to Google Scholar", for which I won a shared second
place, and came back filled with ideas and connections. I hope to write
a series of blog posts highlighting some of the things I saw and heard
at the conference, but I thought I'd start by posting an archive of my
tweets.
I've done this a few times before, from Critical Point of View in
Bangalore, Learning Analytics Conference (and pre-conference) in Banff,
and OAI6 in Geneva, and even created a tutorial on how to generate this
list using TextMate. This time, I grabbed the archive from Bodong's
Twitter analytics app, and used R to select only my tweets. I cleaned
them up a bit, removed some purely housekeeping ones, and added a bunch
of links for context.
This archive will also be useful for myself in reminding me what we
talked about, things to follow up on, etc.
Before the conference
- Amsterdam beer and talking semantic publications with @pixievondust
and @jschneider, great opening to #btpdf2! Looking forward till tmrw
Day 1
Before lunch
- RT @kaythaney: .@pgroth kicks off with a challenge: what would you
do with $1K today to make research communication better (note ...
- Twitter ecosystem slowly being suffocated, and I find myself using
it less and less, but for conferences still where it's at.
- Kathleen Fitzpatrick opening. Notes from her talk "Peer-to-Peer
Review and Networked Scholarly Communication"
- @jeroenbosman Are any of these really suited for today? Why so much
focus on italics etc when we can use URLs, DOIs etc? (genuine q)
- Fitzpatrick's mom when publishers couldn't publish her monograph
bec. of commercial concerns. "They were expecting to make money?"
- "You could just publish your entire monograph online, with reviewer
comments - but I know that's not realistic"... @kfitz: "Why not?"
- Anyone researching the future of scholarly meetings? Been to some
innovative confs, but would love to see research, documentation
- "learning at conferences" should be a field of study (want 2 see
learning scientists use learning theories to rsrch rsrch)
- @rmounce Remember that "theoretically" many may read the monograph
at a library. Many journals with only 400 subscribers too.
- @petermurrayrust What's your definition of citizen hackers?
- Give grad students researching in other countries grants to help
translate
thesis
#1k
- Re 400 monographs sold, my MA thesis was downloaded 1600
times,
so I guess I'm doing OK :)
- Would be great if you could do some data viz/analysis of
tweets, @bodongchen like
- #MOOCs and - Indig
Educ@OISE
- 22,000 students using OA papers in their learning. Powerful
argument for #OA
- @rubp Cool, would love to come. I've also been playing with open
scholarly workflow - lit review
etc
- Wish name tags were bigger - know so many people here but not
faces...
- .@edsu Giving a vision talk "Why we need an open alternative to
Google
Scholar"
tomorrow. Happy to discuss!
- .@AubreyMcFato @edsu @doajplus Yeah, one of a number of projects
that it would be great to build on - so much unexplored potential
- @edsu Journals should ping a server w/bib* metadata, like blogs.
Standardized micro data. Importing existing databases. CommonCrawl
- Love idea of reproducible research, see these cool showcases of
IPYnbs
and R+knitr
- .@jschneider @aubreymcfato @edsu @doajplus Breakout on
OpenScholarSearch - let's make it happen, when/where?
- Compare diff aka search sites w my name
BASE,
GScholar,
MS Academic
Search.
GScholar wins...
- Wrote up a bunch of unfinished detailed ideas for a social reading
hub, much applies to
#OpenScholarSearch
- @neuro_cloud Not sure if Reddit is the best tool for this, but
doesn't hurt to try.
- Funny disembodied conference, saw two old friends tweeting with,
excited to meet them in person - both are not here :)
- For people who see #RDALaunch hashtag and want to know what it's
about, more
info
- @axfelix There will be a lot of new innovation in RSS readers in the
next 6 months... Is GSch similarly stifling innovation? #goodenuf
- @Protohedgehog They also funded 33 students to attend, pretty
amazing. Let them try to keep up as we try to out-innovate them.
:)
- .@axfelix GScholar has no API, and there will not be
one - pretty crucial.
- @axfelix Is that an argument for or against? I also want to interact
with aka search, don't have lot's of ppl to de-engineer...
- @axfelix I think (hope) there are ppl attending who do believe in
overthrowing global capitalism (or GSch)... we can build sth better!
- @Dreusicke I like being able to download data and display/process
how I want. Only on the web like DRM - could disappear tomorrow.
- Backchannel has been great, but I'm worried that it will fade after
lunch as everyone start running out of juice... #no-power-plugs
- @axfelix Makes me think of all the congratulatory tech news stories
about use of Linux in China. Reality is almost Win-monoculture
- @juancommander glob south, I'd love to see issue of language raised
as well, maybe more central to socsci/humanities than hard sci?
- @axfelix I think bandwidth and lack of ubiquitous/always-on access
is much bigger problem (and maybe access to devices/mobile)
After lunch
- Journal of open research
software sounds great.
- Online journals and faster horses
- RT @jschneider: Brian Hole: "Articles are so '60's" -- with image of
Philosophical Transactions 1665
- RT @juancommander: @jasonpriem doesn't mind the little time because
he talks twice as fast, he gets double the words per minute
- @juancommander looking forward to my 3 min vision talk tmrw
#talk-fast
- I'm so used to watching Coursera lectures at 2x speed so Jason was
like slowmo for me ;) #going-going-gone
- RT @CameronNeylon: 11-16 million hours spent each year on reviewing
papers that get rejected just in WoS..that's around 1000 years ...
- Kaveh on copy editing "we read it many times so you only need to
read it once"
- Kaveh: one very complex page of journal w math can take 2000 lines
of XML
- RT @rmounce: The blog post should be the 'Version of Record'. Dump
XML, make HTML-5 the base VoR standard @kaveh1000
- Interesting point by Murray Rust regarding the need for typesetters,
with good incentives, people can do it themselves
- This panel is highlight so far, so many neat projects! How can
people get involved,
contribute?
- I'll be graduating in 2 years and looking for a job. A job board for
people who care abt sci2.0 and openness?
- Funny w where presenters can't do live demos, presenter suggests
beyond the PowerPoint conference ;)
- @juancommander how about independent open access journals w no
budgets?
- @juancommander many journals say they'll waive them for those who
can't pay. Not sure how works in practice
- @rmounce we're doing institutional research on that at
utoronto. And I think many other
institutions are as well
- #Peter Murray Rust talks about the "scholarly poor" - non
academics, SMEs, etc. Making the case for research funding.
- @kelli_barr not just access to Western rsrch, when doing rsrch on
Indo libr, easier to find Western than loc rsrch, not digital
- @kaythaney @neuro_cloud which includes language
- RT @kaythaney: I love that "we all know we're moving towards Open
Access" is an understood, throwaway comment in this crowd. #oa
- Anyone going out for beer and more food after this? Or everyone
jetlagged and tired? :)
Day 2
Before lunch
- Academics reading almost twice as many articles per year as in 1977.
Huge increase with e-access, beginning to level off. Tenopir
(report)
- RT @anitawaard: 80% increase in nr of article readings; 30 %
descrease in time per article read - unsustainable trend! Tenopir
- I must've shown my scholarly
wiki to a lot of people
yesterday, given how many pages I had to remove this AM
- Insightful, how do we address this? RT @mfenner The Price of
Innovation - my Thoughts for Beyond the
PDF
- RT @petermurrayrust: Almost no mention of #scholarlypoor - OUTSIDE
academia. Usually called "consumers". T ...
- RT @petermurrayrust:. Anyone present want to challenge the system
and create a bottom up OPEN infrastructure for #scholarly comm ...
- Excited about "Making it happen" session - although shouldn't that
be the two day conference, rather than 1.5 h? :)
- @jschneider @tac_niso Also wonder about other submission formats -
anyone accept papers in Markdown?
- Haven't talked much about authoring tools/workflows here. Anyone
excited about scholarly
Markdown?
@mfenner
- Wish we had more time for breakout sessions (more sessions). Several
things I want to talk to people about
- Reuse of scholarly slides at #btpdf2, very cutting edge :)
- RT @CameronNeylon: I want to lock all the developer groups in a room
until we agree a path to an interoperating ecosystem. Then ...
- @CameronNeylon Is there an empirical study of how often locking
people in rooms lead to solutions? It seems like a common idea! :)
- 3 min vision talks are the
last session today. Maybe it should be the first, so we could spend
day planning/implementing? #btpdf3
- @CameronNeylon So we just need 500 people to tweet with #1k
hashtag. #we-can-do-it
- @kerim @ilya @mfenner @criticmarkup Very
neat, how can we push this forward? I'd like to be involved.
#lets-make-it-happen
- @InfoFuturesNYU @datadryad Eating dogfood, brilliant. Data about
data sharing by scientists, openly available.
- Hackfest to create tools/workflows/documentation on using Scholarly
Markdown+Git for academic authoring/collab #1k @CameronNeylon
- @TAC_NISO @ianmulvany @cameronneylon And all ideas look more
brilliant when sketched on napkins! Need napkin.js
#coffe-stain.js
- Would have loved to see @xieyihui from
knitr and @fperez_org from IPython
Notebook at this conf, executable
docs->beyond PDF
- Graeme Hirst: Post-modernists are high-verbiage, zero logic. Vs
people who only talk in LaTeX-math mode. Great talk! @graemehirst
- @petermurrayrust @okfn Would have loved to join OKFN
conf/hackfest.
Always tricky to get funding. (I'm ironically here because of
Elsevier)
- RT @petermurrayrust: We shouldn't spend so much time talking, we
should be doing and creating
- @pgroth @petermurrayrust People travel across the world anyway, have
an extra day of "doing"... just need a location with wifi+power
- RT @anitawaard: .@erwinverb Tools
- @cgueret There was a call to design it, don't know how far they got.
CriticMarkup interesting for collab
- @utopiadocs How I embed Skim PDF
rdr in my workflow-love
highlight, export clips, and AppleScript support.
- Five min flash talks - nice warm up for 3 min vision talks later
today. Great with pitches, call for action
- @anitawaard is amazing
like always - great example of how lab research is actually
documented in US labs
- RT @jschneider Sensemaking involves collage-based manipulation of
electronic, born-digital materials, printed and annotated on paper.
@anitadewaard
- Graft tools closely on scientists' daily practice. (Why
"anthropological studies" of researchers important,
Tenopir
etc) @anitawaard
- Love the X-Files theme when speakers go over their time
- Love the work @swcarpentry do
teaching sw devel practices to scientists. Who're teaching them new
research workflows/tools/publ? #1k
- @researchremix Will need more than #1k, but brilliant idea. YC and
HN are big inspirations
#incubator-for-schol-comm-startups
- RT @researchremix: #1k A YC/techstars incubator for scholarly
communication startups. Mentorship, leg up in biz, marketing, fun
...
- @researchremix Does it also require more work on funding models? Can
we pitch VCs on funding projects with open source/OA etc?
- PKP XML tool looks awesome, was shocked
when I first realised OJS didn't have document pipeline, very happy
to see this! @axfelix
- Out of the box NLM-compatible styles is huge, the PKP doing this is
huge.
- @phillord You can generate HTML, PDF etc from NLM XML. Semantic vs
presentation format.
- Great that @rmounce is talking about PDF
metadata,
I've been frustrated over this for a long time! Let's go for
#low-hanging fruit!
- Everyone should read this: Why can't I manage academic papers like
MP3s? #pdfmetadata
- Yes, I want publications tagged with whether they are OA, license...
Only on the web, but after I downloaded, how do I know? #rmounce
- talk on #ORCID, excited about potential,
looking forward to seeing them more in the wild
- "There is no problem if you just give me more money and use my tool"
... so true :)
- @pixievondust What does use constitute? I have one, and I'd love to
tag my pubs etc, but not sure of journals who support that etc.
- The revolution will not be
peer-reviewed
#scholrev
- RT @conjugateprior: What is Word doing in #Btpdf2 workflows? Like a
family gathering where everyone tries to ignore your psychopathic un
...
- RT @petermurrayrust: ~25 revolutionaries met at lunch . Will
coordinate under hashtag #scholrev (tag seems to be fairly free)
- RT @maurice_: #scholrev find the low hanging fruit AND put up an
inspiring vision far beyond the PDF
- @rmounce @gbilder Nice, seen some impressive stuff from CrossRef
lately - need to look more
into! #pdfmetadata
- RT @openscience: Lovely: @bodongchen's #Shiny app for tweet
analytics
- Most active tweeters at @houshuang, @pgroth, @rmounce and @kaythaney
- RT @GullyAPCBurns: how could knowledge engineering researchers work
with publishers to be more effectively as a community to inn ...
- "Universities are big bags of researchers who have a shared need for
car parking"!
- Crowdsourcing vision talk on Alternative to Google Scholar: talking
notes here, help me
edit
#scholrevo
- @houshuang 28 concurrent viewers in Vision talk document, I love you
all, you're crazy! :) #scholrevo (leave names and I'll cite u)
- Lot's of great ideas at Vision talk "Open alternative to Google
Scholar" doc, thanks all! #scholrev
- RT @neuro_cloud: @battagliaem I would like to see more
breakout/work groups. So many ideas need to translate to works!
#BTPDF2 #BTPDF3 ...
- How about recognition for activists - many work hard on deep ideas,
but difficult to publish them, #should-be-working-on-my-phd
- @ianmulvany PS: loved blog post about
Encode.
Used it in 2 presentations about OA
- @ambrouk I had the idea of a "fair trade" logo for publications 5
yrs
ago,
includes #OA, and translation
After lunch
- @mfenner wish it was more extensible
(Pandoc), I'd like the
citations to have links to my wiki - plugins, filters without
learning Haskell
- RT @dshotton: To escape ISI, open your citation data! See Open
Letter to
Publishers
- Giving public talk about Open Science
(https://plus.google.com/113732143584807227124/posts/82FiF4gr2U5)
at KU Leuven tomorrow #oa
- The Google Doc from my
talk
will transition to a strategy/planning document, add name&ideas if
interested
- Just gave talk abt open scholar search, what a rush. The
conversation
continues
- @mfenner it's hard to be subtle in 3 mins but I'm actually all abt
coordination as opposed to reinvention
- @mfenner again, let's do it. I can't do even a fraction by myself,
eager to contribute to projects that can.
- @phillord my ideal is citations just list of unique identifiers.
User/app can render any way they want.
- @lukask @phillord any journal accepting markdown+btex file?
publishing these alongside HTML etc? Maybe modify Jekyll for jrnl
only
- @mfenner @lukask @phillord happy to. Know of any
bibjson ruby libraries? otherwise we
should make it
- @axfelix does it accept markdown as well? Is this released in
OJS or when coming? Sounds awesome and
overdue #ojs
- #Kaveh: publishers, give us XML. Yes! Publishers: format is not
your business. Want to read in my house style. Yes!
- #ah the perils of live demos on others' computers, very brave!
- #very impressive by Kaveh! Why we need d/l-able files, control to
readers not platforms.
- Kaveh: It's not for sale, but you can get it at a very good price.
- Publishers whatever, how can I get my PhD thesis into XLM w/I ppl in
Kerala?
- RT @memartone: Feel free to propose that we all use ORCID ID as part
of the FORCE11 pledge. If the issue has been settled, let ...
- RT @axfelix: I think it's quite telling that @kaveh1000 is killing
it at with -- what's this? -- a PDF generation pipeline. beca ...
- RT @mfenner: Popular vote for best idea goes to Carol Goble,
@houshuang and @kaveh1000 They also get my vote, cool ideas, and w
...
- @rguha not sure I agree. A lot of my friends happy editing wikis,
short step to Markdown, huge step to latex.
- Looking forward to videos becoming available from - stream is down.
Some things so good I want to see again / share.
- RT @TAC_NISO: Transcribed version (with draft intro from me) of
Amsterdam Manifesto for Data
Citation.
Thoughts please ...
Stian
March 18, 2013, [MD]
Yesterday, I wrote about my tiny timetracker
script,
resurrecting some 3-year-old
code, cleaning it
up a bit and adding a simple R graph of my day. The script makes it very
easy to track intention (ie. I am the one saying what I am working on,
it doesn't try to infer it from my activity), and over time the log
files should prove interesting.
R graphs
I started wondering about other ways of representing the data with R
graphs. Right now, it's just showing a simple bar graph with the
cumulative amount of time spent on each category per day. It would be
easy enough to make similar graphs per week, month, etc, and also easy
enough to correlate other measures that I tracked per day (temperature,
time getting up, mood etc) with cumulative activity in each category for
each day (ie. on days when I got up early, I got more hours of PhD
reading done, etc).
However, the log files don't only contain information about how many
hours I spent each day doing different categories, they also contain
information about when I start and stop different activities. So I might
be able to find correlations like "I tend to get more done on my PhD on
days when that's the first thing I do", etc. To begin with, I tried to
find a way to graph the day's time use as a timeline.
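As a rough sketch of the data behind such a timeline: each log entry can be read as the start of an interval that ends when the next entry begins. The log format below (timestamp, tab, activity, with a final "Stop" entry) is an assumption made for illustration, not the format the script actually writes, and the helper names are hypothetical.

```ruby
require "time"

# Assumed log format: one line per switch, "ISO-8601 timestamp<TAB>activity".
# A final "Stop" entry closes the last activity of the day.
SAMPLE_LOG = <<~LOG
  2013-03-18T09:00:00\tPhD reading
  2013-03-18T10:30:00\tEmail
  2013-03-18T11:00:00\tPhD reading
  2013-03-18T12:15:00\tStop
LOG

Interval = Struct.new(:activity, :start, :stop) do
  # Length of the interval in whole minutes.
  def minutes
    ((stop - start) / 60).round
  end
end

# Turn consecutive log entries into intervals: each activity lasts
# until the next entry begins. "Stop" entries start no interval.
def intervals(log_text)
  entries = log_text.each_line.map do |line|
    stamp, activity = line.chomp.split("\t", 2)
    [Time.parse(stamp), activity]
  end
  entries.each_cons(2).filter_map do |(start, activity), (stop, _next_activity)|
    Interval.new(activity, start, stop) unless activity == "Stop"
  end
end

# Cumulative minutes per activity (the data behind the bar graph).
def totals(ivals)
  ivals.each_with_object(Hash.new(0)) { |i, sums| sums[i.activity] += i.minutes }
end
```

The interval list keeps the start and stop times needed for a timeline, and something like `intervals(log).first.activity` would give the first activity of the day for the kind of correlation mentioned above.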

Categories
There are still some challenges with the script. The first is how I log
categories. Right now I have 10 slots (0-9), but since I log the full
text rather than the number, you can change the categories in
settings.rb without risk of "overwriting" earlier logs. However, I
realized that I wanted to log at different levels of granularity. For
example, I might want to know how much time I'm spending preparing for a
presentation in a few days, but I'd also like to know how much time I
spend each month preparing for presentations, or even on "schoolwork" in
total.
I could attach categories to the projects in settings.rb of course, that
would be easy. I would have to determine whether I wanted the categories
to be exclusive or not. If they are exclusive, I can add them all up,
and get the total amount of time spent. If I want overlapping categories
(presentation is both school work and authoring, whereas writing a blog
post is authoring, but not school work), I'll be able to look at time
use in different categories, but can't compare them against each other
(plotting authoring vs school work wouldn't make sense, since the time
spent writing the blog post would be double-counted). I guess an expense
tracking system that lets you tag your expenses in different categories
has the same problem.
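The double-counting is easy to see in a small sketch. The project names, tags and minutes below are made up for illustration, and the mapping is only a guess at how categories might be attached to projects in settings.rb.

```ruby
# Hypothetical project-to-category mapping (overlapping categories).
PROJECT_TAGS = {
  "Presentation prep" => ["schoolwork", "authoring"],
  "Blog post"         => ["authoring"],
  "Course reading"    => ["schoolwork"]
}

# Made-up minutes logged per project.
LOGGED = {
  "Presentation prep" => 120,
  "Blog post"         => 40,
  "Course reading"    => 60
}

# Add each project's minutes to every category it belongs to.
def minutes_per_tag(logged, tags)
  logged.each_with_object(Hash.new(0)) do |(project, minutes), sums|
    tags.fetch(project, []).each { |tag| sums[tag] += minutes }
  end
end

per_tag      = minutes_per_tag(LOGGED, PROJECT_TAGS)
total_logged = LOGGED.values.sum
```

Here `per_tag` holds 180 minutes for schoolwork and 160 for authoring, which add up to 340, while only 220 minutes were actually logged. Each per-category number is meaningful on its own, but the categories cannot be stacked or summed against each other.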
One problem is that I don't quite know how to store or represent this
information effectively in R. I had the same problem when I imported
Google Analytics data together with metadata about all of my blog posts.
My blog posts usually have several categories attached to them.
Initially, this is just a text field with each category listed like
"oa,publishing,china". How would I represent this in a data structure in
R, so that I could see whether certain tags were more popular than
others, for example? Would I have to duplicate the post, so that I had
three entries for the page, one for each tag? Or turn the tags into
binary variables, so that for each row I would have columns for all the
tags I've ever employed, with a 1 for in use and a 0 for not? (And is
there a function to remap the data like this?)
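Both layouts are straightforward to generate before the data ever reaches R. The sketch below does the reshaping in Ruby, with made-up pages and view counts and hypothetical helper names. It only illustrates the two shapes and does not answer which R function would do the same job.

```ruby
# Made-up posts; "tags" is the comma-separated text field described above.
POSTS = [
  { "page" => "/open-access-in-china", "views" => 120, "tags" => "oa,publishing,china" },
  { "page" => "/r-graphs",             "views" => 80,  "tags" => "r" }
]

# "Long" layout: one row per (post, tag) pair, so a post with three
# tags appears three times.
def long_rows(posts)
  posts.flat_map do |post|
    post["tags"].split(",").map do |tag|
      { "page" => post["page"], "views" => post["views"], "tag" => tag }
    end
  end
end

# "Wide" layout: one row per post, plus a 0/1 column for every tag
# that occurs anywhere in the data.
def wide_rows(posts)
  all_tags = posts.flat_map { |post| post["tags"].split(",") }.uniq.sort
  posts.map do |post|
    own = post["tags"].split(",")
    row = { "page" => post["page"], "views" => post["views"] }
    all_tags.each { |tag| row[tag] = own.include?(tag) ? 1 : 0 }
    row
  end
end
```

The long layout makes it easy to group by tag, at the cost of counting a post's views once per tag. The wide layout keeps one row per post, so totals stay correct, but it gains a column for every new tag.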
Other data
I also thought about other sources of data that I could track either
explicitly or automatically. Some of these would be interesting to track
and visualize by themselves, others would be interesting mainly as
related variables. I could for example easily track all the scholarly
PDFs that I read, by taking note of when clippings are exported to
Researchr (I could log both
the number of PDFs read, and the number of pages in each PDF). I could
also look at the length of the high-level notes that I write about
different articles.
It would be quite interesting to wear a FitBit or something similar
24/7, and get detailed information about when you fall asleep, when you
wake up, how you move around etc. However, I could at least use
Fitocracy's API – if I could query the number of points added per day,
that might be a useful proxy for exercise. (If I am diligent about
turning on Runkeeper when biking, I could also extract the number of
kilometers biked every day).
There are some things that I do digitally that would be so easy and so
interesting to track, but which do not have an interface. I spend
hours every day reading on my Kindle, and it would be very interesting
to export the number of pages read per day, the time I've spent, speed
(seconds per page), etc. But the Kindle does not collect this data (or
at least, it won't share it with me).
Entering manually
There is also data other than time-use that I might have to enter
manually. I thought about creating a very unobtrusive interface,
triggered with a global keyboard shortcut, which would let me type in a
variable (with autocomplete), let me tab to an entry field, let me type
in the value, and press enter to store (with a time stamp). This could
be everything from weight, to bed time, books read, or anything else.
(One could even imagine a window that pops up at random times asking
about your mood, whether you are feeling tired or energetic etc - but
that might quickly become annoying). First draft of interface:

Reports
Right now I am creating a few graphs with
ggplot2, running an R script through Rscript,
that spits out a PDF, and then I display that PDF with
Pashua. When I have more data,
and graphs, I plan to create a knitr template (Markdown + R code), maybe
even using a templating system, and then run knitr from the command line
(through Rscript?), which will generate an HTML page, which I can then
open in the browser.
Anyway, that's how far I got in my pondering.
PS: This blog entry took exactly 37 minutes to write, most of which I
did on the plane, which is the early pink blob you see on the timeline,
then my battery ran out, I arrived, spent some time finding my AirBnB
host, etc, and then the timeline resumes :)
March 16, 2013, [MD]
About three years ago, I read some articles about the quantified
self, and how the simple act of observing
something can lead to change (often in a positive direction). I've been
interested in productivity tools and theories for a long time (it's a
constant struggle for academics), and I thought of different ways of
measuring how I spend my time. I tried a few different automatic tools
which look at which applications are open, which websites you visit etc,
but found that the data they generated were not that helpful. If I am on
Google Scholar, am I doing research for my PhD, working on a paid
research project, or just following a random thought?
So I needed something that took my intention into consideration, but did
so in a really easy and unobtrusive way. I had a pretty good idea of how
the tool I wanted would look - something that would sit in the menubar,
and where I could change which activity I was working on with only a
global shortcut. I looked around, but couldn't find any tools that
really fit the description, so I began building my own. I wrote some
really simple Ruby scripts to log time codes to a text file, triggered
with a global shortcut program, and used Growl to
provide some feedback.
I wrote up the whole
thing
on my blog, posted the
code on GitHub, but
actually didn't end up using the system very much (the fate of many
productivity tracking systems, I'm sure). Three years later, I've spent
a lot of time working on my open academic
workflow, and I've also
begun experimenting with R for data
analysis and visualization. I am also involved in a number of different
paid projects, so tracking my time is not just for self-insight, but
would also be very useful for billing, etc.

I opened the code that I hadn't touched in three years, updated it a
tiny bit (I use Keyboard Maestro
now, instead of FreeHotKeys),
and then experimented with adding a graph. It took a bit of time getting
R to play nicely with Ruby. I began with
rinruby, which
lets you run R commands through Ruby. However, this popped up a Quartz
screen every time I used ggplot to render a graph (even if I never
displayed the graph, but sent it straight to a PDF).
Then I tried to run an R script through R CMD BATCH, which worked, but
took almost 10 seconds to execute. I later found out that this is an old
way of doing things, and that
Rscript
is the new way. That worked perfectly, and it executes and renders the
PDF in 0.8 seconds. I then use
Pashua, which I use extensively
in my open academic workflow, to display a dialogue with the graph and
some extra information.
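A minimal sketch of the Rscript step as it might look from the Ruby side. The script name, the file arguments and the helper names are hypothetical, since the post does not show the actual call. `--vanilla` is a standard R option that skips saved workspaces and profile files.

```ruby
require "open3"

# Build the argument list for Rscript. Passing an array instead of a
# single string avoids shell quoting problems with odd file names.
def rscript_command(script, *args)
  ["Rscript", "--vanilla", script, *args]
end

# Run the (hypothetical) R script that reads the log and writes a PDF.
# Raises if Rscript exits with a non-zero status.
def render_graph(script, log_file, pdf_file)
  _stdout, stderr, status = Open3.capture3(*rscript_command(script, log_file, pdf_file))
  raise "Rscript failed: #{stderr}" unless status.success?
  pdf_file
end

# Example call (assumed file names):
#   render_graph("daygraph.R", "timelog.txt", "today.pdf")
```

The path returned by `render_graph` could then be handed to whatever displays the PDF, such as the Pashua dialogue described above.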

Currently, it just shows a simple bar graph of activity during the
current day, but as I collect more information over multiple days, the
data could be visualized in many interesting ways. I know not only how
much time I spend on a certain activity each day, but also when I spend
the time (and in how large chunks, how often I'm interrupted or start
surfing etc). This could be visualized as time-series, and I could even
experiment with correlations with other factors, whether external ones
(the daily temperature?) or internal, if I track other factors (when I
go to bed, what I eat etc).
Only time will tell if I keep using the system, but perhaps this
possibility of using and visualizing the data will be enough incentive
to track. It will also be very interesting to see how much time I
actually spend on various activities - for example I need to give a
presentation at Beyond the PDF2
in Amsterdam in a few days - exactly how much time will it take me to
prepare?
Stian
February 23, 2013, [MD]
I was asked last minute to fill in for Nick
Shockey of The
Right to Research Coalition to give a talk about Open Access at
IgniteAlberta in Edmonton. I spoke at a
session about Open Access and OER, shared with Cable
Green from Creative
Commons who participated remotely.
The conference was put together by the three student associations in
Alberta to bring together student leaders, faculty, administrators and
people from the province to discuss the future of Alberta's higher
education system. The sessions were a mix of large plenary presentations
and smaller break-outs where everyone was seated around small round
tables, and where much of the time was spent discussing in groups and
then summarizing back to the larger group. I really liked this way of
organizing, and learnt a lot from the professors and students that I sat
next to.
I also realized how little I know about the higher education context
outside of Ontario. In Toronto, it's easy to assume that Alberta has
lots of money and few problems, but I heard about provincial cuts, and
also the challenge of low high school and post-secondary completion
rates.
I've given many
talks about
Open Access, and when I am asked to speak, I usually remix slides from
older slide decks, but it's always a challenge to organize them in a way
that will make sense to the audience, fit within the timeframe etc. This
time, I only had about 20 minutes, and I wanted to convey both a basic
understanding of what Open Access is, and some of the excitement that I
feel for it. I came up with a basic framework of "what, why, how", and
think I was able to cover a lot of basics in the 20+ minutes, together
with some neat examples, and updated news, such as
(slides)
Stian