Notes from Learning Analytics Conference 2011: Day 1

February 28, 2011, [MD]

Today was the first day of the conference, with a lot of very interesting sessions. Too much to process at once, but hopefully these notes will be useful. They were taken in Etherpad, and others, especially Andrew Barras, helped out. Doug Clow also took very extensive notes from the sessions (morning, afternoon). More below the fold.

Anyone should feel free to jump in and help us take notes. Link to pre-conference notes: (and archived here: )

This was mentioned by someone: Python tool for textual analysis

You can download all of StackOverflow data for analysis:


Tony Hirst: Pragmatic analytics: insight, representation and structure

"Scoring Points" - book about about how Tesco used loyalty program to collect data about shoppers, and offer better services to shoppers (

Segmentation - different groups of shoppers. 

JISC - business intelligence

Marketing companies already know huge amounts about you - deliverable at post code, or address level. Difference is that now you can get access to data without paying huge sums (through social networking analysis etc).

After graduation, we should engage learners as life-long learners, and offer subscription services - we already know a lot about them. Not just put them in the "advancement/fund raising" bucket. 

Course choice analytics. Two years ago, Google was the dominant way for people to find OU courses - now Facebook is becoming increasingly important.

How does this work with OER discovery? How are people finding your OER? What kinds of descriptions are you using? Are you describing the course with language that could only be understood by people who have already completed the course? :) What search terms are people using to find your course?

- Descriptive reports
- Prescriptive models (common-sense model of how people behave)
- Predictive voodoo (you don't know what's going on inside)

Library: Dave Pattern (@daveyp) - added "people who borrowed this book also borrowed this other book" into the library catalogue. Clear stats: increase in books borrowed.

Also looked at whether engagement with the library improved people's degree qualifications. Correlation between use of the library and qualification.

"Negative feedback, closed loop control system"

If you make changes, and there is no measurable change in output, how do you know your change had any effect?

Using Google Analytics to analyze online course, block by week. Extract data from GA, build a model, find unique visitors to different resources, get a better feel for how people are moving through the course. Then you can experiment with for example moving course assessment, see whether weeks are overloaded or underloaded.

Time series data demonstrates:

- Trend
- Seasonality
- Noise

The concept of "detrending" data - removing the trend so that you can get at the periodicity.

Fourier analysis - any time series signal can be made from combinations of sinusoidal curves. Segment time series data into different periodicities. 
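Detrending and then inspecting the spectrum can be sketched in a few lines. This is purely illustrative (the synthetic "page views" data and all parameter choices are mine, not Tony Hirst's): remove a linear trend with a least-squares fit, then read the dominant periodicity off the FFT.

```python
import numpy as np

def dominant_period(series):
    """Detrend a time series and return its dominant period (in samples).

    Removes a linear trend via a least-squares fit, then picks the
    frequency with the largest FFT magnitude (ignoring the DC term).
    """
    t = np.arange(len(series))
    trend = np.polyval(np.polyfit(t, series, 1), t)
    detrended = series - trend
    spectrum = np.abs(np.fft.rfft(detrended))
    freqs = np.fft.rfftfreq(len(series))
    peak = np.argmax(spectrum[1:]) + 1  # skip the DC component
    return 1 / freqs[peak]

# Hypothetical example: daily page views with an upward trend
# and a 7-day (weekly) cycle, as in course-website traffic.
t = np.arange(140)
views = 100 + 0.5 * t + 20 * np.sin(2 * np.pi * t / 7)
print(round(dominant_period(views)))  # the weekly cycle: 7
```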

Books (O'Reilly):

- Collective Intelligence
- Visualizing Data
- Data Analysis

People who use Google Analytics often just view the default reports. Do you try to relate the behaviour from GA to behaviour reported in other systems (virtual learning environments with their own reporting, etc - or is this data produced in "silos" and looked at by different people?) 

Looking at the behaviour of course websites AS websites - how effective are they AS websites? Are they visiting pages, are they clicking on links. Not necessarily to gauge student learning.

Must be careful using Google Analytics. The graphs can be misleading.

Always be suspicious of means. Also look at bin sizes.

Anscombe's Quartet

Simpson's paradox:'s_paradox

Segmentation is critical.

Facebook course profiles - students can provide info about the courses they are taking.

Using tags on Twitter to visualize networks, create animations over time. Get a pretty good view of the structure of the student body and their social networks, and how we can communicate with them.

Data can make people uncomfortable (and close-up videos of eyes can do this as well)

David Wiley - BYU - Learning analytics as Interpretive Practice

"Warning voice". Download slides:

Interpretation != science?

Confusion of science with positivism. Social scientists have "physics envy": quant > qual.

Educational measurement: what does he/she know?
Research that is mediated by observation - can't crack open George's head (and wouldn't want to if I could)... People engage in behaviours, and we take those behaviours as our evidence.
Online learning is even worse - can't look at George to see if he is paying attention. A second layer of abstraction.

All observable behaviour online is expressed in a very restricted vocabulary of key presses and mouse clicks. Two layers removed.

Westerman's argument: quantitative inquiry is interpretive.

Construct operational definition of "they're happy", "they know calculus" - an operational definition is like a diving mask - you can't see anything else. 

Calculating time on task online - we have no clue whether people are even looking at the screen. We have a "common sense" idea about time on task, which is dangerous.

"Letting the data tail wag the theory dog" (Vic Bunderson)

Can we call it success, if we can predict, but we don't understand why? 

If not positivism... then what?
Hermeneutics - meaning and interpretation.

Problems with metaphor - information processing model - only works when the brain is operating like a computer. Breaks down when creativity is involved

Reductionism - "nothing more to be said when neurophysiology has had its say"

Behaviour in context, social practice. How do we observe behaviour in online environments? 

- Structural equation modeling
- Multilevel data structures
- Continuous, longitudinal measures

Tasks nested within practice

Learning analytics is an ethical activity - what happens if people actually follow the recommendations we make? 

Stephen Fancsali - Variable Construction for Predictive and Causal Modeling of Online Education Data (Dept of Philosophy, CMU; also with Apollo Group)

We only have access to complex, raw, log data

Predictors vs causes - predictors of learning outcomes may be useful for "diagnostic" purposes, but need not be *causally* related to outcomes

Difference between diagnostic and causal

Predictive analytics - identify high-performing students.
If we are interested in changing products to change (enhance) student performance, we need to know the causes of student learning outcomes (causal knowledge).

Causal graphs. Lots of work has been done on this during the last 20-30 years, focused on data provided at the appropriate level/unit of analysis. How do we deal with log data etc.?


Two approaches:

- rely on intuition/expert opinion to construct ad hoc variables (detailed research in the conference paper)
- devise a data-driven search for variable construction

Data from grad level econ course - data from messaging and access to "resource" (chapter)

Variables:

- student public and group forum message count
- instructor private forum message count
- chapter "view" count

Learning outcomes:

- final exam score (independently graded by textbook exam)
- course grade (grade points out of 4.0)

Excluded demographic variables because we are interested in actionable interventions.

search for variable construction

START assembled data per student -> operators -> prune -> causal graph search -> causal predictive modeling -> assess: prune or STOP -> operators (start over)

operators: sum, max, var, per day, etc. logarithm, discretize, interactions.

if two operators are highly correlated (for example mean and median), take the one that is the most predictive. 
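A toy sketch of that pruning step (the variable names, data and thresholds are my own illustration, not the method from the conference paper): compare candidate derived features pairwise, and when two are nearly collinear, drop the one that is less correlated with the outcome.

```python
import numpy as np

def prune_correlated(features, outcome, threshold=0.95):
    """If two candidate derived variables are highly correlated,
    keep only the one more predictive of the outcome
    (here: higher |Pearson r| with the outcome)."""
    kept = dict(features)
    names = list(kept)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in kept and b in kept:
                r = np.corrcoef(kept[a], kept[b])[0, 1]
                if abs(r) > threshold:
                    ra = abs(np.corrcoef(kept[a], outcome)[0, 1])
                    rb = abs(np.corrcoef(kept[b], outcome)[0, 1])
                    del kept[b if ra >= rb else a]
    return kept

# Toy data: per-student message counts summarised two ways.
rng = np.random.default_rng(0)
counts = rng.poisson(5, size=(30, 10))          # 30 students, 10 weeks
feats = {"mean": counts.mean(axis=1), "sum": counts.sum(axis=1)}
grade = feats["mean"] + rng.normal(0, 0.1, 30)  # outcome tracks the mean
# "mean" and "sum" are perfectly correlated, so only one survives.
print(sorted(prune_correlated(feats, grade)))
```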

determine sets of causal graphs that could explain data (use method that allows for unobserved common causes)

"average causal predictability"

aboveMedian(log(min(message word length))) - for example - causal discovery algorithms implemented in the OSS Tetrad project

Ari Bader-Natal (Grockit)

What are people doing? What's effective/engaging? How can we do analysis effectively? What do we do about data that doesn't leave a trace in the system? How do we ask questions that test hypotheses? How do we do this in a way that's easy enough that it leads to action?

How do we feed this back for different audiences?

What are people actually doing?
Just poke around in the database.

What is effective / engaging?
Duration and frequency of discussion differ between different levels of students.
Which of the various interventions in Grockit lead to the largest learning gains?


Make it very easy to run experiments in the code, A/B testing etc, including generating reports. 
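One common way to make in-code experiments cheap is deterministic, hash-based assignment, so a user always lands in the same variant without storing any state. A sketch of that pattern (purely illustrative; this is not Grockit's actual implementation, and the experiment names are made up):

```python
import hashlib

def ab_bucket(user_id, experiment, variants=("A", "B")):
    """Deterministically assign a user to an experiment variant.

    Hashing (experiment, user) keeps assignments stable across
    visits and independent between experiments, with no lookup
    table to maintain.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(ab_bucket(42, "new-hint-ui"))         # same user, same variant every time
print(ab_bucket(42, "leaderboard-colors"))  # may differ per experiment
```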

Web-based analytics interface - showing all reports etc. Decision makers can subscribe to different reports.

Self-reported SAT scores etc. used to "calibrate" the system.
Apache "Mahout" - OSS - thanks! :)

Doug Clow (OU) - iSpot - @dougclow

"A place to share nature". 

How do we connect people who want to watch TV programmes about nature with OER lessons on nature?

Interesting "badge system" at iSpot - you can see if they have taken a course, if they are active in a society, if they are experts etc. Similar to video games - Also similar to what P2PU is trying to build, see Erin Knight's "badge paper": - building a distributed authenticated badge system, to let you take badges from for example iSpot "with you" to other platforms

Underpinning theory:
"Fairy rings of participation" (Makriyannis and De Liddo 2010)

Reputation and learning:
Informal learning context. Assessment is very important for learning, but very hard to provide in an informal learning context.
"Reputation as a proxy measure of learning" - not (just) social approval.

If you make an identification and an expert agrees with you, your reputation goes up by 1. If you agree with someone else, theirs goes up by your reputation/1000 (experts get 1000).
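That update rule is simple enough to sketch directly (the user names and data are hypothetical; only the +1 / reputation-per-1000 arithmetic comes from the talk):

```python
def apply_agreement(rep, agreer, author):
    """Sketch of iSpot's reputation rule as described in the talk:
    an agreement adds the agreer's reputation divided by 1000 to the
    author's reputation. Experts hold 1000, so an expert's agreement
    adds exactly 1."""
    rep[author] = rep.get(author, 0) + rep.get(agreer, 0) / 1000

rep = {"expert": 1000, "alice": 50, "bob": 0}
apply_agreement(rep, "expert", "bob")  # expert agrees with bob: +1.0
apply_agreement(rep, "alice", "bob")   # alice agrees with bob: +0.05
print(rep["bob"])  # 1.05
```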

Reputation analytics:
How does this work - is reputation mostly increased by experts, or what?
Does a long-tail distribution of engagement mean a functioning social network?

Not everything that looks like a power-law is a power-law. 

Learning analytics cycle - learners -> data -> metrics/analytics -> intervention -> learners (all over again)

Main feedback cycle is the reputation generated. 

Observations and reputation received (learning) are highly unequally distributed (fat tail)

Reputation given is even more highly unequal - experts have an amplified effect

Any correlations between agreements given and received, or reputation given and received, are weak, highly nonlinear, and distinct.

Informal learning context - feedback goes directly to other learners, not mediated by specialists.
Participation pattern is typical of social software.

Effective informal learning assessment by reputation

Future:

- Adapt the reputation system to other domains
- More sophisticated fitting
- Social network analysis
- Identifying learning (reputation vs formal course)
- More qualitative research

Role of experts - incentives? How are they identified? They do volunteer - it's an important thing to do. We talk to places where we can find these people; people in the project have lots of relationships. The expertise in natural history resides in these amateur groups. Feature them on the site.

Initial purpose was not to create scientific data, but to create more people who can create scientific data. But actually, some of the observations have been useful - trying to enable exporting of information to amateur societies. 

Serious games - games with a purpose.

Xavier - Ecuador slides:

Learning repositories are not working because their growth is linear, not exponential like YouTube's.

Connexions is exponential.

why do OCW users contribute more than Merlot users? Answer: Engagement - there must be a value proposition

Reuse is the main feature of Learning Objects but very little is known about actual reuse rates

Registry of Open Access Repositories

Dan Suthers - publications:

Unified Framework for Multi-Level Analysis of Distributed Learning

Multiple theories about learning in social settings:

- social as stimulus to social entity as learning agent
- networked individualism to maintaining a joint conception of a problem
- diffusion of innovation to knowledge building

All involve uptake (Suthers, ijCSCL 2006, learning epistemologies). 

Uptake is evidenced by how individual actions are observably contingent on the actions of others in their socio-technical context

How learning takes place through the interplay between individual and collective agency:

- situated accomplishment of individuals and small groups
- local accomplishments giving rise to larger phenomena in networks

Requires coordinated multi-level analysis.

Distributed across multiple media and sites (chat, whiteboard, etc.). "Distributed activity may be analytically cloaked."

Abstract transcript representation

Adjacency pairs - each event is related to the one before.
Contingency graphs - empirical relationships between events that collectively evidence uptake (dependencies?) (Garfinkel: contingently achieved accomplishments).

Media dependency: to reply to a message, it must first be written.
Read events - you must read to be able to write.
Temporal contingency, or events that contain the same actor - a more powerful contingency (they did something right after reading a message).

Lexical or semantic overlap - reuse of noun phrases

Collections of contingencies as evidence of uptake. 
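One of the cheapest contingency signals to compute is lexical overlap. A minimal sketch (the stopword list and example messages are mine; real contingency graphs combine several evidence types, and Suthers uses noun phrases rather than raw words):

```python
# Evidence of uptake via lexical overlap: content words that a later
# contribution reuses from an earlier one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "that", "it", "i"}

def lexical_overlap(earlier, later):
    """Return the content words the later message reuses from the earlier one."""
    tokens = lambda text: {w for w in text.lower().split() if w not in STOPWORDS}
    return tokens(earlier) & tokens(later)

msg1 = "The contingency graph links events across media"
msg2 = "I extended the contingency graph to chat events"
print(sorted(lexical_overlap(msg1, msg2)))  # ['contingency', 'events', 'graph']
```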

Associograms: directed affiliation network of actors and artifacts.
Mediation model: how actors' associations are mediated.

A round trip, interaction patterns. (Something that you cannot see in threading structure, but is shown by contingency graphs). 

Relationships - associograms and pairwise associations (relationship model).

Multimedia associations:

- characterize pairwise relationships in terms of distribution across media
- compare roles of various media in supporting associations

Social ties - enables application of Social Network Analysis methods

Tool enables us to go from log data to ties - get representations to a level where you want to do your analysis. Keep tie to data, so you can go back to the evidence at any point. Multi-level analysis.

Use contingency graphs for:

- microanalysis
- semi-automated analyses of graph manipulations to find pivotal moments

Tapped In (SRI International):

- network of educators, professional development and peer support
- 20k educators, 8k user-created spaces, etc.
- lots of media (threaded discussion, chats, wikis, resource sharing)

Imported all activity for a two-year period into framework. Generate contingencies. 

Workshop at CSCL 2011 in HK:

Bakharia - SNAPP - A bird's eye view of temporal participation interaction

A diagnostic instrument that allows teaching staff to evaluate student behavioural patterns against learning activity design objectives, and to intervene as required in a timely manner.

SNA can be used to identify:

- learner isolation
- creativity
- community formation

How can we realize the potential of real-time SNA?
By making the analysis transparent to the user, as Doug Clow suggested earlier - it gives the data the potential of being the intervention.

Two forums, same number of messages and participants. The threaded view doesn't tell us whether they are structurally different, or how the activity unfolds over time.

From relationship point of view: in forum A, no interaction, all via tutor.

We are representing the data with the wrong visual metaphor - we need different ways of representing the data (live). (To learners? Or just to tutors?) Can we embed these sociograms within the forums themselves?

Tool:

- integration w/ LMS (Moodle, BB, D2L)
- renders a sociogram as an alternate representation of the threaded discussion view

Difficulties with integrating with LMSes - APIs didn't let you interrogate discussion forums, not allowed to directly access database, etc. Plugins limited to adding new features, not modifying existing features.

To install, drag a button onto the toolbar (so it's a bookmarklet). You visit a forum, click the bookmarklet, and the SNA diagram appears.

Can also annotate, add a date when you are trying a new strategy for example, and it will keep that as a log. 

Learner isolation - dense interactions between central nodes, etc.
Facilitator-centric patterns.

My critique: who you reply to is not a great indicator of whom you are interacting with. I might read a whole thread and reply to the last post, but include replies to all the previous postings in my post - something like Suthers's uptake model.

Future directions:

- content analysis
- behavioural modeling
- topic modeling

Ravi Varatrapu

NEXT-TELL project

High-density classrooms, rich personalized learning environments, one-to-one laptop projects.
Information overload - how can teachers take advantage of all this data?

Learning ecologies:

- students have access to a large network of information resources, tools, and social resources
- students know different things, and know differently

The challenge of adaptive, personalised teaching in the high-density classroom, combined with a rich information environment.

NEXT-TELL - innovation platform for formative classroom assessment 

Teaching analytics:

- learning sciences (interactional pathways to learning outcomes)
- learning analytics (systemic metrics)
- visual analytics (tools & techniques)

Dynamic diagnostic pedagogical decision-making:

- learning activity design and formative assessment
- Classroom Information Systems (CIS)
    - data provenance and process provenance
    - meaningful
    - actionable
- methods, tools, and training for a new "professional vision"

Design-based research expert
Teaching expert

Teaching analytics:

- evidence-centered activity & assessment design
- learning activity & assessment tracing
- open learner models (OLM)
- learning analytics
- visual analytics

Open Learner Models (from intelligent tutoring systems)

Inspectable, scrutable by learners.
"Reflection of a bear in a pond"

The teacher uses the ECAD planner to deploy assessments; activities and formative assessments are recorded and fed into an analytic engine, producing a visualization of task progress - this visualization is available to the learner.

Empirical design-based research, first led by researchers, then led by teachers

Computational social science laboratory (CSSL) - eye tracking, neurological, physiological data collection equipment.
60-80 classrooms.

We need to invent visualizations / representations. Eye tracking: get at good designs for visualizations. Put an eye tracker in the classroom - see how teachers are using it. How does the communication layer hold up in real time in the classroom?

"Good data comes from good instrumentation. Need multiple measures to correlate."

Phil Ice - Multi-level Institutional Application of Analytics - American Public University System

Dashboard that lists all 86,000 learners, sorted by how likely they are to disenroll during the next five days. Uses 86 different factors, lets you drill down, and directly engage with action.

Semantic analysis - granularity model.
Purpose - accreditation of the institution, showing that you meet all course objectives etc. Mapping resources to objectives.

Federation, disaggregation, relational mapping, ontological ordering

Injection engine (on SourceForge). Ingest any kind of content, strip content out of anything (including PDF), and turn it into XML. Then disaggregate the content - if you have a series of JPEGs associated with text, or video, disaggregate them. Utilize the metadata of the JPEGs and natural language processing of the text, and order them against ontologies that you specify. Adobe has tools for audio-to-text analysis of video. Apparently very difficult to learn - give one person 3 months to play with it...

Gap analysis - shows which goals have not been fulfilled.

Compared to two independent human coders - 93% accuracy (three passes of refinement of LSA).

92.7% savings in time - $83k saved

Roundtripping - take in student work and have it subjected to LSA, to match students' work to pre-formed ontologies - actual evidence of learning outcomes

Chris - lecture capture

Every 30 seconds the player sends a heartbeat back to the server - who is watching, what they are watching, where they are in the video, etc.

Automatically capturing slide transitions etc

Using the logged data points, we can almost perfectly reconstitute their viewing behaviour.
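Reconstructing viewing behaviour from 30-second heartbeats amounts to merging nearby playback positions into watched segments. A sketch under assumed parameters (the gap threshold is my guess, not the actual system's):

```python
def watched_segments(heartbeats, interval=30, gap=2):
    """Reconstruct which parts of a video a student watched from
    periodic heartbeats (playback position in seconds, one roughly
    every `interval` seconds). Positions within `gap * interval` of
    each other are merged into one contiguous segment; a larger jump
    means the viewer seeked elsewhere."""
    segments = []
    for pos in sorted(heartbeats):
        if segments and pos - segments[-1][1] <= gap * interval:
            segments[-1][1] = pos  # extend the current segment
        else:
            segments.append([pos, pos])  # start a new segment
    return [tuple(s) for s in segments]

# Student watches the start, then skips to the 10-minute mark.
beats = [0, 30, 60, 90, 600, 630, 660]
print(watched_segments(beats))  # [(0, 90), (600, 660)]
```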

Note-taking panel, both individual notes and global notes, by slide (not much take-up).

Opencast Matterhorn project - a lecture capture solution, open and free, built for higher ed. They want to build in all this analytics tracking.


Teplovs, Fujita and Varatrapu - Generating Predictive Models of Learner Community Dynamics

Latent semantic analysis - too many dimensions for traditional analysis - visualization might be a solution.

Knowledge Space Visualizer -

Research by Chris Teplovs (including his PhD thesis on this topic).

Explicit links and implicit links; cosine between vectors.
Chronology and authorship can also be included.

Advantages:

- flexible thresholds

Use this to generate a learning model - "you are what you write"

Vector representation of each user. Define similarity between any two user models. Hypothesis under which we could expect to see productive interactions (for example Vygotsky's Zone of Proximal Development)
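The similarity between two such user models is typically the cosine between their vectors ("you are what you write"). A minimal sketch with made-up term counts (real LSA would first reduce the term space via SVD):

```python
import math

def cosine(u, v):
    """Cosine similarity between two user term-vectors:
    1.0 means identical direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical term counts for three learners over the same vocabulary.
alice = [3, 0, 2, 1]
bob   = [2, 0, 3, 1]
carol = [0, 4, 0, 0]
print(round(cosine(alice, bob), 2))    # similar writing: 0.93
print(round(cosine(alice, carol), 2))  # no shared vocabulary: 0.0
```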

Latent Semantic Analysis + interaction-based user models:

- "potential for productive interaction"
- "actual interaction"

Can start looking at the interplay between potential and actual interactions.

Can we use something like game theory to understand community dynamics?
We would need to understand the "payoffs" that accrue to interactions, the strategies (perhaps multiple) that participants employ, and the repertoires of strategies.

Offer of summer visits to Copenhagen for PhDs or post-docs.

Ravi (again) - Cultural Considerations in Learning Analytics

Culture? What is it? There are 200 definitions of culture.

social aspects of HCI (Reeves and Nass 1996, The Media Equation)

Culture and CSCL?
What people do with technologies - outcomes don't differ: different interaction pathways, but not a different product?

problem solving with conceptual representation - how do people use tools/affordances. How do they interact socially, discourse presence, social/cognitive presence. What do they think of the participants after the collaborative session?

American-American, American-Chinese, Chinese-Chinese pairs, etc.
No difference in learning outcomes, but very different how they get there.

Borrowing a lot from other disciplines - need for an integrative theory of culture and sociotechnical interactions. 

Interacting with technologies and interacting with others via technology

Structures of technological intersubjectivity

Affordances - arguing for a tight link between perception and action:
meaning-making opportunities and action-taking possibilities in an actor-environment system in a particular situation, relative to actor competencies and system capabilities.

Appropriation of affordances: in some Asian classrooms it is not appropriate to ask the teacher very difficult questions (face saving).

Intentional utilization of affordances is culture-sensitive and context-dependent ("The Gods Must Be Crazy").


Combining this with Dan Suthers's work on uptake.

Mike Sharkey - Academic Analytics Landscape at U Phoenix

435,000 students at U of Phoenix

30+ databases, 430+ tables. 1.5 TB, increasing by 100 GB/month.

All data is copied into a central repository from external DBs

Tableau - data visualization tool - expensive

Presentation from Spain (Abelardo?)

Using a virtual machine with built-in "spying", which captures compiling, errors, URLs etc. A huge amount of data about how students are working.

Stian Håklev February 28, 2011 Toronto, Canada