Two years after the NYT’s ‘Year of the MOOC’: how much do we actually know about them?

Timeline of the development of MOOCs and open education, from: Yuan, Li, and Stephen Powell. MOOCs and Open Education: Implications for Higher Education White Paper. University of Bolton: CETIS, 2013.

Ed: Does research on MOOCs differ in any way from existing research on online learning?

Rebecca: Despite the hype around MOOCs to date, there are many similarities between MOOC research and the breadth of previous investigations into (online) learning. Many of the trends we’ve observed (the prevalence of forum lurking; community formation; etc.) have been studied previously and are supported by earlier findings. That said, the combination of scale, global reach, duration, and “semi-synchronicity” of MOOCs has made them different enough to inspire this work. In particular, the optional nature of participation among a global body of lifelong learners for a short burst of time (e.g. a few weeks) is a relatively new learning environment that, despite theoretical ties to existing educational research, poses a new set of challenges and opportunities.

Ed: The MOOC forum networks you modelled seemed to be less efficient at spreading information than randomly generated networks. Do you think this inefficiency is due to structural constraints of the system (or just because inefficiency is not selected against); or is there something deeper happening here, maybe saying something about the nature of learning, and networked interaction?

Rebecca: First off, it’s important to not confuse the structural “inefficiency” of communication with some inherent learning “inefficiency”. The inefficiency in the sub-forums is a matter of information diffusion—i.e., because there are communities that form in the discussion spaces, these communities tend to “trap” knowledge and information instead of promoting the spread of these ideas to a vast array of learners. This information diffusion inefficiency is not necessarily a bad thing, however. It’s a natural human tendency to form communities, and there is much education research that says learning in small groups can be much more beneficial / effective than large-scale learning. The important point that our work hopes to make is that the existence and nature of these communities seems to be influenced by the types of topics that are being discussed (and vice versa)—and that educators may be able to cultivate more isolated or inclusive network dynamics in these course settings by carefully selecting and presenting these different discussion topics to learners.

Ed: Drawing on surveys and learning outcomes you could categorise four ‘learner types’, who tend to behave differently in the network. Could the network be made more efficient by streaming groups by learning objective, or by type of interaction (eg learning / feedback / social)?

Rebecca: Given our network vulnerability analysis, it appears that discussions that focus on problems or issues based in real-life examples (e.g., those that relate to case studies of real companies and analyses of these companies posted by learners) tend to promote more inclusive engagement and efficient information diffusion. Given that certain types of learners participate in these discussions, one could argue that forming groups around learning preferences and objectives could promote more efficient communications. Still, it’s important to be aware of the potential drawback: encouraging like-minded people to interact mainly with each other could further prevent the “learning through diverse exposures” that these massive-scale settings can be well-suited to promote.

Ed: In the classroom, the teacher can encourage participation and discussion if it flags: are there mechanisms to trigger or seed interaction if the levels of network activity fall below a certain threshold? How much real-time monitoring tends to occur in these systems?

Rebecca: Yes, it appears that educators may be able to influence or achieve certain types of network patterns. While each MOOC is different (some course staff members tend to be much more engaged than others, learners may have different motivations, etc.), on the whole, there isn’t much real-time monitoring in MOOCs, and MOOC platforms are still in early days where there is little to no automated monitoring or feedback (beyond static analytics dashboards for instructors).

Ed: Does learner participation in these forums improve outcomes? Do the most central users in the interaction network perform better? And do they tend to interact with other very central people?

Rebecca: While we can’t infer causation, we found that when compared to the entire course, a significantly higher percentage of high achievers were also forum participants. The more likely explanation for this is that those who are committed to completing the course and performing well also tend to use the forums—but the plurality of forum participants (44% in one of the courses we analysed) are actually those that “fail” by traditional marks (receive below 50% in the course). Indeed, many central users tend to be those that are simply auditing the course or who are interested in communicating with others without any intention of completing course assignments. These central users tend to communicate with other central users, but also, with those whose participation is much sparser/“on the fringes”.

Ed: Slightly facetiously: you can identify ‘central’ individuals in the network who spark and sustain interaction. Can you also find people who basically cause interaction to die? Who will cause the network to fall apart? And could you start to predict the strength of a network based on the profiles and proportions of the individuals who make it up?

Rebecca: It is certainly possible to explore further how different people seem to affect the network. One way this can be achieved is by exploring the temporal dynamics at play, e.g., by visualising the communication network at any point in time and creating network “snapshots” at every hour or day, or perhaps, with every new participant, to observe how the trends and structures evolve. While this method still doesn’t allow us to identify the exact influence of any given individual’s participation (since there are so many other confounding factors, for example, how far into the course it is, people’s schedules/lives outside of the MOOC, etc.), it may provide some insight into their roles. We could of course define some quantitative measure(s) of “network strength” based on learner profiles, but given the confounding forces it would be essential to be cautious about making broad claims on that basis.

Ed: The majority of my own interactions are mediated by a keyboard: which is actually a pretty inefficient way of communicating, and certainly a terrible way of arguing through a complex point. Is there any sense from MOOCs that text-based communication might be a barrier to some forms of interaction, or learning?

Rebecca: This is an excellent observation. Given the global student body, varying levels of comfort in English (and written language more broadly), differing preferences for communication, etc., there is much reason to believe that a lack of participation could result from a lack of comfort with the keyboard (or written communication more generally). Indeed, in the MOOCs we’ve studied, many learners have attempted to meet up on Google Hangouts or other non-text based media to form and sustain study groups, suggesting that many learners seek to use alternative technologies to interact with others and achieve their learning objectives.

Ed: Based on this data and analysis, are there any obvious design points that might improve interaction efficiency and learning outcomes in these platforms?

Rebecca: As I have mentioned already, open-ended questions that focus on real-life case studies tend to promote the least vulnerable and most “efficient” discussions, which may be of interest to practitioners looking to cultivate these sorts of environments. More broadly, the lack of sustained participation in the forums suggests that there are a number of “forces of disengagement” at play, one of them being that the sheer amount of content being generated in the discussion spaces (one course had over 2,700 threads and 15,600 posts) could be contributing to a sense of “content overload” and helplessness for learners. Designing platforms that help mitigate this problem will be fundamental to the vitality and effectiveness of these learning spaces in the future.

Ed: I suppose there is an inherent tension between making the online environment very smooth and seductive, and the process of learning; which is often difficult and frustrating: the very opposite experience aimed for (eg) by games designers. How do MOOCs deal with this tension? (And how much gamification is common to these systems, if any?)

Rebecca: To date, gamification seems to have been sparse in most MOOCs, although there are some interesting experiments in the works. Indeed, one study (Anderson et al., 2014) used a randomised controlled trial to add badges (that indicate student engagement levels) next to the names of learners in MOOC discussion spaces in order to determine if and how this affects further engagement. Coursera has also started to publicly display badges next to the names of learners that have signed up for the paid Signature Track of a specific course (presumably, to signal which learners are “more serious” about completing the course than others). As these platforms become more social (and perhaps career advancement-oriented), it’s quite possible that gamification will become more popular. This gamification may not ease the process of learning or make it more comfortable, but rather, offer additional opportunities to mitigate the challenges of massive-scale anonymity and lack of information about peers, and so facilitate more social learning.

Ed: How much of this work is applicable to other online environments that involve thousands of people exploring and interacting together: for example deliberation, crowd production and interactive gaming, which certainly involve quantifiable interactions and a degree of negotiation and learning?

Rebecca: Since MOOCs are so loosely structured and could largely be considered “informal” learning spaces, we believe the engagement dynamics we’ve found could apply to a number of other large-scale informal learning/interactive spaces online. Similar crowd-like structures can be found in a variety of policy and practice settings.

Ed: This project has adopted a mixed methods approach: what have you gained by this, and how common is it in the field?

Rebecca: Combining computational network analysis and machine learning with qualitative content analysis and in-depth interviews has been one of the greatest strengths of this work, and a great learning opportunity for the research team. Often in empirical research, it is important to validate findings across a variety of methods to ensure that they’re robust. Given the complexity of human subjects, we knew computational methods could only go so far in revealing underlying trends; and given the scale of the dataset, we knew there were patterns that qualitative analysis alone would not enable us to detect. A mixed-methods approach enabled us to simultaneously and robustly address these dimensions. MOOC research to date has been quite interdisciplinary, bringing together computer scientists, educationists, psychologists, statisticians, and a number of other areas of expertise into a single domain. The interdisciplinarity of research in this field is arguably one of the most exciting indicators of what the future might hold.

Ed: As well as the network analysis, you also carried out interviews with MOOC participants. What did you learn from them that wasn’t obvious from the digital trace data?

Rebecca: The interviews were essential to this investigation. In addition to confirming the trends revealed by our computational explorations (which revealed the what of the underlying dynamics at play), the interviews revealed much of the why. In particular, we learned people’s motivations for participating in (or disengaging from) the discussion forums, which provided an important backdrop for subsequent quantitative (and qualitative) investigations. We have also learned a lot more about people’s experiences of learning, the strategies they employ to support their learning and issues around power and inequality in MOOCs.

Ed: You handcoded more than 6000 forum posts in one of the MOOCs you investigated. What findings did this yield? How would you characterise the learning and interaction you observed through this content analysis?

Rebecca: The qualitative content analysis of over 6,500 posts revealed several key insights. For one, we confirmed (as the network analysis suggested) that most discussion is insignificant “noise”: people looking to introduce themselves or have short-lived discussions about topics that are beyond the scope of the course. In a few instances, however, we discovered the different patterns (and sometimes, cycles) of knowledge construction that can occur within a specific discussion thread. In some cases, we found that discussion threads grew to be so long (with hundreds of posts) that topics were repeated or earlier posts disregarded because new participants didn’t read and/or consider them before adding their own replies.

Ed: How are you planning to extend this work?

Rebecca: As mentioned already, feelings of helplessness resulting from sheer “content overload” in the discussion forums appear to be a key force of disengagement. To that end, as we now have a preliminary understanding of communication dynamics and learner tendencies within these sorts of learning environments, we now hope to leverage this background knowledge to develop new methods for promoting engagement and the fulfilment of individual learning objectives in these settings—in particular, by trying to mitigate the “content overload” issues in some way. Stay tuned for updates 🙂


Anderson, A., Huttenlocher, D., Kleinberg, J., and Leskovec, J. (2014) Engaging with Massive Open Online Courses. In: WWW ’14: Proceedings of the 23rd International World Wide Web Conference, Seoul, Korea. New York: ACM.

Read the full paper: Gillani, N., Yasseri, T., Eynon, R., and Hjorth, I. (2014) Structural limitations of learning in a crowd – communication vulnerability and information diffusion in MOOCs. Scientific Reports 4.

Rebecca Eynon was talking to blog editor David Sutcliffe.

Rebecca Eynon holds a joint academic post between the Oxford Internet Institute (OII) and the Department of Education at the University of Oxford. Her research focuses on education, learning and inequalities, and she has carried out projects in a range of settings (higher education, schools and the home) and life stages (childhood, adolescence and late adulthood).

What are the limitations of learning at scale? Investigating information diffusion and network vulnerability in MOOCs

Millions of people worldwide are currently enrolled in courses provided on large-scale learning platforms (aka ‘MOOCs’), typically collaborating in online discussion forums with thousands of peers. Current learning theory emphasises the importance of this group interaction for cognition. However, while a lot is known about the mechanics of group learning in smaller and traditionally organised online classrooms, fewer studies have examined participant interactions when learning “at scale.” Some studies have used clickstream data to trace participant behaviour, and even to predict dropouts based on engagement patterns. However, many questions remain about the characteristics of group interactions in these courses, highlighting the need to understand whether, and how, MOOCs allow for deep and meaningful learning by facilitating significant interactions.

But what constitutes a “significant” learning interaction? In large-scale MOOC forums, with socio-culturally diverse learners with different motivations for participating, this is a non-trivial problem. MOOCs are best defined as “non-formal” learning spaces, where learners pick and choose how (and if) they interact. This kind of group membership, together with the short-term nature of these courses, means that relatively weak inter-personal relationships are likely. Many of the tens of thousands of interactions in the forum may have little relevance to the learning process. So can we actually define the underlying network of significant interactions? Only once we have done this can we explore firstly how information flows through the forums, and secondly the robustness of those interaction networks: in short, the effectiveness of the platform design for supporting group learning at scale.

To explore these questions, we analysed data from 167,000 students registered on two business MOOCs offered on the Coursera platform. Almost 8000 students contributed around 30,000 discussion posts over the six weeks of the courses; almost 30,000 students viewed at least one discussion thread, totalling 321,769 discussion thread views. We first modelled these communications as a social network, with nodes representing students who posted in the discussion forums, and edges (ie links) indicating co-participation in at least one discussion thread. Of course, not all links will be equally important: many exchanges will be trivial (‘hello’, ‘thanks’ etc.). Our task, then, was to derive a “true” network of meaningful student interactions (ie iterative, consistent dialogue) by filtering out those links generated by random encounters (Figure 1; see also full paper for methodology).

Figure 1. Comparison of observed (a; ‘all interactions’) and filtered (b; ‘significant interactions’) communication networks for a MOOC forum. Filtering affects network properties such as modularity score (ie degree of clustering). Colours correspond to the automatically detected interest communities.
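To make the network construction concrete, here is a minimal Python sketch of the co-participation model described above: students are nodes, and two students are linked if they posted in the same thread. The input format (a list of `(thread_id, student_id)` pairs), the function names, and the `min_shared_threads` threshold are all assumptions made for illustration. In particular, the threshold is only a crude stand-in for the filtering step; the method actually used to remove links generated by random encounters is described in the full paper and is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def build_coparticipation_network(posts):
    """Count, for each pair of students, how many threads they both posted in.

    `posts` is an iterable of (thread_id, student_id) pairs (hypothetical format).
    Returns a Counter mapping (student_a, student_b) -> number of shared threads.
    """
    threads = {}
    for thread_id, student_id in posts:
        threads.setdefault(thread_id, set()).add(student_id)

    weights = Counter()
    for members in threads.values():
        # Every pair of students who posted in the same thread gets an edge.
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return weights


def filter_edges(weights, min_shared_threads=2):
    """Keep only pairs that meet repeatedly.

    Assumption: a simple threshold standing in for the paper's filter.
    """
    return {pair: w for pair, w in weights.items() if w >= min_shared_threads}


# Toy data: ana and ben meet in two threads, everyone else meets once.
posts = [
    ("t1", "ana"), ("t1", "ben"), ("t1", "cho"),
    ("t2", "ana"), ("t2", "ben"),
    ("t3", "cho"), ("t3", "dee"),
]
all_edges = build_coparticipation_network(posts)
significant = filter_edges(all_edges)
print(len(all_edges), "observed edges;", len(significant), "kept after filtering")
```

On this toy data the one-off encounters drop out and only the repeated ana and ben exchange survives, which is the intuition behind panel (b) of Figure 1.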

One feature of networks that has been studied in many disciplines is their vulnerability to fragmentation when nodes are removed (the Internet, for example, grew out of research funded by the US Department of Defense that is often described as aiming to develop a disruption-resistant network for critical communications). While we aren’t interested in the effect of a missile strike on MOOC exchanges, from an educational perspective it is still useful to ask which “critical set” of learners is mostly responsible for information flow in a communication network, and what would happen to online discussions if these learners were removed. To our knowledge, this is the first time vulnerability of communication networks has been explored in an educational setting.

Network vulnerability is interesting because it indicates how integrated and inclusive the communication flow is. Discussion forums with fleeting participation will have only a very few vocal participants: removing these people from the network will markedly reduce the information flow between the other participants—as the network falls apart, it simply becomes more difficult for information to travel across it via linked nodes. Conversely, forums that encourage repeated engagement and in-depth discussion among participants will have a larger ‘critical set’, with discussion distributed across a wide range of learners.

To understand the structure of group communication in the two courses, we looked at how quickly our modelled communication network fell apart when: (a) the most central nodes were iteratively disconnected (Figure 2; blue), compared with when (b) nodes were removed at random (ie the ‘neutral’ case; green). In the random case, the network degrades evenly, as expected. When we selectively remove the most central nodes, however, we see rapid disintegration: indicating the presence of individuals who are acting as important ‘bridges’ across the network. In other words, the network of student interactions is not random: it has structure.

Figure 2. Rapid network degradation results from removal of central nodes (blue). This indicates the presence of individuals acting as ‘bridges’ between sub-groups. Removing these bridges results in rapid degradation of the overall network. Removal of random nodes (green) results in a more gradual degradation.
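The removal experiment can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis from the paper: "most central" is taken here to mean highest degree among the nodes still present, and "falling apart" is measured by the size of the largest connected component. Both choices, and the names `largest_component_size` and `degradation_curve`, are ours.

```python
import random


def largest_component_size(adj, removed):
    """Size of the largest connected component, ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best


def degradation_curve(adj, targeted, rng=None):
    """Largest-component size after each successive node removal."""
    rng = rng or random.Random(0)
    removed = set()
    curve = [largest_component_size(adj, removed)]
    for _ in range(len(adj)):
        remaining = [n for n in adj if n not in removed]
        if targeted:
            # Assumed centrality measure: degree among nodes still present.
            victim = max(remaining, key=lambda n: sum(m not in removed for m in adj[n]))
        else:
            victim = rng.choice(remaining)
        removed.add(victim)
        curve.append(largest_component_size(adj, removed))
    return curve


# Toy network: hub "h" bridges two pairs of learners, (a, b) and (c, d).
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b"), ("c", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

targeted_curve = degradation_curve(adj, targeted=True)
random_curve = degradation_curve(adj, targeted=False)
print("targeted:", targeted_curve)
print("random:  ", random_curve)
```

In the toy network, removing the hub first immediately splits the five learners into two pairs, whereas removing any other node first leaves the remaining four connected.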

Of course, the structure of participant interactions will reflect the purpose and design of the particular forum. We can see from Figure 3 that different forums in the courses have different vulnerability thresholds. Forums with high levels of iterative dialogue and knowledge construction (with learners sharing ideas and insights about weekly questions, strategic analyses, or course outcomes) are the least vulnerable to degradation. A relatively high proportion of nodes have to be removed before the network falls apart (rightmost blue line). Forums where most individuals post once to introduce themselves and then move their discussions to other platforms (such as Facebook) or cease engagement altogether tend to be more vulnerable to degradation (leftmost blue line). The different vulnerability thresholds suggest that different topics (and forum functions) promote different levels of forum engagement. Certainly, asking students open-ended questions tended to encourage significant discussions, leading to greater engagement and knowledge construction as they read analyses posted by their peers and commented with additional insights or critiques.

Figure 3. Network vulnerabilities of different course forums.

Understanding something about the vulnerability of a communication or interaction network is important, because it will tend to affect how information spreads across it. To investigate this, we simulated an information diffusion model similar to that used to model social contagion. Although simplistic, the SI model (‘susceptible-infected’) is very useful in analysing topological and temporal effects on networked communication systems. While the model doesn’t account for things like decaying interest over time or peer influence, it allows us to compare the efficiency of different network topologies.
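For readers unfamiliar with the SI model, the following Python sketch shows one simple discrete-time version. It is an assumption-laden illustration, not the simulation used in the paper: at each step, every 'infected' node passes the information to each 'susceptible' neighbour with a fixed probability, and nodes never recover. The function name `simulate_si` and the parameters `infection_prob` and `max_steps` are hypothetical.

```python
import random


def simulate_si(adj, seed_node, infection_prob=0.5, max_steps=100, rng=None):
    """Run a discrete-time susceptible-infected (SI) spread.

    Returns a list giving the number of infected (reached) nodes after
    each step, starting with the single seed node.
    """
    rng = rng or random.Random(0)
    infected = {seed_node}
    history = [1]
    for _ in range(max_steps):
        newly_infected = set()
        # Sorted iteration keeps a seeded run reproducible.
        for node in sorted(infected):
            for nbr in sorted(adj[node]):
                if nbr not in infected and rng.random() < infection_prob:
                    newly_infected.add(nbr)
        infected |= newly_infected
        history.append(len(infected))
        if len(infected) == len(adj):
            break  # everyone reachable has the information
    return history


# Toy network: a chain of four learners, a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

# With certain transmission the information moves one hop per step.
print(simulate_si(path, "a", infection_prob=1.0))
# With uncertain transmission the spread is slower and depends on the seed.
print(simulate_si(path, "a", infection_prob=0.5))
```

Because infected nodes never recover, the count of reached nodes can only grow, which is what makes curves from different network topologies directly comparable.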

We compared our (real-data) network model with a randomised network in order to see how well information would flow if the community structures we observed in Figure 2 did not exist. Figure 4 shows the number of ‘infected’ (or ‘reached’) nodes over time for both the real (solid lines) and randomised networks (dashed lines). In all the forums, we can see that information actually spreads faster in the randomised networks. This is explained by the existence of local community structures in the real-world networks: networks with dense clusters of nodes (ie a clumpy network) will result in slower diffusion than a network with a more even distribution of communication, where participants do not tend to favour discussions with a limited cohort of their peers.

Figure 4. (a) Percentage of infected nodes vs. simulation time for different networks. The solid lines show the results for the original network and the dashed lines for the random networks. (b) Time taken for a simulated “information packet” to come into contact with half the network’s nodes.
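The comparison with a randomised network can be illustrated with a small Python sketch. Several things here are assumptions: the randomisation simply keeps the same nodes and the same number of links but places the links uniformly at random (the procedure used in the paper may differ, for example by preserving each node's number of links), the spread is deterministic (every contact transmits), and the helper names `clumpy_network`, `randomised_copy` and `steps_to_reach` are invented for this example.

```python
import random
from itertools import combinations


def clumpy_network(n_groups=4, group_size=5):
    """Toy 'clumpy' network: tight groups joined in a chain by single links."""
    adj = {}
    for g in range(n_groups):
        members = [f"g{g}n{i}" for i in range(group_size)]
        for a, b in combinations(members, 2):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
        if g > 0:
            # One bridge between the previous group and this one.
            a, b = f"g{g - 1}n{group_size - 1}", f"g{g}n0"
            adj[a].add(b)
            adj[b].add(a)
    return adj


def randomised_copy(adj, rng=None):
    """Random graph with the same nodes and the same number of links."""
    rng = rng or random.Random(0)
    nodes = sorted(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    chosen = rng.sample(list(combinations(nodes, 2)), n_edges)
    shuffled = {node: set() for node in nodes}
    for a, b in chosen:
        shuffled[a].add(b)
        shuffled[b].add(a)
    return shuffled


def steps_to_reach(adj, seed_node, fraction=0.5):
    """Steps for a spread in which every contact transmits to reach the given
    fraction of nodes; None if that fraction is never reached."""
    target = fraction * len(adj)
    reached = {seed_node}
    frontier = {seed_node}
    steps = 0
    while len(reached) < target:
        frontier = {nbr for node in frontier for nbr in adj[node]} - reached
        if not frontier:
            return None
        reached |= frontier
        steps += 1
    return steps


real = clumpy_network()
shuffled = randomised_copy(real)
print("clumpy network, steps to reach half:", steps_to_reach(real, "g0n0"))
print("randomised network, steps to reach half:", steps_to_reach(shuffled, "g0n0"))
```

In the clumpy network the information fills the seed's own group in one step but then has to squeeze through a single bridge before it can fill the next group, which is the "trapping" effect described above. A fair comparison would average over many seeds and many randomised copies rather than rely on a single run.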

Overall, these results reveal an important characteristic of student discussion in MOOCs: when it comes to significant communication between learners, there are simply too many discussion topics and too much heterogeneity (ie clumpiness) to result in truly global-scale discussion. Instead, most information exchange, and by extension any knowledge construction in the discussion forums, occurs in small, short-lived groups, with information “trapped” in small learner groups. This finding is important as it highlights structural limitations that may impact the ability of MOOCs to facilitate communication amongst learners who look to learn “in the crowd”.

These insights into the communication dynamics motivate a number of important questions about how social learning can be better supported, and facilitated, in MOOCs. They certainly suggest the need to leverage intelligent machine learning algorithms to support the needs of crowd-based learners; for example, in detecting different types of discussion and patterns of engagement during the runtime of a course to help students identify and engage in conversations that promote individualised learning. Without such interventions the current structural limitations of social learning in MOOCs may prevent the realisation of a truly global classroom.

The next post addresses qualitative content analysis and how machine-learning community detection schemes can be used to infer latent learner communities from the content of forum posts.

Read the full paper: Gillani, N., Yasseri, T., Eynon, R., and Hjorth, I. (2014) Structural limitations of learning in a crowd – communication vulnerability and information diffusion in MOOCs. Scientific Reports 4.
