Facts and figures or prayers and hugs: how people with different health conditions support each other online

Online support groups are being used increasingly by individuals who suffer from a wide range of medical conditions. OII DPhil student Ulrike Deetjen’s recent article with John Powell, “Informational and emotional elements in online support groups: a Bayesian approach to large-scale content analysis”, uses machine learning to examine the role of online support groups in the healthcare process. The authors categorise 40,000 online posts from one of the most widely used forums to show how users with different conditions receive different types of support.

Online forums are an important means for people living with health conditions to obtain both emotional and informational support from others in a similar situation. Pictured: The Alzheimer Society of B.C. unveiled three life-size ice sculptures depicting important moments in life. The ice sculptures will melt, representing the fading of life memories on the dementia journey. Image: bcgovphotos (Flickr)

Online support groups are one of the major ways in which the Internet has fundamentally changed how people experience health and health care. They provide a platform for health discussions formerly restricted by time and place, enable individuals to connect with others in similar situations, and facilitate open, anonymous communication.

Previous studies have identified that individuals primarily obtain two kinds of support from online support groups: informational (for example, advice on treatments, medication, symptom relief, and diet) and emotional (for example, receiving encouragement, being told they are in others’ prayers, receiving “hugs”, or being told that they are not alone). However, existing research has been limited as it has often used hand-coded qualitative approaches to contrast both forms of support, thereby only examining relatively few posts (<1,000) for one or two conditions.

In contrast, our research employed a machine-learning approach suitable for uncovering patterns in “big data”. Using this method, a computer (which initially has no knowledge of online support groups) is given examples of informational and emotional posts (2,000 examples in our study). It then “learns” which words are associated with each category (emotional: prayers, sorry, hugs, glad, thoughts, deal, welcome, thank, god, loved, strength, alone, support, wonderful, sending; informational: effects, started, weight, blood, eating, drink, dose, night, recently, taking, side, using, twice, meal). The computer then uses this knowledge to assess new posts and decide whether they contain more emotional or more informational support.
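The paper describes its approach as Bayesian. Below is a minimal sketch of the general idea, assuming a multinomial naive Bayes classifier over word counts. The authors’ actual model, features and preprocessing may differ. The six training posts are invented for illustration, and the class and function names are hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,!?;:'\""


def tokenize(text):
    """Lower-case words with surrounding punctuation stripped."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            docs = [tokenize(t) for t, lab in zip(texts, labels) if lab == c]
            self.log_prior[c] = math.log(len(docs) / len(texts))
            self.counts[c] = Counter(w for d in docs for w in d)
            self.totals[c] = sum(self.counts[c].values())
            self.vocab.update(self.counts[c])
        return self

    def predict(self, text):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training carry no evidence
                    score += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)


# Invented example posts; the study used 2,000 labelled examples.
train_texts = [
    "sending prayers and hugs, you are not alone",
    "so sorry to hear this, thinking of you, stay strong",
    "welcome, glad you found us, we are here to support you",
    "I started taking a lower dose twice a day with a meal",
    "side effects got worse at night after I started the new dose",
    "check your blood sugar before eating and after each meal",
]
train_labels = ["emotional"] * 3 + ["informational"] * 3

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("hugs and prayers, so glad you are here"))  # emotional
print(model.predict("what dose are you taking, and any side effects at night?"))  # informational
```

Each class score adds up log-probabilities of the words a post contains. Words like “hugs” and “prayers” therefore pull a post towards the emotional class, and “dose” and “side effects” pull it towards the informational one, which mirrors the word lists above.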

With this approach we were able to determine the emotional or informational content of 40,000 posts across 14 different health conditions (breast cancer, prostate cancer, lung cancer, depression, schizophrenia, Alzheimer’s disease, multiple sclerosis, cystic fibrosis, fibromyalgia, heart failure, diabetes type 2, irritable bowel syndrome, asthma, and chronic obstructive pulmonary disease) on the international support group forum Dailystrength.org.

Our research revealed a slight overall tendency towards emotional posts (58% of posts were emotionally oriented). Across all diseases, those who write more also tend to write more emotional posts—we assume that as people become more involved and build relationships with other users they tend to provide more emotional support, instead of simply providing information in one-off interactions. At the same time, we also observed that older people write more informational posts. This may be explained by the fact that older people more generally use the Internet to find information, that they become experts in their chronic conditions over time, and that with increasing age health conditions may have less emotional impact as they are relatively more expected.

The demographic prevalence of a condition may also be intertwined with its disease-related tendency towards informational or emotional posts. Our analysis suggests that content differs across the 14 conditions: mental health or brain-related conditions (such as depression, schizophrenia, and Alzheimer’s disease) feature more emotionally oriented posts, with around 80% of posts primarily containing emotional support. In contrast, forums for non-terminal physical conditions (such as irritable bowel syndrome, diabetes, and asthma) focus more on informational support, with around 70% of posts providing advice about symptoms, treatments, and medication.
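Once every post carries a label, figures such as “58% emotional overall” or “around 80% emotional” for a group of conditions are simply proportions of labelled posts. The minimal sketch below assumes each post is represented as a hypothetical (condition, label) pair. The toy labels are made up and are not the study’s data.

```python
from collections import defaultdict


def emotional_share(posts):
    """Fraction of posts labelled 'emotional', overall and per condition.

    `posts` is an iterable of (condition, label) pairs, where label is
    either "emotional" or "informational".
    """
    totals = defaultdict(int)
    emotional = defaultdict(int)
    for condition, label in posts:
        totals[condition] += 1
        if label == "emotional":
            emotional[condition] += 1
    per_condition = {c: emotional[c] / totals[c] for c in totals}
    overall = sum(emotional.values()) / sum(totals.values())
    return overall, per_condition


# Toy, made-up labels for illustration only.
posts = [
    ("depression", "emotional"), ("depression", "emotional"),
    ("depression", "emotional"), ("depression", "informational"),
    ("asthma", "informational"), ("asthma", "informational"),
    ("asthma", "emotional"), ("asthma", "informational"),
]
overall, per_condition = emotional_share(posts)
print(overall)        # 0.5
print(per_condition)  # {'depression': 0.75, 'asthma': 0.25}
```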

Finally, across conditions there was no gender difference in the proportion of posts that were informational versus emotional. That said, prostate cancer forums are oriented towards informational support, whereas breast cancer forums feature more emotional support. Apart from the generally different nature of the two conditions, one explanation may lie in the difference between single-gender and mixed-gender groups. An earlier meta-study found that women write more emotional content than men when talking among others of the same gender, but, interestingly, in mixed-gender discussions these differences nearly disappeared.

Our research helped to identify factors that determine whether online content is informational or emotional, and demonstrated how posts differ across conditions. In addition to theoretical insights about patient needs, this research will help practitioners to better understand the role of online support groups for different patients, and to provide advice to patients about the value of online support.

The results also suggest that online support groups should be integrated into the digital health strategies of the UK and other nations. At present the UK plan for “Personalised Health and Care 2020” is centred around digital services provided within the health system. It does not yet reflect the value to patients of person-generated health data from online support groups. Our research suggests that the plan would benefit from considering the instrumental role that online support groups can play in the healthcare process.

Read the full paper: Deetjen, U. and J. A. Powell (2016) Informational and emotional elements in online support groups: a Bayesian approach to large-scale content analysis. Journal of the American Medical Informatics Association. http://dx.doi.org/10.1093/jamia/ocv190

Ulrike Deetjen (née Rauer) is a doctoral student at the Oxford Internet Institute researching the influence of the Internet on healthcare provision and health outcomes.

Examining the data-driven value chains that are changing Rwanda’s tea sector

Behind the material movement that takes tea from the slopes of Rwanda’s ‘thousand hills’ to a box on a shelf in Tesco, is a growing set of less visible digital data flows. Image by pasunejen.

Production of export commodity goods like tea, coffee and chocolate is an important contributor to economies in Africa. Producers sell their goods into international markets, with the final products being sold in supermarkets, here in the UK and throughout the world. So what role is new Internet connectivity playing in changing these sectors — which are often seen as slow to adopt new technologies? As part of our work examining the impacts of growing Internet connectivity and new digital ICTs in East Africa we explored uses of the Internet and ICTs in the tea sector in Rwanda.

Tea is a sector with well-established practices and relations in the region, so we were curious whether ICTs might be changing it. Of course, the movement of material goods cannot be ignored when researching the tea sector. Tea is Rwanda’s main export by value: in 2012 the sector moved over 21,000 tonnes of tea, worth around $56m. During our fieldwork we interviewed cooperatives in remote offices surrounded by tea plantations in the temperate Southern highlands. We also spoke to tea processors in noisy factories heavy with the overpowering smell of fermenting tea leaves, and to tea buyers and sellers surrounded by corridors piled high with sacks of tea.

But behind the material movement that takes tea from the slopes of Rwanda’s ‘thousand hills’ to a box on a shelf in Tesco, is a growing set of less visible digital data flows. Whilst the adoption of digital technologies is not comprehensive in the Rwandan tea sector (with, for example, very low Internet use among tea growers), we did find growing use of the Internet and ICTs. More importantly, where they were present, digital flows of information (such as tea-batch tracking, logistics and sales prices) were increasingly important to the ability of firms to improve production and ultimately to increase their profit share from tea. We have termed this a ‘data-driven value chain’ to highlight that these new digital information flows are becoming as important as the flows of material goods.

So why is tea production becoming increasingly ‘data-driven’? We found two principal drivers at work. Firstly, production of commodities like tea has shifted to private ownership. In Rwanda, tea processing factories are no longer owned by the government (as they were a decade ago) but by private firms, including several multinational tea firms. Prices for buying and selling tea are also no longer fixed by the government but depend on the market: flat-rate pricing ended in late 2012. Data on everything from international prices to tea quality and logistics has become increasingly important as Rwandan tea firms seek to participate in the global market by better coordinating production and improving the prices of their tea. For instance, privately owned tea factories (often in remote locations) connect to head offices via satellite or microwave Internet links. Systems integration gives multinational tea firms the ability to track and monitor production at the touch of a button.

Secondly, we need to understand new product innovation in the tea sector. In recent years new products have revolved particularly around growing retail demand for differentiated products, such as ‘environmental’, fair trade or high-quality teas, for which consumers are willing to pay more. This most obviously concerns activities in the fields and at tea processors, but digital information is also crucial for the ‘traceability’ of tea. Traceability guarantees that tea batches have satisfied conditions around location, food safety, chemical use, fair labour and so on. Data is therefore a key component of new product innovation, because it is integral to firms’ ability to prove their value-added production or methods.

The idea of agricultural value chains (analysing agricultural production from the perspective of a fragmented network of interconnected firms) has become increasingly influential in strategies and policy making supported by large donors such as the World Bank and the International Fund for Agricultural Development (IFAD), an agency of the UN.

These value chain approaches explore the amount of economic ‘value’ that different actors in the supply chain are able to capture in production. For instance, Rwandan tea farmers capture only a very small proportion of the final retail price. We estimate that they are paid less than 6% of the price of the eventual retail product, and only 22% of the price of the raw tea that is sold to retailers. Value chain analysis has been popular with policy makers and donors because it helps them formulate policies that support firms in countries like Rwanda to capture more value through innovation, improved production processes, or reaching new customers.
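The two estimates can be combined to show roughly how the retail price is split. Write $p_f$ for the price paid to farmers, $p_r$ for the price of the raw tea sold to retailers, and $P$ for the final retail price. These symbols are ours, not the report’s.

```latex
\frac{p_f}{P} < 0.06, \qquad \frac{p_f}{p_r} \approx 0.22
\;\Longrightarrow\;
\frac{p_r}{P} = \frac{p_f / P}{p_f / p_r} < \frac{0.06}{0.22} \approx 0.27
```

On these estimates, the raw tea itself accounts for less than about 27% of the retail price. Most of the final value is therefore captured downstream of the sale of raw tea.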

Yet at present the analyses conducted by policy makers and donors appear to pay very little attention to digital data. As a result, they present an unclear picture of how firms can improve, tending instead to focus on material matters such as machinery or business models.

Our research particularly highlighted the importance of considering how to adapt digital data flows. The ways that digital information is codified, digitised and accessed can be exclusionary, reducing the ability of smaller actors in Rwanda to compete. For instance, we found that a lack of access to clear information about prices, tea quality and the wider market means that smallholders, small processors and cooperatives may not compete as well as they could, or may miss out on wider innovations in tea production.

While we have focused here only on tea production, our discussions with those working in other agricultural sectors, and in other countries, suggest that our observations have wider significance. In agricultural production, strategy, policy and research mainly focus on the material elements of production, which are more visible and quantifiable. However, we suggest that underlying such activities there is often a growing layer of digital data activity. Only through more coherent analysis of the role of digital technologies and data can we better analyse production and build appropriate policy and strategies to support commodity producers in sectors like Rwandan tea.

Read the full report: Foster, C., and Graham, M. (2015) Connectivity and the Tea Sector in Rwanda. Value Chains and Networks of Connectivity-Based Enterprises in Rwanda. Project Report, Oxford Internet Institute, University of Oxford.

Chris Foster is a researcher at the Oxford Internet Institute. His research focuses on technologies and innovation in developing and emerging markets, with a particular interest in how ICTs can support the development of low-income groups.

Why haven’t digital platforms transformed firms in developing countries? The Rwandan tourism sector explored

Tourism is becoming an increasingly important contributor to Rwanda’s economy. Image of Homo sapiens and Gorilla beringei beringei meeting in Rwanda’s Volcanoes National Park by Andries3.

One of the great hopes for new Internet connectivity in the developing world is that it will allow those in developing countries who offer products and services to link to and profit from global customers. With the landing of undersea Internet infrastructure in East Africa, there have been hopes that, as firms begin to use the Internet more extensively, improved links to markets will benefit them.

Central to enabling new customer transactions is the emergence of platforms (digital services, websites and online exchanges) that allow more direct interactions between customers and producers. As part of our work exploring the impacts of growing Internet connectivity and digital ICTs in East Africa, we wanted to explore how digital platforms were affecting Rwandan firms. Have Rwandan firms been able to access online platforms? What impact has access to these platforms had on firms?

Tourism is becoming an increasingly important contributor to Rwanda’s economy, contributing 3.1% of GDP directly and representing 7% of employment. Tourism typically targets affluent international tourists who come to explore the country’s wildlife, most notably because Rwanda is the most accessible location to see the mountain gorilla. Rwandan policy makers see tourism as a potential area for expansion, and new connectivity could be one key driver in making the country more accessible to customers.

Tourist service providers in Rwanda have very high rates of Internet adoption, and even the smallest hotel or tour agency is likely to have at least one laptop with a mobile Internet connection. Many of the global platforms also have a presence in the region. Online travel agents such as Expedia and Hotels.com work with Rwandan hotels. Social media commonly used by tourists, such as TripAdvisor and Facebook, are well known. Firms have also been encouraged by the government to integrate into payment platforms like Visa.

So, in the case of Rwandan tourism, Internet connectivity, Internet access and sector-wide platforms are certainly available to tourism firms. However, during our fieldwork we found (to our surprise) that adoption of digital tourism platforms was low, and their impact on Rwandan tourism minimal. Why? This came down to three mismatches: of integration, of fit, and of interaction.

Global tourism platforms offer Rwandan firms the potential to seamlessly reach a wider range of tourists around the globe. However, we found that the requirements for integration into global platforms were often unclear to Rwandan firms, and that platforms fitted poorly with firms’ existing systems and skills. For example, hotels and lodges normally connect to online travel agencies by linking their internal information systems, which track bookings and availability. In Rwanda, however, whilst a few larger hotels used booking systems, even medium-sized hotels lacked internal booking systems, relying instead on custom Excel spreadsheets or even paper diaries. When tourism firms attempted to integrate into online services they therefore ran into problems, and only the large (international) hotel chains tended to be fully integrated.

Integration of East African tourism service providers into global platforms was also limited by the nature of tourism activities in the region. Global platforms have typically focused on providing online information, booking and payment for discrete tourism components: a hotel, a flight, a review of an attraction. However, in East Africa much international tourism is ‘packaged’, meaning that a third party (normally a tour operator) builds an itinerary and makes all the bookings for customers. Online tourism platforms therefore offer a poor fit, for tourists and Rwandan service providers alike. A tourist will not want the complication of booking a full itinerary online, and a small lodge that gets most of its bookings through tour operators will see little potential in integrating into a global online platform.

Interaction between Rwandan tourism service providers and online platforms inevitably takes place over digital networks, through remote interactions, payments and information flows. This arm’s-length relationship often becomes problematic where service providers have fewer skills and less capacity. For example, Rwandan tourism service providers often require additional information, help or even training on how best to use platforms that change frequently. Lower-cost Internet connections can at times be unreliable, and payment systems can be congested. In these contexts, the ability to contact local help and discuss issues is important. Yet this is the very element that global platforms like online travel agents are often trying to remove.

So in general, we found that tourism platforms supported the large international hotels and resorts, where systems and structures were already in place for seamless integration. Indeed, the Rwandan government is looking to expand the tourism sector (for example through new national parks and regional integration). As it does so, there is a risk that the digital domain will favour generic international chains entering the country over the expansion of local firms.

There are potential ways forward, though. Ironically, the most successful online travel agency in Rwanda is one that has contracted a local firm in the capital Kigali to allow for ‘thicker’ interactions between Rwandan service providers and platform providers. There are also a number of South African and Kenyan online platforms in the early stages of development that are more attuned to the regional contexts of tourism (for example Safari Now, a dynamic Safari scheduling platform; Nights Bridge, an online platform for smaller hotels; and WETU, an itinerary sharing platform for service providers), and these may eventually offer a better solution for Rwandan tourism service providers.

We came to similar conclusions in the other sectors we examined as part of our research in East Africa (looking at tea production and Business Process Outsourcing) — that is, that use of online platforms faces limitations in the region. Even as firms find themselves able to access the Internet, the way these global platforms are designed presents a poor fit to the facilities, activities and needs of firms in developing countries. Indeed, in globalised sectors (such as tourism and business outsourcing) platforms can be actively exclusionary, aiding international firms entering developing countries over those local firms seeking to expand outwards.

For platform owners and developers focusing on such developing markets, the benefits of greater Internet access are therefore likely to emerge when platforms can balance global reach and standards with the specific needs and contexts of developing countries.

Read the full report: Foster, C., and Graham, M. (2015) The Internet and Tourism in Rwanda. Value Chains and Networks of Connectivity-Based Enterprises in Rwanda. Project Report, Oxford Internet Institute, University of Oxford.

Chris Foster is a researcher at the Oxford Internet Institute. His research focuses on technologies and innovation in developing and emerging markets, with a particular interest in how ICTs can support the development of low-income groups.

Gender gaps in virtual economies: are there virtual ‘pink’ and ‘blue’ collar occupations?

She could end up earning 11 percent less than her male colleagues. Image from EVE Online by zcar.300.

Ed: Firstly, what is a ‘virtual’ economy? And what exactly are people earning or exchanging in these online environments?

Vili: A virtual economy is an economy that revolves around artificially scarce virtual markers, such as Facebook likes or, in this case, virtual items and currencies in an online game. A lot of what we do online today is rewarded with such virtual wealth instead of, say, money.

Ed: In terms of ‘virtual earning power’ what was the relationship between character gender and user gender?

Vili: We know that in national economies, men and women tend to be rewarded differently for the same amount of work; men tend to earn more than women. Since online economies are such a big part of many people’s lives today, we wanted to know if this holds true in those economies as well. Looking at the virtual economies of two massively multiplayer online games (MMOGs), we found that there are indeed some gender differences in how much virtual wealth players accumulate within the same number of hours played. In one game, EVE Online, male players were on average 11 percent wealthier than female players of the same age, character skill level, and time spent playing. We believe that this finding is explained at least in part by the fact that male and female players tend to favour different activities within the game worlds, what we call “virtual pink and blue collar occupations”. In national economies, this is called occupational segregation: jobs perceived as suitable for men are rewarded differently from jobs perceived as suitable for women, resulting in a gender earnings gap.

However, in another game, EverQuest II, we found that male and female players were approximately equally wealthy. This reflects the fact that games differ in what kind of activities they reward. Some provide a better economic return on fighting and exploring, while others make it more profitable to engage in trading and building social networks. In this respect games differ from national economies, which all tend to be biased towards rewarding male-type activities. Going beyond this particular study, fantasy economies could also help illuminate the processes through which particular occupations come to be regarded as suitable for men or for women, because game developers can dream up new occupations with no prior gender expectations attached.

Ed: You also discussed the distinction between user gender and character gender…

Vili: Besides occupational segregation, there are also other mechanisms that could explain economic gender gaps, like differences in performance or outright discrimination in pay negotiations. What’s interesting about game economies is that people can appear in the guise of a gender that differs from their everyday identity: men can play female characters and vice versa. By looking at player gender and character gender separately, we can distinguish between how “being” female and “appearing to be” female are related to economic outcomes.

We found that in EVE Online, using a female character was associated with slightly less virtual wealth, while in EverQuest II, using a female character was associated with being richer on average. Since in our study the players chose the characters themselves instead of being assigned characters at random, we don’t know what the causal relationship between character gender and wealth in these games was, if any. But it’s interesting to note that again the results differed completely between games, suggesting that while gender does matter, its effect has more to do with the mutable “software” of the players and/or the coded environments rather than our immutable “hardware”.
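A gap estimated “for players of the same age, character skill level, and time spent playing”, with player gender kept separate from character gender, is typically obtained from a regression that includes both as indicators alongside the controls. The sketch below assumes a log-linear model of wealth, which may not be the specification used in the paper. It fits that model by ordinary least squares on synthetic data. The data-generating numbers (an 11% male-player premium and a 3% female-character penalty) are invented for the demo. With log wealth as the outcome, a coefficient b on an indicator translates into a percentage difference of exp(b) − 1.

```python
import math
import random


def ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Synthetic players only: every number below is invented for the demo.
random.seed(1)
X, log_wealth = [], []
for _ in range(5000):
    male_player = random.random() < 0.85
    female_char = random.random() < (0.15 if male_player else 0.7)
    age = random.uniform(18, 60)
    skill = random.uniform(0, 10)
    hours = random.uniform(10, 500)
    y = (2.0 + 0.3 * skill + 0.004 * hours + 0.005 * age
         + math.log(1.11) * male_player   # hypothetical 11% male-player premium
         + math.log(0.97) * female_char   # hypothetical 3% female-character penalty
         + random.gauss(0, 0.3))          # noise
    X.append([1.0, float(male_player), float(female_char), age, skill, hours])
    log_wealth.append(y)

b = ols(X, log_wealth)
# With log wealth as the outcome, exp(b) - 1 is the percentage difference.
print(f"male player:      {math.exp(b[1]) - 1:+.1%}")
print(f"female character: {math.exp(b[2]) - 1:+.1%}")
```

Because players chose their own characters, the character-gender coefficient in a model like this describes an association, not a causal effect. This is the same caveat given in the interview.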

Ed: The dataset you worked with could be considered an example of ‘big data’ (ie you had full transactional trace data of people interacting in two games). What can you discover with this sort of data (as opposed to, eg, user surveys, participant observation, or ethnographies), and how useful or powerful is it?

Vili: Social researchers are used to working with small samples of data, and then looking at measures of statistical significance to assess whether the findings are generalizable to the overall population or whether they’re just a fluke. This focus on statistical significance is sometimes so extreme that people forget to consider the practical significance of the findings: even if the effect is real, is it big enough to make any difference in practice? In contrast, when you are working with big data, almost any relationship is statistically significant, so that becomes irrelevant. As a result, people learn to focus more on practical significance — researchers, peer reviewers, journal editors, funders, as well as the general public. This is a good thing, because it can increase the impact that social research has in society.
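The point about sample size can be made concrete with a two-sample z-test, a normal approximation chosen here for simplicity. The sketch holds the effect fixed at a practically negligible 0.01 standard deviations and varies only the number of observations per group. The function name is hypothetical.

```python
import math


def two_sample_z(effect_size, n_per_group):
    """z statistic and two-sided p-value for a difference in means of
    `effect_size` standard deviations between two equal-sized groups
    (normal approximation, unit variance in each group)."""
    z = effect_size / math.sqrt(2.0 / n_per_group)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# A practically negligible effect of 0.01 SD at survey scale vs. trace-data scale.
for n in (100, 1_000_000):
    z, p = two_sample_z(0.01, n)
    print(f"n per group = {n:>9,}: z = {z:.2f}, p = {p:.2g}")
```

With 100 observations per group the two-sided p-value is about 0.94. With a million per group the same negligible effect gives p ≈ 1.5 × 10⁻¹². At that scale, the size of the effect carries the useful information, not its significance.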

In this study, we spent a lot of time thinking about the practical significance of the findings. In any national economy, an 11 percent gap between men and women would be huge. But in virtual economies, overall wealth inequality tends to be orders of magnitude greater than in national economies, so that an 11 percent gap is in fact relatively minuscule. Other factors, like whether one is a casual participant in the economy or a semi-professional, have a much bigger effect, so much so that I’m not sure if participants notice a gender gap themselves. Thus one of the key conclusions of the study was that we also need to look beyond traditional sociodemographic categories like gender to see what new social divisions may be appearing in virtual economies.

Ed: What do you think are the hot topics and future directions in research (and policy) on virtual economies, gaming, microwork, crowd-sourcing etc.?

Vili: Previously, ICT adoption resulted in some people’s jobs being eliminated and others being enhanced. This shift had uneven impacts on men’s and women’s jobs. Today, we are seeing an Internet-fuelled “volunteerization” of some types of work: work is moving from paid employees and contractors to crowds and fans compensated with points, likes, and badges rather than money. Social researchers should keep track of how this shift impacts different social categories like men and women: whose work ends up being compensated in play money, and who gets to keep the conventional rewards.

Read the full article: Lehdonvirta, V., Ratan, R. A., Kennedy, T. L., and Williams, D. (2014) Pink and Blue Pixel$: Gender and Economic Disparity in Two Massive Online Games. The Information Society 30 (4) 243-255.

Vili Lehdonvirta is a Research Fellow and DPhil Programme Director at the Oxford Internet Institute, and an editor of the Policy & Internet journal. He is an economic sociologist who studies the social and economic dimensions of new information technologies around the world, with particular expertise in digital markets and crowdsourcing.

Vili Lehdonvirta was talking to blog editor David Sutcliffe.

What are the limitations of learning at scale? Investigating information diffusion and network vulnerability in MOOCs

Millions of people worldwide are currently enrolled in courses provided on large-scale learning platforms (aka ‘MOOCs’), typically collaborating in online discussion forums with thousands of peers. Current learning theory emphasizes the importance of this group interaction for cognition. However, while a lot is known about the mechanics of group learning in smaller and traditionally organized online classrooms, fewer studies have examined participant interactions when learning “at scale”. Some studies have used clickstream data to trace participant behaviour, and some have even predicted dropouts from engagement patterns. Yet many questions remain about the characteristics of group interactions in these courses, highlighting the need to understand whether, and how, MOOCs allow for deep and meaningful learning by facilitating significant interactions.

But what constitutes a “significant” learning interaction? In large-scale MOOC forums, with socio-culturally diverse learners with different motivations for participating, this is a non-trivial problem. MOOCs are best defined as “non-formal” learning spaces, where learners pick and choose how (and if) they interact. This kind of group membership, together with the short-term nature of these courses, means that relatively weak inter-personal relationships are likely. Many of the tens of thousands of interactions in the forum may have little relevance to the learning process. So can we actually define the underlying network of significant interactions? Only once we have done this can we explore firstly how information flows through the forums, and secondly the robustness of those interaction networks: in short, the effectiveness of the platform design for supporting group learning at scale.

To explore these questions, we analysed data from 167,000 students registered on two business MOOCs offered on the Coursera platform. Almost 8,000 students contributed around 30,000 discussion posts over the six weeks of the courses; almost 30,000 students viewed at least one discussion thread, totalling 321,769 discussion thread views. We first modelled these communications as a social network, with nodes representing students who posted in the discussion forums, and edges (ie links) indicating co-participation in at least one discussion thread. Of course, not all links will be equally important: many exchanges will be trivial (‘hello’, ‘thanks’ etc.). Our task, then, was to derive a “true” network of meaningful student interactions (ie iterative, consistent dialogue) by filtering out those links generated by random encounters (Figure 1; see also the full paper for methodology).
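The full paper describes the authors’ own filtering method. As an illustration of the general idea only, the sketch below builds a weighted co-participation network from thread membership. It keeps an edge only if two students share more threads than a random-encounter null model would predict. The sketch swaps in a hypergeometric test, a common choice for filtering co-occurrence networks but not necessarily the paper’s method. The `alpha` threshold, the function names and the student names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import comb


def co_participation(threads):
    """threads: dict thread_id -> set of students who posted in that thread.
    Returns (edge weights = shared thread counts, threads per student)."""
    weights = Counter()
    threads_of = defaultdict(set)
    for tid, students in threads.items():
        for s in students:
            threads_of[s].add(tid)
        for a, b in combinations(sorted(students), 2):
            weights[(a, b)] += 1
    return weights, threads_of


def hypergeom_sf(shared, n_threads, k_a, k_b):
    """P(X >= shared) if student b's k_b threads were drawn at random from
    n_threads threads, k_a of which contain student a."""
    hits = sum(comb(k_a, x) * comb(n_threads - k_a, k_b - x)
               for x in range(shared, min(k_a, k_b) + 1))
    return hits / comb(n_threads, k_b)


def significant_edges(threads, alpha=0.01):
    """Keep only edges whose shared-thread count is unlikely under chance."""
    weights, threads_of = co_participation(threads)
    n = len(threads)
    return {
        (a, b): w for (a, b), w in weights.items()
        if hypergeom_sf(w, n, len(threads_of[a]), len(threads_of[b])) < alpha
    }


# Hypothetical toy forum with 20 threads.
threads = {t: set() for t in range(20)}
for t in range(6):
    threads[t] |= {"ann", "bob"}   # repeated dialogue between two students
for t in range(0, 20, 2):
    threads[t].add("cat")          # a prolific poster who meets others by chance
threads[10].add("dan")
threads[11].add("dan")

print(significant_edges(threads))  # {('ann', 'bob'): 6}
```

In this toy forum, “cat” meets “ann” and “bob” in three threads each. That is about what chance predicts for someone who posts in half of all threads, so only the repeated dialogue between ann and bob survives the filter.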

Figure 1. Comparison of observed (a; ‘all interactions’) and filtered (b; ‘significant interactions’) communication networks for a MOOC forum. Filtering affects network properties such as the modularity score (ie how strongly the network divides into distinct communities). Colours correspond to the automatically detected interest communities.

One feature of networks that has been studied in many disciplines is their vulnerability to fragmentation when nodes are removed (the Internet, for example, emerged from US military research aiming to develop a disruption-resistant network for critical communications). While we aren’t interested in the effect of missile strikes on MOOC exchanges, from an educational perspective it is still useful to ask which “critical set” of learners is most responsible for information flow in a communication network, and what would happen to online discussions if these learners were removed. To our knowledge, this is the first time the vulnerability of communication networks has been explored in an educational setting.

Network vulnerability is interesting because it indicates how integrated and inclusive the communication flow is. Discussion forums with fleeting participation will have only a few vocal participants: removing these people from the network will markedly reduce the information flow between the other participants, because as the network falls apart it becomes more difficult for information to travel across it via linked nodes. Conversely, forums that encourage repeated engagement and in-depth discussion among participants will have a larger ‘critical set’, with discussion distributed across a wide range of learners.

To understand the structure of group communication in the two courses, we looked at how quickly our modelled communication network fell apart when (a) the most central nodes were iteratively disconnected (Figure 2; blue), compared with when (b) nodes were removed at random (ie the ‘neutral’ case; green). In the random case, the network degrades gradually, as expected. When we selectively remove the most central nodes, however, we see rapid disintegration, indicating the presence of individuals who act as important ‘bridges’ across the network. In other words, the network of student interactions is not random: it has structure.

Figure 2. Rapid network degradation results from removal of central nodes (blue). This indicates the presence of individuals acting as ‘bridges’ between sub-groups. Removing these bridges results in rapid degradation of the overall network. Removal of random nodes (green) results in a more gradual degradation.
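The removal experiment can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes the network is a dict mapping each student to a list of neighbours, uses degree as the (assumed) centrality measure, and uses the share of nodes left in the largest connected component as the (assumed) measure of how far the network has fallen apart.

```python
import random
from collections import deque


def largest_component_size(adj, removed):
    """Size of the largest connected component, ignoring removed nodes."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over one component.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def degradation_curve(adj, targeted=True, seed=0):
    """Largest-component fraction after 0, 1, ..., n node removals.

    targeted=True removes the remaining node with the highest degree at
    each step (degree recomputed after every removal); targeted=False
    removes nodes in a seeded random order (the 'neutral' case).
    """
    n = len(adj)
    removed = set()
    order = list(adj)
    random.Random(seed).shuffle(order)
    curve = [largest_component_size(adj, removed) / n]
    for step in range(n):
        if targeted:
            node = max(
                (v for v in adj if v not in removed),
                key=lambda v: sum(1 for u in adj[v] if u not in removed),
            )
        else:
            node = order[step]
        removed.add(node)
        curve.append(largest_component_size(adj, removed) / n)
    return curve
```

On a star-shaped network, for example, targeted removal takes out the hub first and leaves only isolated nodes, whereas a random first removal will usually just trim a leaf. Betweenness centrality would be a natural alternative to degree for finding ‘bridges’, at a higher computational cost.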

Of course, the structure of participant interactions will reflect the purpose and design of the particular forum. Figure 3 shows that different forums in the courses have different vulnerability thresholds. Forums with high levels of iterative dialogue and knowledge construction (with learners sharing ideas and insights about weekly questions, strategic analyses, or course outcomes) are the least vulnerable to degradation: a relatively high proportion of nodes has to be removed before the network falls apart (rightmost blue line). Forums where most individuals post once to introduce themselves and then either move their discussions to other platforms (such as Facebook) or cease engagement altogether tend to be more vulnerable to degradation (leftmost blue line). The different vulnerability thresholds suggest that different topics (and forum functions) promote different levels of forum engagement. Certainly, asking students open-ended questions tended to encourage significant discussions, leading to greater engagement and knowledge construction as students read analyses posted by their peers and commented with additional insights or critiques.

Figure 3 – Network vulnerabilities of different course forums.
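One simple way to summarise a degradation curve as a single ‘vulnerability threshold’ per forum is to record what fraction of nodes must be removed before the largest component shrinks below some cut-off. The sketch below assumes the curve is a list whose entry k is the largest-component fraction after k removals; the function name and the 50% cut-off are hypothetical choices, not taken from the paper.

```python
def vulnerability_threshold(curve, cutoff=0.5):
    """Fraction of nodes removed before the largest component first drops
    below `cutoff` of the original network size.

    curve[k] is the largest-component fraction after k removals, so
    len(curve) - 1 is the number of nodes. Returns None if the component
    never drops below the cut-off.
    """
    n = len(curve) - 1
    for k, size in enumerate(curve):
        if size < cutoff:
            return k / n
    return None
```

Under this definition a forum with a higher threshold would sit further to the right in a plot like Figure 3, matching the least vulnerable, discussion-heavy forums.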

Understanding something about the vulnerability of a communication or interaction network is important, because it will tend to affect how information spreads across it. To investigate this, we simulated an information diffusion model similar to that used to model social contagion. Although simplistic, the SI model (‘susceptible-infected’) is very useful in analyzing topological and temporal effects on networked communication systems. While the model doesn’t account for things like decaying interest over time or peer influence, it allows us to compare the efficiency of different network topologies.
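A discrete-time SI simulation can be sketched as below. This is a minimal sketch under stated assumptions: the adjacency-dict format, the per-contact transmission probability `beta` and the synchronous update rule are illustrative choices, not the settings used in the paper. The hypothetical `time_to_half` helper reports the first step at which at least half the nodes have been reached, the quantity shown in Figure 4(b).

```python
import random


def simulate_si(adj, seed_node, beta=0.1, steps=50, rng_seed=0):
    """Discrete-time susceptible-infected (SI) simulation.

    At each step every infected node independently passes the
    'information' to each susceptible neighbour with probability beta.
    Infected nodes never recover. Returns the infected fraction after
    each step, starting with the single seed node at step 0.
    """
    rng = random.Random(rng_seed)
    infected = {seed_node}
    history = [len(infected) / len(adj)]
    for _ in range(steps):
        newly = set()
        for node in adj:  # iterate in adj order so runs are reproducible
            if node not in infected:
                continue
            for nb in adj[node]:
                if nb not in infected and rng.random() < beta:
                    newly.add(nb)
        infected |= newly  # synchronous update: new infections spread next step
        history.append(len(infected) / len(adj))
    return history


def time_to_half(history):
    """First step at which at least half the nodes are infected, or None."""
    for t, frac in enumerate(history):
        if frac >= 0.5:
            return t
    return None
```

Because nodes never recover, the infected fraction can only grow; what differs between topologies is how quickly it grows, which is exactly what the comparison of network structures relies on.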

We compared our (real-data) network model with a randomised network in order to see how well information would flow if the community structures we observed in Figure 2 did not exist. Figure 4 shows the number of ‘infected’ (or ‘reached’) nodes over time for both the real (solid lines) and randomised networks (dashed lines). In all the forums, information actually spreads faster in the randomised networks. This is explained by the existence of local community structures in the real-world networks: a network with dense clusters of nodes (ie a clumpy network) will show slower diffusion than one with a more even distribution of communication, where participants do not tend to favour discussions with a limited cohort of their peers.

Figure 4 (a) shows the percentage of infected nodes vs. simulation time for different networks. The solid lines show the results for the original network and the dashed lines for the random networks. (b) shows the time it took for a simulated “information packet” to come into contact with half the network’s nodes.
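One common way to build a randomised comparison network, assumed in this sketch rather than taken from the paper, is degree-preserving rewiring by repeated double-edge swaps: every student keeps the same number of links, but dense local clusters are broken up, so differences in diffusion speed can be attributed to community structure rather than to how many links each learner has. The function name and the number of swaps are hypothetical.

```python
import random


def randomize_edges(edges, n_swaps=1000, seed=0):
    """Degree-preserving randomisation by repeated double-edge swaps.

    edges: list of (u, v) pairs describing a simple undirected graph
    (no self-loops, no duplicate edges), with at least two edges.
    Picks two edges (a, b) and (c, d) and rewires them to (a, d) and
    (c, b), rejecting swaps that would create self-loops or duplicates.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both ways of rewiring the pair
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if a == d or c == b or new1 in present or new2 in present:
            continue  # reject swaps that create self-loops or duplicate edges
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges
```

Each accepted swap leaves all four endpoints with the same degree, so running the same SI simulation on the original and the rewired edge lists isolates the effect of clustering.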

Overall, these results reveal an important characteristic of student discussion in MOOCs: when it comes to significant communication between learners, there are simply too many discussion topics and too much heterogeneity (ie clumpiness) to result in truly global-scale discussion. Instead, most information exchange (and, by extension, any knowledge construction) in the discussion forums occurs in small, short-lived groups, with information “trapped” within them. This finding is important as it highlights structural limitations that may impact the ability of MOOCs to facilitate communication amongst learners who look to learn “in the crowd”.

These insights into the communication dynamics motivate a number of important questions about how social learning can be better supported, and facilitated, in MOOCs. They certainly suggest the need to leverage intelligent machine learning algorithms to support the needs of crowd-based learners; for example, in detecting different types of discussion and patterns of engagement during the runtime of a course to help students identify and engage in conversations that promote individualized learning. Without such interventions the current structural limitations of social learning in MOOCs may prevent the realization of a truly global classroom.

The next post addresses qualitative content analysis and how machine-learning community detection schemes can be used to infer latent learner communities from the content of forum posts.

Read the full paper: Gillani, N., Yasseri, T., Eynon, R., and Hjorth, I. (2014) Structural limitations of learning in a crowd – communication vulnerability and information diffusion in MOOCs. Scientific Reports 4.

Rebecca Eynon holds a joint academic post between the Oxford Internet Institute (OII) and the Department of Education at the University of Oxford. Her research focuses on education, learning and inequalities, and she has carried out projects in a range of settings (higher education, schools and the home) and life stages (childhood, adolescence and late adulthood).

The life and death of political news: using online data to measure the impact of the audience agenda

Image of the Telegraph’s state-of-the-art “hub and spoke” newsroom layout by David Sim.
The political agenda has always been shaped by what the news media decide to publish. Through their ability to broadcast to large, loyal audiences in a sustained manner, news editors can shape ‘political reality’ by deciding what is important to report. Traditionally, journalists pass a pool of potential stories to their editors; editors then choose which stories to publish. However, with the increasing importance of online news, editors must now decide not only what to publish and where, but how long it should remain prominent and visible to the audience on the front page of the news website.

The question of how much influence the audience has in these decisions has always been ambiguous. While in theory we might expect journalists to be attentive to readers, journalism has also been characterized as a profession with a “deliberate…ignorance of audience wants” (Anderson, 2011b). This ‘anti-populism’ is still often portrayed as an important journalistic virtue, in the context of telling people what they need to hear, rather than what they want to hear. Recently, however, attention has been turning to the potential impact that online audience metrics are having on journalism’s “deliberate ignorance”. Online publishing provides a huge amount of information to editors about visitor numbers, visit frequency, and what visitors choose to read and how long they spend reading it. Online editors now have detailed information about what articles are popular almost as soon as they are published, with these statistics frequently displayed prominently in the newsroom.

The rise of audience metrics has created concern both within the journalistic profession and academia, as part of a broader set of concerns about the way journalism is changing online. Many have expressed concern about a ‘culture of click’, whereby important but unexciting stories make way for more attention grabbing pieces, and editorial judgments are overridden by traffic statistics. At a time when media business models are under great strain, the incentives to follow the audience are obvious, particularly when business models increasingly rely on revenue from online traffic and advertising. The consequences for the broader agenda-setting function of the news media could be significant: more prolific or earlier readers might play a disproportionate role in helping to select content; particular social classes or groupings that read news online less frequently might find their issues being subtly shifted down the agenda.

The extent to which such a populist influence exists has attracted little empirical research. Many ethnographic studies have shown that audience metrics are being captured in online newsrooms, with anecdotal evidence that traffic statistics affect an article’s lifetime (Anderson, 2011b; MacGregor, 2007). However, many editors have emphasised that popularity is not a major determining factor (MacGregor, 2007), and that news values remain significant in determining where news articles are placed.

In order to assess the possible influence of audience metrics on decisions made by political news editors, we undertook a systematic, large-scale study of the relationship between readership statistics and article lifetime. We examined the news cycles of five major UK news outlets (the BBC, the Daily Telegraph, the Guardian, the Daily Mail and the Mirror) over a period of six weeks, capturing their front pages every 15 minutes, resulting in over 20,000 front-page captures and more than 40,000 individual articles. We measured article readership by capturing information from the BBC’s “most read” list of news articles (twelve percent of the articles were featured at some point on the ‘most read’ list, with a median time to achieving this status of two hours, and an average article life of 15 hours on the front page). Using the Cox proportional hazards model (which allows us to quantify the impact of an article’s appearance on the ‘most read’ list on its chance of survival), we asked whether being listed in a ‘most read’ column affected the length of time an article remained on the front page.
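Before a Cox model can be fitted, the 15-minute front-page captures have to be turned into survival data: how long each article stayed on the front page, and whether its removal was actually observed. The sketch below is a hypothetical illustration, not the authors' pipeline. It assumes each capture is a set of article IDs, that an article stays on the front page for one unbroken run of captures, and it treats articles still present at the final capture as right-censored.

```python
def article_lifetimes(captures, interval_hours=0.25):
    """Turn periodic front-page snapshots into survival data.

    captures: list of sets of article IDs, one set per snapshot, in time
    order (one snapshot every 15 minutes -> interval_hours=0.25).
    Returns {article: (lifetime_hours, removed)}, where removed is False
    if the article was still on the front page at the final snapshot
    (a right-censored observation).
    """
    first_seen, last_seen = {}, {}
    for i, snapshot in enumerate(captures):
        for article in snapshot:
            first_seen.setdefault(article, i)
            last_seen[article] = i
    final = len(captures) - 1
    return {
        a: ((last_seen[a] - first_seen[a] + 1) * interval_hours,
            last_seen[a] < final)
        for a in first_seen
    }
```

Lifetimes are only accurate to the capture interval, and an article's ‘most read’ status would enter the model as a covariate that can switch on partway through its life.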

We found that ‘most read’ articles had, on average, a 26% lower chance of being removed from the front page than equivalent articles which were not on the most read list, providing support for the idea that online editors are influenced by readership statistics. In addition to assessing the general impact of readership statistics, we also wanted to see whether this effect differs between ‘political’ and ‘entertainment’ news. Research on participatory journalism has suggested that online editors might be more willing to allow audience participation in areas of soft news such as entertainment, arts, sports, etc. We found a small amount of evidence for this claim, though the difference between the two categories was very slight.
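In the Cox model, the hazard (the instantaneous risk of an article being removed from the front page) is a baseline hazard scaled by the covariates. Assuming the 26% figure is reported as a hazard ratio for a single binary ‘most read’ indicator $x$, it can be read as follows. Because the baseline hazard $h_0(t)$ cancels in the ratio, the effect is the same at every point in an article's life, which is the model's proportional-hazards assumption.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta x), \qquad
x = \begin{cases} 1 & \text{article is on the `most read' list} \\ 0 & \text{otherwise} \end{cases}

\frac{h(t \mid x = 1)}{h(t \mid x = 0)} = e^{\beta} \approx 1 - 0.26 = 0.74
\quad\Longrightarrow\quad \beta = \ln 0.74 \approx -0.301
```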

Finally, we wanted to assess whether there is a ‘quality’ / ‘tabloid’ split. Part of the definition of tabloid style journalism lies precisely in its willingness to follow the demands of its audience. However, we found the audience ‘effect’ (surprisingly) to be most obvious in the quality papers. For tabloids, ‘most read’ status actually had a slightly negative effect on article lifetime. We wouldn’t argue that tabloid editors actively reject the wishes of their audience; however, we can say that these editors are no more likely to follow their audience than the typical ‘quality’ editor, and in fact may be less so. We do not have a clear explanation for this difference, though we could speculate that, as tabloid publications are already more tuned in to the wishes of their audience, the appearance of readership statistics makes less practical difference to the overall product. However, it may also simply be the case that the online environment is slowly producing new journalistic practices for which the tabloid / quality distinction will be less useful.

So on the basis of our study, we can say that high-traffic articles do in fact spend longer in the spotlight than ones that attract less readership: audience readership does have a measurable impact on the lifespan of political news. The audience is no longer the unknown quantity it was in offline journalism: it appears to have a clear impact on journalistic practice. The question that remains, however, is whether this constitutes evidence of a new ‘populism’ in journalism, or whether it represents (as editors themselves have argued) the simple striking of a balance between audience demands and news values.

Read the full article: Bright, J., and Nicholls, T. (2014) The Life and Death of Political News: Measuring the Impact of the Audience Agenda Using Online Data. Social Science Computer Review 32 (2) 170-181.


Anderson, C. W. (2011) Between creative and quantified audiences: Web metrics and changing patterns of newswork in local US newsrooms. Journalism 12 (5) 550-566.

MacGregor, P. (2007) Tracking the Online Audience. Journalism Studies 8 (2) 280-298.

OII Research Fellow Jonathan Bright is a political scientist specialising in computational and ‘big data’ approaches to the social sciences. His major interest concerns studying how people get information about the political process, and how this is changing in the internet era.

Tom Nicholls is a doctoral student at the Oxford Internet Institute. His research interests include the impact of technology on citizen/government relationships, the Internet’s implications for public management and models of electronic public service delivery.