Methods

The Middle East and North Africa are relatively under-represented in Wikipedia. Even after accounting for factors like population, Internet access, and literacy, we still see less content than would be expected.

Editors from all over the world have played some part in writing about Egypt; in fact, only 13% of all edits actually originate in the country (38% are from the US). More: Who edits Wikipedia? by Mark Graham.

Ed: In basic terms, what patterns of ‘information geography’ are you seeing in the region?

Mark: The first pattern that we see is that the Middle East and North Africa are relatively under-represented in Wikipedia. Even after accounting for factors like population, Internet access, and literacy, we still see less content than would be expected. Second, of the content that exists, a lot of it is in English and French rather than in Arabic (or Farsi or Hebrew). In other words, there is even less in local languages. And finally, if we look at contributions (or edits), not only do we see a relatively small number of edits originating in the region, but many of those edits are being used to write about other parts of the world rather than their own region. What this broadly seems to suggest is that the participatory potentials of Wikipedia aren’t yet being harnessed in order to even out the differences between the world’s informational cores and peripheries.

Ed: How closely do these online patterns in representation correlate with regional (offline) patterns in income, education, language, access to technology (etc.)? Can you map one to the other?

Mark: Population and broadband availability alone explain a lot of the variance that we see. Other factors like income and education also play a role, but it is population and broadband that have the greatest explanatory power here. Interestingly, it is mostly countries in the MENA region that fail to fit well to those predictors.

Ed: How much do you think these patterns result from the systematic imposition of a particular viewpoint—such as official editorial policies—as opposed to the (emergent) outcome of lots of users and editors…

The more that differing points of view and differing evaluative frames came into contact, the more the community worked together to generate rules and norms to regulate and improve the production of articles.

Image from "The Iraq War: A Historiography of Wikipedia Changelogs," a twelve-volume set of all changes to the Wikipedia article on the Iraq War (totalling over 12,000 changes and almost 7,000 pages), by STML.

Ed: I really like the way that, contrary to many current studies on conflict and Wikipedia, you focus on how conflict can actually be quite productive. How did this insight emerge?

Kim: I was initially looking for instances of collaboration in Wikipedia to see how popular debates about peer production played out in reality. What I found was that conflict was significantly more prevalent than I had assumed. It struck me as interesting, as most of the popular debates at the time framed conflict as hindering the collaborative editorial process. After several stages of coding, I found that the conversations that involved even a minor degree of conflict were fascinating. A pattern emerged where disagreements about the editorial process resulted in community members taking positive actions to solve the discord and achieve consensus. This was especially prominent in early discussions prior to 2005, before many of the policies that regulate content production in the encyclopaedia were formulated. The more that differing points of view and differing evaluative frames came into contact, the more the community worked together to generate rules and norms to regulate and improve the production of articles.

Ed: You use David Stark’s concept of generative friction to describe how conflict is ‘central to the editorial processes of Wikipedia’. Can you explain why this is important?

Kim: Having different points of view come into contact is the premise of Wikipedia’s collaborative editing model. When these views meet, Stark maintains there is an overlap of individuals’ evaluative frames, or worldviews, and it is in this overlap that creative solutions to problems can occur. People come across solutions they may not otherwise have encountered in the typical homogeneous, hierarchical system that is traditionally the standard for institutions trying to maximise efficiency. In this respect, conflict is central to the process as it is about the struggle to negotiate meaning and achieve a consensus among editors with differing opinions and perspectives…

Social media monitoring, which in theory can extract information from tweets and Facebook posts and quantify positive and negative public reactions to people, policies and events, has an obvious utility for politicians seeking office.

GOP presidential nominee Mitt Romney, centre, waving to the crowd after delivering his acceptance speech on the final night of the 2012 Republican National Convention. Image by NewsHour.

Recently, there has been a lot of interest in the potential of social media as a means to understand public opinion. Driven by an interest in the potential of so-called “big data”, this development has been fuelled by a number of trends. Governments have been keen to create techniques for what they term “horizon scanning”, which broadly means searching online for indications of emerging crises (such as runs on banks or emerging natural disasters), and reacting before the problem really develops. Governments around the world are already committing massive resources to developing these techniques. In the private sector, big companies’ interest in brand management has fitted neatly with the potential of social media monitoring. A number of specialised consultancies now claim to be able to monitor and quantify reactions to products, interactions or bad publicity in real time. It should therefore come as little surprise that, like other research methods before them, these new techniques are now crossing over into the competitive political space. Social media monitoring, which in theory can extract information from tweets and Facebook posts and quantify positive and negative public reactions to people, policies and events, has an obvious utility for politicians seeking office. Broadly, the process works like this: vast datasets relating to an election, often running into millions of items, are gathered from social media sites such as Twitter. These data are then analysed using natural language processing software, which automatically identifies qualities relating to candidates or policies and attributes a positive or negative sentiment to each item. Finally, these sentiments and other properties mined from the text are totalised to produce an overall figure for public reaction on social media. These techniques have already been employed by the mainstream media to report on the 2010 British general election (when the country had its first leaders’ debate, an event ripe for this kind of research) and also in the 2012 US presidential election. This…
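
Commercial monitoring tools are proprietary, but the aggregation step described above can be sketched in a few lines. The fragment below is a minimal illustration only: the word lists stand in for a trained natural language processing classifier, and the candidates and posts are invented.

```python
# Purely illustrative sketch of the "totalising" step: score each item with a
# toy word list and sum the scores per candidate. Real systems use trained
# natural language processing models, not hand-made word lists; all names and
# texts below are invented.

from collections import defaultdict

POSITIVE = {"great", "strong", "hope", "win"}
NEGATIVE = {"weak", "worst", "fail", "lies"}

def score(text):
    """Crude stand-in for an NLP sentiment classifier: +1/-1 per matched word."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def totalise(items):
    """Aggregate item-level sentiment into a single figure per candidate."""
    totals = defaultdict(int)
    for candidate, text in items:
        totals[candidate] += score(text)
    return dict(totals)

items = [
    ("Candidate A", "Great speech tonight and a strong performance"),
    ("Candidate A", "Worst debate answer I have ever heard"),
    ("Candidate B", "Weak on the economy and full of lies"),
]
print(totalise(items))  # {'Candidate A': 1, 'Candidate B': -2}
```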

There are massive inequalities that cannot simply be explained by uneven Internet penetration. A range of other physical, social, political and economic barriers are reinforcing this digital divide.

Images are an important form of knowledge that allows us to develop understandings about our world; the global geographic distribution of geotagged images on Flickr reveals the density of visual representations and locally depicted knowledge of all places on our planet. Map by M. Graham, M. Stephens, S. Hale.

Information is the raw material for much of the work that goes on in the contemporary global economy, and visibility and voice in this information ecosystem are prerequisites for influence and control. As Hand and Sandywell (2002: 199) have argued, “digitalised knowledge and its electronic media are indeed synonymous with power.” As such, it is important to understand who produces and reproduces information, who has access to it, and who and where are represented by it. Traditionally, information and knowledge about the world have been geographically constrained. The transmission of information required either the movement of people or the availability of some other medium of communication. However, up until the late 20th century, almost all media of information—books, newspapers, academic journals, patents and the like—were characterised by huge geographic inequalities. The global north produced, consumed and controlled much of the world’s codified knowledge, while the global south was largely left out. Today, the movement of information is, in theory, rarely constrained by distance. Very few parts of the world remain disconnected from the grid, and over 2 billion people are now online (most of them in the global south). Unsurprisingly, many believe we now have the potential to access what Wikipedia’s founder Jimmy Wales refers to as “the sum of all human knowledge”. Theoretically, parts of the world that have been left out of flows and representations of knowledge can be quite literally put back on the map. However, “potential” has too often been confused with actual practice, and stark digital divisions of labour are still evident in all open platforms that rely on user-generated content. Google Maps’ databases contain more indexed user-generated content about the Tokyo metropolitan region than about the entire continent of Africa. On Wikipedia, there is more written about Germany than about South America and Africa combined. In other words, there are massive inequalities that cannot simply be explained by uneven Internet penetration. A range of…

The new networks of political protest, which harness these new online technologies, are often described in theoretical terms as being ‘fluid’ and ‘horizontal’, in contrast to the rigid and hierarchical structure of earlier protest organisation.

How have online technologies reconfigured collective action? It is often assumed that the rise of social networking tools, accompanied by the mass adoption of mobile devices, has strengthened the impact and broadened the reach of today’s political protests. Enabling massive self-communication allows protesters to write their own interpretation of events—free from a mass media often seen as adversarial—and emerging protests may also benefit from the cheaper, faster transmission of information and more effective mobilisation made possible by online tools such as Twitter. The new networks of political protest, which harness these new online technologies, are often described in theoretical terms as being ‘fluid’ and ‘horizontal’, in contrast to the rigid and hierarchical structure of earlier protest organisation. Yet such theoretical assumptions have seldom been tested empirically. This new language of networks may be useful as a shorthand to describe protest dynamics, but does it accurately reflect how protest networks mediate communication and coordinate support? The global protests against austerity and inequality that took place on May 12, 2012, provide an interesting case study to test the structure and strength of a transnational online protest movement. The ‘indignados’ movement emerged as a response to the Spanish government’s politics of austerity in the aftermath of the global financial crisis. The movement flared in May 2011, when hundreds of thousands of protesters marched in Spanish cities, and many set up camps ahead of municipal elections a week later. These protests contributed to the emergence of the worldwide Occupy movement. After the original plan to occupy New York City’s financial district mobilised thousands of protesters in September 2011, the movement spread to other cities in the US and worldwide, including London and Frankfurt, before winding down as the camp sites were dismantled weeks later. Interest in these movements was revived, however, as the first anniversary of the ‘indignados’ protests approached in May 2012. To test whether the fluidity, horizontality and connectivity often claimed for…
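
The study’s own data pipeline is not reproduced here; as a rough illustration of how ‘horizontality’ might be probed empirically, the sketch below builds a directed mention network from invented tweet records (using the networkx library) and measures how concentrated incoming attention is.

```python
# Illustrative only (not the study's pipeline): build a directed "who mentions
# whom" network from invented tweet records and ask how concentrated incoming
# attention is. A perfectly horizontal network spreads mentions evenly; a
# hierarchical one concentrates them on a few hub accounts.

import networkx as nx

# Hypothetical (author, mentioned_account) pairs extracted from tweets
mentions = [
    ("alice", "acampada_hub"), ("bob", "acampada_hub"), ("carol", "acampada_hub"),
    ("dave", "occupy_hub"), ("erin", "occupy_hub"), ("alice", "bob"),
]

G = nx.DiGraph()
G.add_edges_from(mentions)

in_degrees = dict(G.in_degree())
top_share = max(in_degrees.values()) / G.number_of_edges()
print(f"Most-mentioned account receives {top_share:.0%} of all mentions")
print(sorted(in_degrees.items(), key=lambda kv: -kv[1]))
```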

Government agencies are rarely completely transparent, often do not provide clear instructions for accessing the information they store, seldom use standardised norms, and can overlook user needs.

A view inside the House chamber of the Utah State Legislature. Image by deltaMike.

Public demands for transparency in the political process have long been a central feature of American democracy, and recent technological improvements have considerably facilitated the ability of state governments to respond to such public pressures. With online legislative archives, state legislatures can make available a large number of public documents. In addition to meeting the demands of interest groups, activists, and the public at large, these websites enable researchers to conduct single-state studies, cross-state comparisons, and longitudinal analysis. While online legislative archives are, in theory, rich sources of information that save researchers valuable time as they gather data across the states, in practice, government agencies are rarely completely transparent, often do not provide clear instructions for accessing the information they store, seldom use standardised norms, and can overlook user needs. These obstacles to state politics research are longstanding: Malcolm Jewell noted almost three decades ago the need for “a much more comprehensive and systematic collection and analysis of comparative state political data.” While the growing availability of online legislative resources helps to address the first problem of collection, the limitations of search and retrieval functions remind us that the latter remains a challenge. The fifty state legislative websites are quite different; few of them are intuitive or adequately transparent, and there is no standardised or systematic process to retrieve data. For many states, it is not possible to identify issue-specific bills that are introduced and/or passed during a specific period of time, let alone the sponsors or committees, without reading the full text of each bill. For researchers who are interested in certain time periods, policy areas, committees, or sponsors, the inability to set filters or immediately see relevant results limits their ability to efficiently collect data. Frustrated by the obstacles we faced in undertaking a study of state-level immigration legislation before and after September 11, 2001, we decided instead to evaluate each state legislative website—a “state of the states” analysis—to…

Very few of these experiments use manipulation of information environments on the internet as a way to change people’s behaviour. The Internet seems to hold enormous promise for ‘Nudging’ by redesigning ‘choice environments’.

What makes people join political actions? Iraq War protesters crowd Trafalgar Square in February 2007. Image by DavidMartynHunt.

Experiments—or more technically, Randomised Controlled Trials—are the most exciting thing on the UK public policy horizon. In 2010, the incoming Coalition Government set up the Behavioural Insights Team in the Cabinet Office to find innovative and cost-effective (cheap) ways to change people’s behaviour. Since then, the team have run a number of exciting experiments with remarkable success, particularly in terms of encouraging organ donation and timely payment of taxes. With Bad Science author Ben Goldacre, they have now published a Guide to RCTs, and plenty more experiments are planned. This sudden enthusiasm for experiments in the UK government is very exciting. The Behavioural Insights Team is the first of its kind in the world—in the US, there are few experiments at federal level, although there have been a few well-publicised ones at local level—and the UK government had previously been rather scared of the concept, there being a number of cultural barriers to the very word ‘experiment’ in British government. Experiments came to the fore in the previous Administration’s Mindspace document. But what made them popular for public policy may well have been the 2008 book Nudge by Thaler and Sunstein, which shows that by knowing how people think, it is possible to design choice environments that make it “easier for people to choose what is best for themselves, their families, and their society.” Since then, the political scientist Peter John has published Nudge, Nudge, Think, Think, which has received positive coverage in The Economist (“The use of behavioural economics in public policy shows promise”) and the Financial Times (“Nudge, nudge. Think, think. Say no more…”), and has been reviewed by the LSE Review of Books (“Nudge, Nudge, Think, Think: experimenting with ways to change civic behaviour”). But there is one thing missing here. Very few of these experiments use manipulation of information environments on the internet as a way to change people’s behaviour. The Internet…
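
None of the trials mentioned above are reproduced here; the sketch below only illustrates the bare logic of a randomised controlled trial run over an online information environment, with simulated visitors and an invented effect size.

```python
# Bare-bones sketch of an online randomised controlled trial (not any actual
# Behavioural Insights Team experiment): visitors are randomly assigned to a
# control page or a 'nudged' page, and the sign-up rates of the two groups are
# compared. The effect size (10% vs 14%) is an invented assumption.

import random

random.seed(42)

def simulate_visit(arm):
    """Simulated behaviour under an assumed effect of the nudged page."""
    base_rate = {"control": 0.10, "treatment": 0.14}[arm]
    return random.random() < base_rate  # True = visitor signs up

outcomes = {"control": [], "treatment": []}
for _ in range(10_000):
    arm = random.choice(["control", "treatment"])  # the randomisation step
    outcomes[arm].append(simulate_visit(arm))

for arm, results in outcomes.items():
    print(f"{arm}: n={len(results)}, sign-up rate={sum(results) / len(results):.1%}")
```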

Despite large investments of law enforcement resources, online child exploitation is nowhere near under control, and while there are numerous technological products to aid this, they still require substantial human intervention.

The Internet has provided the social, individual, and technological circumstances needed for child pornography to flourish. Sex offenders have been able to utilise the Internet for dissemination of child pornographic content, for social networking with other paedophiles through chatrooms and newsgroups, and for sexual communication with children. A 2009 United Nations estimate puts the number of websites containing child pornography at more than four million, with 35 percent of them depicting serious sexual assault [1]. Even if this report or others exaggerate the true prevalence of those websites by a wide margin, the fact of the matter is that such websites are pervasive on the World Wide Web. Despite large investments of law enforcement resources, online child exploitation is nowhere near under control, and while there are numerous technological products to aid in finding child pornography online, they still require substantial human intervention. Nevertheless, steps can be taken to further automate these searches, reducing the amount of content police officers have to examine and increasing the time they can spend investigating individuals. While law enforcement agencies will aim for maximum disruption of online child exploitation networks by targeting the most connected players, there is a general lack of research on the structural nature of these networks; something we aimed to address in our study by developing a method to extract child exploitation networks, map their structure, and analyse their content. Our custom-written Child Exploitation Network Extractor (CENE) automatically crawls the Web from a user-specified seed page, collecting information about the pages it visits by recursively following the links out of the page; the result of the crawl is a network structure containing information about the content of the websites, and the linkages between them [2]. We chose ten websites as starting points for the crawls; four were selected from a list of known child pornography websites while the other six were selected and…
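
CENE itself is custom-written and not public, so the following is only a generic sketch of the crawl-and-extract idea it describes: start from a seed page, recursively follow hyperlinks to a limited depth, and record the pages and the links between them as a directed network. The seed URL is a neutral placeholder, and the classification of page content is omitted.

```python
# Generic sketch of the crawl-and-extract idea only; CENE itself is custom
# software and is not reproduced here, and the content-analysis step is
# omitted. Starting from a placeholder seed URL, follow hyperlinks recursively
# up to a small depth and record pages and links as a directed network.

from urllib.parse import urljoin
import networkx as nx
import requests
from bs4 import BeautifulSoup

def crawl(seed, max_depth=2, max_pages=50):
    graph = nx.DiGraph()
    frontier = [(seed, 0)]          # pages waiting to be visited
    visited = set()
    while frontier and len(visited) < max_pages:
        url, depth = frontier.pop(0)
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        for tag in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            target = urljoin(url, tag["href"])
            graph.add_edge(url, target)           # record the hyperlink
            frontier.append((target, depth + 1))  # recurse one level deeper
    return graph

g = crawl("https://example.org")  # neutral placeholder seed page
print(g.number_of_nodes(), "pages,", g.number_of_edges(), "links")
```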

Small changes in individual actions can have large effects at the aggregate level; this opens up the potential for drawing incorrect conclusions about generative mechanisms when only aggregated patterns are analysed.

One of the big social science questions is how our individual actions aggregate into collective patterns of behaviour (think crowds, riots, and revolutions). This question has so far been difficult to tackle due to a lack of appropriate data, and the complexity of the relationship between the individual and the collective. Digital trails are allowing social scientists to understand this relationship better. Small changes in individual actions can have large effects at the aggregate level; this opens up the potential for drawing incorrect conclusions about generative mechanisms when only aggregated patterns are analysed, as Schelling aimed to show in his classic example of racial segregation. Part of the reason why it has been so difficult to explore this connection between the individual and the collective—and the unintended consequences that arise from that connection—is a lack of proper empirical data, particularly around the structure of interdependence that links individual actions. This relational information is what digital data are now providing; however, these data present some new challenges to social scientists, particularly those who are used to working with smaller, cross-sectional datasets. Suddenly, we can track and analyse the interactions of thousands (if not millions) of people with a time resolution that can go down to the second. The question is how best to aggregate that data and deal with the time dimension. Interactions take place in continuous time; however, most digital interactions are recorded as events (e.g. sending or receiving messages), and different network structures emerge when those events are aggregated according to different windows (e.g. days, weeks, months). We still don’t have systematic knowledge on how transforming continuous data into discrete observation windows affects the networks of interaction we analyse. Reconstructing interpersonal networks (particularly longitudinal network data) used to be extremely time-consuming and difficult; now it is relatively easy to obtain that sort of network data, but modelling and analysing them is still a challenge. Another problem faced by social…
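
As a small illustration of the aggregation problem described above, the sketch below bins the same set of invented, timestamped messages into daily and weekly windows and shows that the resulting edge sets differ.

```python
# Illustrative sketch of the windowing issue: the same stream of timestamped
# messages yields different interaction networks depending on whether events
# are binned by day or by week. All events below are invented.

from collections import defaultdict
from datetime import datetime

# (sender, receiver, timestamp) message events
events = [
    ("a", "b", "2012-05-12 09:00"), ("b", "c", "2012-05-12 21:30"),
    ("c", "a", "2012-05-13 10:15"), ("a", "c", "2012-05-19 08:45"),
]

def networks_by_window(events, window="day"):
    """Group directed edges into one edge set per observation window."""
    nets = defaultdict(set)
    for sender, receiver, ts in events:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        key = t.date() if window == "day" else t.isocalendar()[1]  # ISO week
        nets[key].add((sender, receiver))
    return dict(nets)

print(networks_by_window(events, "day"))   # three small daily edge sets
print(networks_by_window(events, "week"))  # two weekly edge sets, days merged
```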