Articles

Widespread use of digital technologies, the Internet and social media means both citizens and governments leave digital traces that can be harvested to generate big data.

The environment in which public policy is made has entered a period of dramatic change. Widespread use of digital technologies, the Internet and social media means both citizens and governments leave digital traces that can be harvested to generate big data. Policy-making takes place in an increasingly rich data environment, which presents both promise and threat to policy-makers. On the promise side, such data offers a chance for policy-making and implementation to be more citizen-focused, taking account of citizens’ needs, preferences and actual experience of public services, as recorded on social media platforms. As citizens express policy opinions on social networking sites such as Twitter and Facebook; rate or rank services or agencies on government applications such as NHS Choices; or enter discussions on the burgeoning range of social enterprise and NGO sites, such as Mumsnet, 38 Degrees and patientopinion.org, they generate a whole range of data that government agencies might harvest and put to good use. Policy-makers also have access to a huge range of data on citizens’ actual behaviour, recorded digitally whenever citizens interact with government administration or undertake some act of civic engagement, such as signing a petition. Data mined from social media or administrative operations in this way also provide a range of new data sources that can enable government agencies to monitor—and improve—their own performance, for example through usage logs of their own electronic presence or transactions recorded on internal information systems, which are increasingly interlinked. And they can use data from social media for self-improvement, by understanding what people are saying about government, and which policies, services or providers are attracting negative opinions and complaints, enabling identification of a failing school, hospital or contractor, for example. They can solicit such data via their own sites, or those of social enterprises. And they can find out what people are concerned about or looking for, from the Google Search API or Google Trends, which record the search…

There has been a major shift in the policies of governments concerning participatory governance—that is, engaged, collaborative, and community-focused public policy.

Policy makers today must contend with two inescapable phenomena. On the one hand, there has been a major shift in the policies of governments concerning participatory governance—that is, engaged, collaborative, and community-focused public policy. At the same time, a significant proportion of government activities have now moved online, bringing about “a change to the whole information environment within which government operates” (Margetts 2009, 6). Indeed, the Internet has become the main medium of interaction between government and citizens, and numerous websites offer opportunities for online democratic participation. The Hansard Society, for instance, regularly runs e-consultations on behalf of UK parliamentary select committees. For example, e-consultations have been run on the Climate Change Bill (2007), the Human Tissue and Embryo Bill (2007), and on domestic violence and forced marriage (2008). Councils and boroughs also regularly invite citizens to take part in online consultations on issues affecting their area. The London Borough of Hammersmith and Fulham, for example, recently asked its residents for their views on Sex Entertainment Venues and Sex Establishment Licensing policy. However, citizen participation poses certain challenges for the design and analysis of public policy. In particular, governments and organisations must demonstrate that all opinions expressed through participatory exercises have been duly considered and carefully weighed before decisions are reached. One method for partly automating the interpretation of the large quantities of online content typically produced by public consultations is text mining. Software products currently available range from those primarily used in qualitative research (integrating functions like tagging, indexing, and classification) to those integrating more quantitative and statistical tools, such as word frequency and cluster analysis (more information on text mining tools can be found at the National Centre for Text Mining). While these methods have certainly attracted criticism and scepticism in terms of the interpretability of the output, they offer four important advantages for the analyst: namely categorisation, data reduction, visualisation, and speed. 1. Categorisation. When analysing the results…
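To make the text mining step concrete, here is a minimal sketch of the kind of word-frequency and cluster analysis described above, using scikit-learn. The consultation responses are invented for illustration; a real exercise would run over thousands of free-text submissions.

```python
# Invented sample responses standing in for real consultation submissions.
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

responses = [
    "Licensing hours should be restricted near schools.",
    "I support stricter licensing conditions for these venues.",
    "Local residents were not consulted about the venue.",
    "Consultation with residents must come before any licence is granted.",
]

# Word frequency: a crude first pass at categorisation and data reduction.
tokens = " ".join(responses).lower().replace(".", "").split()
print(Counter(tokens).most_common(5))

# Cluster analysis: group similar responses so an analyst can review one
# representative per theme instead of reading every submission.
vectoriser = TfidfVectorizer(stop_words="english")
X = vectoriser.fit_transform(responses)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for label, text in zip(labels, responses):
    print(label, text)
```

With real volumes of text, the cluster count and the tokenisation would of course need tuning, but the output illustrates the categorisation and data-reduction advantages listed above.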

While traditional surveillance systems will remain the pillars of public health, online media monitoring has added an important early-warning function, with social media bringing additional benefits to epidemic intelligence.

Communication of risk in any public health emergency is a complex task for healthcare agencies; a task made more challenging when citizens are bombarded with online information. Mexico City, 2009. Image by Eneas.

Ed: Could you briefly outline your study? Patty: We investigated the role of Twitter during the 2009 swine flu pandemic from two perspectives. Firstly, we demonstrated the role of the social network in detecting an upcoming spike in an epidemic before the official surveillance systems—up to a week in the UK and up to 2-3 weeks in the US—by investigating users who “self-diagnosed”, posting tweets such as “I have flu/swine flu.” Secondly, we illustrated how online resources reporting the WHO declaration of a “pandemic” on 11 June 2009 were propagated through Twitter during the 24 hours after the official announcement [1,2,3]. Ed: Disease control agencies already routinely follow media sources; are public health agencies aware of social media as another valuable source of information? Patty: Social media provide an invaluable real-time data signal complementing well-established epidemic intelligence (EI) systems monitoring online media, such as MedISys and GPHIN. While traditional surveillance systems will remain the pillars of public health, online media monitoring has added an important early-warning function, with social media bringing additional benefits to epidemic intelligence: virtually real-time information available in the public domain that is contributed by users themselves, and thus not reliant on the editorial policies of media agencies. Public health agencies (such as the European Centre for Disease Prevention and Control) are interested in social media early warning systems, but more research is required to develop robust social media monitoring solutions that are ready to be integrated with agencies’ EI services. Ed: How difficult is this data to process? For example, is this a full sample, processed in real time? Patty: No, obtaining all Twitter search query results is not possible. In our 2009 pilot study we accessed data from Twitter using a search API interface, querying the database every minute (the number of results was limited to 100 tweets). Currently, only 1% of the ‘Firehose’ (the massive real-time stream of all public tweets) is made available via the streaming API. The searches have…
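As an illustration of the collection method Patty describes, the sketch below polls a search endpoint once a minute for self-diagnosis tweets. The endpoint URL and response format are hypothetical stand-ins: the 2009 pilot used Twitter's then-current search API, whose interface has changed several times since.

```python
# Minute-by-minute polling loop, as in the 2009 pilot study. The endpoint,
# parameters and response shape below are hypothetical stand-ins.
import time

import requests

SEARCH_URL = "https://api.example.com/search"  # hypothetical endpoint
QUERY = '"I have flu" OR "I have swine flu"'   # self-diagnosis phrases

seen_ids = set()
while True:
    resp = requests.get(SEARCH_URL, params={"q": QUERY, "count": 100}, timeout=30)
    resp.raise_for_status()
    # Assumed response format: a JSON object with a "results" list of tweets.
    for tweet in resp.json().get("results", []):
        if tweet["id"] not in seen_ids:  # de-duplicate across successive polls
            seen_ids.add(tweet["id"])
            print(tweet["created_at"], tweet["text"])
    time.sleep(60)  # the pilot queried roughly once a minute
```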

The Middle East and North Africa are relatively under-represented in Wikipedia. Even after accounting for factors like population, Internet access, and literacy, we still see less content than would be expected.

Editors from all over the world have played some part in writing about Egypt; in fact, only 13% of all edits actually originate in the country (38% are from the US). More: Who edits Wikipedia? by Mark Graham. Ed: In basic terms, what patterns of ‘information geography’ are you seeing in the region? Mark: The first pattern that we see is that the Middle East and North Africa are relatively under-represented in Wikipedia. Even after accounting for factors like population, Internet access, and literacy, we still see less content than would be expected. Second, of the content that exists, a lot of it is in English and French rather than in Arabic (or Farsi or Hebrew). In other words, there is even less in local languages. And finally, if we look at contributions (or edits), not only do we see a relatively small number of edits originating in the region, but many of those edits are being used to write about other parts of the world rather than the editors’ own region. What this broadly seems to suggest is that the participatory potential of Wikipedia isn’t yet being harnessed to even out the differences between the world’s informational cores and peripheries. Ed: How closely do these online patterns in representation correlate with regional (offline) patterns in income, education, language, access to technology, etc.? Can you map one to the other? Mark: Population and broadband availability alone explain a lot of the variance that we see. Other factors like income and education also play a role, but it is population and broadband that have the greatest explanatory power here. Interestingly, it is mostly countries in the MENA region that fail to fit those predictors well. Ed: How much do you think these patterns result from the systematic imposition of a particular viewpoint—such as official editorial policies—as opposed to the (emergent) outcome of lots of users and editors…
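A rough sketch of the kind of model implied here: regressing a country's volume of Wikipedia content on population and broadband availability, then inspecting residuals to see which countries the predictors fit poorly. All figures are invented; the point is the method, not the numbers.

```python
# Illustrative regression: content volume ~ population + broadband.
# All data below are synthetic stand-ins for real country-level figures.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
population = rng.uniform(1, 100, size=50)  # millions (invented)
broadband = rng.uniform(0, 40, size=50)    # subscriptions per 100 people (invented)
articles = 2.0 * population + 5.0 * broadband + rng.normal(0, 10, size=50)

X = sm.add_constant(np.column_stack([population, broadband]))
model = sm.OLS(articles, X).fit()
print(model.rsquared)   # how much variance population + broadband explain
print(model.resid[:5])  # large residuals flag countries the model fits badly,
                        # as MENA countries do in the study
```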

Bringing together leading social science academics with senior government agency staff to discuss the public policy potential of computational social science.

Last week the OII went to Harvard. Against the backdrop of a gathering storm of interest around the potential of computational social science to contribute to the public good, we sought to bring together leading social science academics with senior government agency staff to discuss its public policy potential. Supported by the OII-edited journal Policy and Internet and its owners, the Washington-based Policy Studies Organization (PSO), this one-day workshop facilitated a thought-provoking conversation between leading big data researchers such as David Lazer, Brooke Foucault Welles and Sandra Gonzalez-Bailon, e-government experts such as Cary Coglianese, Helen Margetts and Jane Fountain, and senior agency staff from US federal bureaus including the Bureau of Labor Statistics, the Census Bureau, and the Office of Management and Budget. It’s often difficult to appreciate the impact of research beyond the ivory tower, but what this productive workshop demonstrated is that policy-makers and academics share many similar hopes and challenges in relation to the exploitation of ‘big data’. Our motivations and approaches may differ, but insofar as the youth of the ‘big data’ concept explains the lack of a common language and understanding, there is value in mutual exploration of the issues. Although it’s impossible to do justice to the richness of the day’s interactions, some of the most pertinent and interesting conversations arose around the following four issues. Managing a diversity of data sources. In a world where our capacity to ask important questions often exceeds the availability of data to answer them, many participants spoke of the difficulties of managing a diversity of data sources. For agency staff this issue comes into sharp focus when the administrative data that is supposed to inform policy formulation is either incomplete or inadequate. Consider, for example, the challenge of regulating an economy in a situation of fundamental data asymmetry, where private sector institutions track, record and analyse every transaction, whilst the state only has access to far more basic performance metrics and accounts…

Concerns have been expressed about the detrimental role China may play in African media sectors, by increasing authoritarianism and undermining Western efforts to promote openness and freedom of expression.

The panel during the “Future of China-Africa Relations” session at the World Economic Forum on Africa 2011, held in Cape Town, South Africa, 4-6 May 2011. Copyright (cc-by-sa) World Economic Forum (www.weforum.org); photo by Eric Miller (emiller@iafrica.com).

Ed: Concerns have been expressed (e.g. by Hillary Clinton and David Cameron) about the detrimental role China may play in African media sectors, by increasing authoritarianism and undermining Western efforts to promote openness and freedom of expression. Are these concerns fair? Iginio: China’s initiatives in the communication sector abroad are burdened by the negative record of its domestic media. For the Chinese authorities this is a challenge without an easy solution, as they can’t really use their international broadcasters to tell a different story about Chinese media and Chinese engagement with foreign media, because they won’t be trusted. As the linguist George Lakoff has explained, if someone is told “Don’t think of an elephant!” he will likely start “summoning the bulkiness, the grayness, the trunkiness of an elephant”. That is to say, “when we negate a frame, we evoke a frame.” Saying that “Chinese interventions are not increasing authoritarianism” won’t help much. The only path China can take is to develop projects and use its media in ways that fall outside the realm of what is expected, creating new associations between China and the media rather than trying to redress existing ones. In part this is already happening. For example, CCTV Africa, the new initiative of China’s state-owned Central Television (CCTV) and China’s flagship effort to win African hearts and minds, has developed a strategy aimed not at directly offering an alternative image of China, but at advancing new ways of looking at Africa, offering unprecedented resources to African journalists to report from the continent and tapping into the narrative of a “rising Africa”: a continent of opportunities rather than of hunger, wars and underdevelopment. Ed: Ideology has disappeared from the language of China-Africa cooperation, largely replaced by admissions of China’s interest in Africa’s resources and untapped potential. Does politics (e.g. China wanting to increase its international support and influence) nevertheless still inform the relationship? China’s…

The more that differing points of view and differing evaluative frames came into contact, the more the community worked together to generate rules and norms to regulate and improve the production of articles.

Image from "The Iraq War: A Historiography of Wikipedia Changelogs," a twelve-volume set of all changes to the Wikipedia article on the Iraq War (totalling over 12,000 changes and almost 7,000 pages), by STML.

Ed: I really like the way that, contrary to many current studies on conflict and Wikipedia, you focus on how conflict can actually be quite productive. How did this insight emerge? Kim: I was initially looking for instances of collaboration in Wikipedia to see how popular debates about peer production played out in reality. What I found was that conflict was significantly more prevalent than I had assumed. It struck me as interesting, as most of the popular debates at the time framed conflict as hindering the collaborative editorial process. After several stages of coding, I found that the conversations that involved even a minor degree of conflict were fascinating. A pattern emerged where disagreements about the editorial process resulted in community members taking positive actions to solve the discord and achieve consensus. This was especially prominent in early discussions prior to 2005, before many of the policies that regulate content production in the encyclopaedia were formulated. The more that differing points of view and differing evaluative frames came into contact, the more the community worked together to generate rules and norms to regulate and improve the production of articles. Ed: You use David Stark’s concept of generative friction to describe how conflict is ‘central to the editorial processes of Wikipedia’. Can you explain why this is important? Kim: Having different points of view come into contact is the premise of Wikipedia’s collaborative editing model. When these views meet, Stark maintains there is an overlap of individuals’ evaluative frames, or worldviews, and it is in this overlap that creative solutions to problems can occur. People come across solutions they may not otherwise have encountered in the typical homogeneous, hierarchical system that is traditionally the standard for institutions trying to maximise efficiency. In this respect, conflict is central to the process as it is about the struggle to negotiate meaning and achieve a consensus among editors with differing opinions and perspectives…

Is censorship of domestic news more geared towards “avoiding panics and maintaining social order”, or just avoiding political embarrassment?

Ed: How much work has been done on censorship of online news in China? What are the methodological challenges and important questions associated with this line of enquiry? Sonya: Recent research is paying much attention to social media, aiming to quantify their censorial practices and to discern common patterns in them. Among these empirical studies, Bamman et al.’s (2012) work claimed to be “the first large-scale analysis of political content censorship”, investigating messages deleted from Sina Weibo, a Chinese equivalent to Twitter. On an even larger scale, King et al. (2013) collected data from nearly 1,400 Chinese social media platforms and analysed the deleted messages. Most studies on news censorship, however, are devoted to narratives of special cases, such as the closure of Freezing Point, an outspoken news and opinion journal, and the blocking of the New York Times after it disclosed the wealth possessed by the family of former Chinese premier Wen Jiabao. The shortage of news censorship research can be attributed to several methodological challenges. First, it is tricky to detect censorship to begin with, given that the word ‘censorship’ is itself one of the first to be censored. Also, news websites will not simply let their readers hit a glaring “404 page not found”. Instead, they will use a “soft 404”, which returns a “success” code for a request for a deleted web page and takes readers to a (different) existing web page. While humans may be able to detect these soft 404s, it is harder for computer programs (e.g. those run by researchers) to do so. Moreover, because different websites employ varying soft 404 techniques, much labour is required to survey them and to incorporate the acquired knowledge into a generic monitoring tool. Second, high computing power and bandwidth are required to handle the large volume of news publications and the slow network access to Chinese websites. For instance, NetEase alone publishes 8,000 – 10,000 news…
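One plausible, necessarily heuristic way to detect a soft 404 is sketched below: request a URL on the same site that is known not to exist, then compare its response with the page under test. The probe path and similarity threshold are illustrative assumptions; as Sonya notes, real monitoring tools need per-site tuning because soft-404 techniques vary.

```python
# Heuristic soft-404 detector: compare a page against the site's response
# to a deliberately bogus URL. Probe path and threshold are assumptions.
import difflib

import requests

def looks_like_soft_404(url: str, threshold: float = 0.9) -> bool:
    """If a page's body closely resembles the site's response to a
    deliberately bogus URL, the page is probably a soft 404."""
    site_root = url.rsplit("/", 1)[0]
    probe_url = site_root + "/this-page-should-never-exist-probe"
    page = requests.get(url, timeout=10)
    probe = requests.get(probe_url, timeout=10)
    if probe.status_code == 404:
        return False  # the site returns honest 404s; no soft-404 trickery
    # Both requests "succeeded": near-identical bodies suggest deleted
    # articles are being redirected to a generic landing page.
    similarity = difflib.SequenceMatcher(None, page.text, probe.text).ratio()
    return similarity >= threshold

# Example (hypothetical URL):
# print(looks_like_soft_404("https://news.example.cn/articles/123"))
```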

Social media monitoring, which in theory can extract information from tweets and Facebook posts and quantify positive and negative public reactions to people, policies and events, has an obvious utility for politicians seeking office.

GOP presidential nominee Mitt Romney, centre, waving to the crowd after delivering his acceptance speech on the final night of the 2012 Republican National Convention. Image by NewsHour.

Recently, there has been a lot of interest in the potential of social media as a means to understand public opinion. Driven by an interest in the potential of so-called “big data”, this development has been fuelled by a number of trends. Governments have been keen to create techniques for what they term “horizon scanning”, which broadly means searching online for the indications of emerging crises (such as runs on banks or emerging natural disasters), and reacting before the problem really develops. Governments around the world are already committing massive resources to developing these techniques. In the private sector, big companies’ interest in brand management has fitted neatly with the potential of social media monitoring. A number of specialised consultancies now claim to be able to monitor and quantify reactions to products, interactions or bad publicity in real time. It should therefore come as little surprise that, like other research methods before them, these new techniques are now crossing over into the competitive political space. Social media monitoring, which in theory can extract information from tweets and Facebook posts and quantify positive and negative public reactions to people, policies and events, has an obvious utility for politicians seeking office. Broadly, the process works like this: vast datasets relating to an election, often running into millions of items, are gathered from social media sites such as Twitter. These data are then analysed using natural language processing software, which automatically identifies qualities relating to candidates or policies and attributes a positive or negative sentiment to each item. Finally, these sentiments and other properties mined from the text are totalised, to produce an overall figure for public reaction on social media. These techniques have already been employed by the mainstream media to report on the 2010 British general election (when the country had its first televised leaders’ debate, an event ripe for this kind of research) and also in the 2012 US presidential election. This…
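A toy version of that pipeline, to fix ideas: score each item against a small sentiment lexicon, then totalise the scores per candidate. The lexicon and the items are invented; production systems use much larger lexicons or trained classifiers, but the totalising logic is the same in outline.

```python
# Toy lexicon-based sentiment totalising: invented lexicon and tweets.
POSITIVE = {"great", "win", "strong", "hope"}
NEGATIVE = {"fail", "weak", "lies", "worst"}

tweets = [
    ("Candidate A", "great debate, strong answers from A"),
    ("Candidate A", "A keeps dodging, weak on the economy"),
    ("Candidate B", "worst performance yet from B"),
]

totals = {}
for candidate, text in tweets:
    words = text.lower().split()
    # Each item gets a score: positive hits minus negative hits.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    totals[candidate] = totals.get(candidate, 0) + score

print(totals)  # net sentiment per candidate, e.g. {'Candidate A': 1, ...}
```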

Chinese citizens are being encouraged by the government to engage and complain online. Is the Internet just a space to blow off steam, or is it really capable of ‘changing’ Chinese society, as many have assumed?

David: For our research, we surveyed postgraduate students from all over China who had come to Shanghai to study. We asked them five questions, to which they mostly provided rather lengthy answers. Although they were young university students and very active online, their answers managed to surprise us. Notably, the young Chinese who took part in our research felt very ambivalent about the Internet and its supposed benefits for individual people in China. They appreciated the greater freedom the Internet offered when compared to offline China, but were very wary of others abusing this freedom to their detriment. Ed: In your paper you note that the opinions of many young people closely mirrored the government’s statements about the Internet—in what way? David: In 2010 the government published a White Paper on the Internet in China in which it argued that the main uses of the Internet were obtaining information and communicating with others. In contrast to Euro-American discourses around the Internet as a ‘force for democracy’, the students’ answers to our questions agreed with the government’s assessment and did not see the Internet as a place to begin organising politically. The main reason for this—in my opinion—is that young Chinese are not used to discussing ‘politics’, and are mostly focused on pursuing the ‘Chinese dream’: a good job, a large flat or house, a nice car, a suitable spouse; usually in that order. Ed: The Chinese Internet has usually been discussed in the West as a ‘force for democracy’—leading to the inevitable relinquishing of control by the Chinese Communist Party. Is this viewpoint hopelessly naive? David: Not naive as such, but both deterministic and limited: it assumes that the introduction of technology can only have one ‘built-in’ outcome, ignoring human agency, and it pretends that the Chinese Communist Party does not use technology at all. Given the intense involvement of Party and government offices,…