根据相关法律法规和政策，部分搜索结果未予显示 could be a warning message we will see displayed more often on the Internet, or rather translations of it. In Chinese, it means “according to the relevant laws, regulations, and policies, a portion of search results have not been displayed.” The control of information flows on the Internet is becoming more commonplace, in authoritarian regimes as well as in liberal democracies, via either technical or regulatory means. Such information controls can be defined as “[…] actions conducted in or through information and communications technologies (ICTs), which seek to deny (such as web filtering), disrupt (such as denial-of-service attacks), shape (such as throttling), secure (such as through encryption or circumvention) or monitor (such as passive or targeted surveillance) information for political ends. Information controls can also be non-technical and can be implemented through legal and regulatory frameworks, including informal pressures placed on private companies. […]” Information controls are not intrinsically good or bad, but much remains to be explored and analysed about their use, for political or commercial purposes.
The University of Toronto’s Citizen Lab organised a one-week summer institute titled “Monitoring Internet Openness and Rights” to inform the global discussions on information control research and practice in the fields of censorship, circumvention, surveillance and adherence to human rights. A week full of presentations and workshops on the intersection of technical tools, social science research, ethical and legal reflections and policy implications was attended by a distinguished group of about 60 community members, amongst whom were two OII DPhil students: Jon Penney and Ben Zevenbergen. Conducting Internet measurements may still be terra incognita in terms of methodology and data collection, but the relevance and impacts for Internet policy-making, geopolitics and network management are obvious and undisputed.
The Citizen Lab prides itself on being a “hacker hothouse”, or an “intelligence agency for civil society”, where security expertise, politics, and ethics intersect. Their research adds the much-needed geopolitical angle to the deeply technical and quantitative Internet measurements they conduct on information networks worldwide. While the Internet is fast becoming the backbone of our modern societies in many positive and welcome ways, abundant (intentional) security vulnerabilities, the ease with which human rights such as privacy and freedom of speech can be violated, threats to the neutrality of the network and the extent of mass surveillance threaten to compromise the potential of our global information sphere. Threats to a free and open Internet need to be uncovered and explained to policymakers, in order to encourage informed, evidence-based policy decisions, especially at a time when the underlying technology is not well understood by decision makers.
Participants at the summer institute came with the intent to make sense of Internet measurements and information controls, as well as their social, political and ethical impacts. Through discussions in larger and smaller groups throughout the Munk School of Global Affairs – as well as restaurants and bars around Toronto – the current state of information controls, their regulation and deployment became clear, and multi-disciplinary projects to measure breaches of human rights on the Internet, or of its fundamental principles, were devised and coordinated.
The outcomes of the week in Toronto are impressive. The OII DPhil students presented their recent work on transparency reporting and ethical data collection in Internet measurement.
Jon Penney gave a talk on “the United States experience” with Internet-related corporate transparency reporting, that is, the evolution of existing American corporate practices in publishing “transparency reports” about the nature and quantity of government and law enforcement requests for Internet user data or content removal. Jon first began working on transparency issues as a Google Policy Fellow with the Citizen Lab in 2011, and his work has continued during his time at Harvard’s Berkman Center for Internet and Society. In this talk, Jon argued that in the U.S., corporate transparency reporting largely began with the leadership of Google and a few other Silicon Valley tech companies like Twitter, but in the post-Snowden era it has been adopted by a wider cross-section of not only technology companies, but also established telecommunications companies like Verizon and AT&T that were previously resistant to greater transparency in this space (perhaps due to closer, longer-term relationships with federal agencies than the Silicon Valley companies). Jon also canvassed the evolving legal and regulatory challenges facing U.S. transparency reporting, and the means by which companies may provide some measure of transparency, via tools like warrant canaries, in the face of increasingly complex national security laws.
Ben Zevenbergen has recently launched ethical guidelines for the protection of privacy with regard to Internet measurements conducted via mobile phones. The first panel of the week, on “Network Measurement and Information Controls”, called explicitly for more concrete ethical and legal guidelines for Internet measurement projects, because such projects necessarily collect and analyze a great deal of personal data. In the second panel, on “Mobile Security and Privacy”, Ben explained how his guidelines form a privacy impact assessment for a privacy-by-design approach to mobile network measurements. The iterative process of designing a research project in close cooperation with colleagues, possibly from different disciplines, ensures that privacy is taken into account at all stages of the project’s development. His talk led to two connected and well-attended sessions during the week to discuss the ethics of information controls research and Internet measurements. A mailing list has been set up for engineers, programmers, activists, lawyers and ethicists to discuss the ethical and legal aspects of Internet measurements. Data collection has begun to create a taxonomy of ethical issues in the discipline, which will inform forthcoming peer-reviewed papers.
The Citizen Lab will host its final summer institute of the series in 2015.
Photo credits: Ben Zevenbergen, Jon Penney. Writing credits: Ben Zevenbergen, with a small contribution from Jon Penney.
Ben Zevenbergen is an OII DPhil student and Research Assistant working on the EU Internet Science project. He has worked on legal, political and policy aspects of the information society for several years. Most recently he was a policy advisor to an MEP in the European Parliament, working on Europe’s Digital Agenda.
Jon Penney is a legal academic, doctoral student at the Oxford Internet Institute, and a Research Fellow / Affiliate of both the Citizen Lab (an interdisciplinary research lab specializing in digital media, cyber-security, and human rights at the University of Toronto’s Munk School of Global Affairs) and the Berkman Center for Internet & Society at Harvard University.
In the journal’s inaugural issue, founding Editor-in-Chief Helen Margetts outlined what are essentially two central premises behind Policy & Internet’s launch. The first is that “we cannot understand, analyze or make public policy without understanding the technological, social and economic shifts associated with the Internet” (Margetts 2009, 1). It is simply not possible to consider public policy today without some regard for the intertwining of information technologies with everyday life and society. The second premise is that the rise of the Internet is associated with shifts in how policy itself is made. In particular, she proposed that impacts of Internet adoption would be felt in the tools through which policies are effected, and the values that policy processes embody.
The purpose of the Policy and Internet journal was to take up these two challenges: the public policy implications of Internet-related social change, and Internet-related changes in policy processes themselves. In recognition of the inherently multi-disciplinary nature of policy research, the journal is designed to act as a meeting place for all kinds of disciplinary and methodological approaches. Helen predicted that methodological approaches based on large-scale transactional data, network analysis, and experimentation would turn out to be particularly important for policy and Internet studies. Driving the advancement of these methods was therefore the journal’s third purpose. Today, the journal has reached a significant milestone: over one hundred high-quality peer-reviewed articles published. This seems an opportune moment to take stock of what kind of research we have published in practice, and see how it stacks up against the original vision.
At the most general level, the journal’s articles fall into three broad categories: the Internet and public policy (48 articles), the Internet and policy processes (51 articles), and discussion of novel methodologies (10 articles). The first of these categories, “the Internet and public policy,” can be further broken down into a number of subcategories. One of the most prominent of these streams is fundamental rights in a mediated society (11 articles), which focuses particularly on privacy and freedom of expression. Related streams are children and child protection (six articles), copyright and piracy (five articles), and general e-commerce regulation (six articles), including taxation. A recently emerged stream in the journal is hate speech and cybersecurity (four articles). Of course, an enduring research stream is Internet governance, or the regulation of technical infrastructures and economic institutions that constitute the material basis of the Internet (seven articles). In recent years, the research agenda in this stream has been influenced by national policy debates around broadband market competition and network neutrality (Hahn and Singer 2013). Another enduring stream deals with the Internet and public health (eight articles).
Looking specifically at “the Internet and policy processes” category, the largest stream is e-participation, or the role of the Internet in engaging citizens in national and local government policy processes, through methods such as online deliberation, petition platforms, and voting advice applications (18 articles). Two other streams are e-government, or the use of Internet technologies for government service provision (seven articles), and e-politics, or the use of the Internet in mainstream politics, such as election campaigning and communications of the political elite (nine articles). Another stream that has gained pace during recent years is online collective action, or the role of the Internet in activism, ‘clicktivism,’ and protest campaigns (16 articles). Last year the journal published a special issue on online collective action (Calderaro and Kavada 2013), and the next forthcoming issue includes an invited article on digital civics by Ethan Zuckerman, director of MIT’s Center for Civic Media, with commentary from prominent scholars of Internet activism. A trajectory discernible in this stream over the years is a movement from discussing mere potentials towards analyzing real impacts—including critical analyses of the sometimes inflated expectations and “democracy bubbles” created by digital media (Shulman 2009; Karpf 2012; Bryer 2012).
The final category, discussion of novel methodologies, consists of articles that develop, analyze, and reflect critically on methodological innovations in policy and Internet studies. Empirical articles published in the journal have made use of a wide range of conventional and novel research methods, from interviews and surveys to automated content analysis and advanced network analysis methods. But of those articles where methodology is the topic rather than merely the tool, the majority deal with so-called “big data,” or the use of large-scale transactional data sources in research, commerce, and evidence-based public policy (nine articles). The journal recently devoted a special issue to the potentials and pitfalls of big data for public policy (Margetts and Sutcliffe 2013), based on selected contributions to the journal’s 2012 big data conference: Big Data, Big Challenges? In general, the notion of data science and public policy is a growing research theme.
This brief analysis suggests that research published in the journal over the last five years has indeed followed the broad contours of the original vision. The two challenges, namely policy implications of Internet-related social change and Internet-related changes in policy processes, have both been addressed. In particular, research has addressed the implications of the Internet’s increasing role in social and political life. The journal has also furthered the development of new methodologies, especially the use of online network analysis techniques and large-scale transactional data sources (aka ‘big data’).
As expected, authors from a wide range of disciplines have contributed their perspectives to the journal, and engaged with other disciplines, while retaining the rigor of their own specialisms. The geographic scope of the contributions has been truly global, with authors and research contexts from six continents. I am also pleased to note that a characteristic common to all the published articles is polish; this is no doubt in part due to the high level of editorial support that the journal is able to afford to authors, including copyediting. The justifications for the journal’s establishment five years ago have clearly been borne out, so that the journal now performs an important function in fostering and bringing together research on the public policy implications of an increasingly Internet-mediated society.
And what of my own research interests as an editor? In the inaugural editorial, Helen Margetts highlighted work, finance, exchange, and economic themes in general as being among the prominent areas of Internet-related social change that are likely to have significant future policy implications. I think for the most part these implications remain to be addressed, and this is an area where the journal can do more to encourage authors. As an editor, I will work to direct attention to this opportunity, and I welcome manuscript submissions on all aspects of Internet-enabled economic change and its policy implications. This work will be kickstarted by the journal’s 2014 conference (26-27 September), which this year focuses on crowdsourcing and online labor.
Our published articles will continue to be highlighted here in the journal’s blog. Launched last year, we believe this blog will help to expand the reach and impact of research published in Policy and Internet to the wider academic and practitioner communities, promote discussion, and increase authors’ citations. After all, publication is only the start of an article’s public life: we want people reading, debating, citing, and offering responses to the research that we, and our excellent reviewers, feel is important, and worth publishing.
Ed: How easy is it to request or scrape data from the “Chinese Web”? And how much of it is under some form of government control?
Han-Teng: Access to data from the Chinese Web, like other Web data, depends on the policies of platforms, the level of data openness, and the availability of data intermediaries and tools. All these factors have direct impacts on the quality and usability of data. Government control takes many forms and serves many intentions, and it increasingly extends not just to the websites inside mainland China under Chinese jurisdiction, but also to the Chinese “soft power” institutions and individuals telling the “Chinese story” or “Chinese dream” (as opposed to the “American dream”); it therefore requires case-by-case research to determine the extent and level of government control and intervention. Based on my own research on Chinese user-generated encyclopaedias and on Chinese-language Twitter and Weibo, the expectation is that control and intervention by Beijing are most likely on political and cultural topics, and less likely on economic or entertainment ones.
This observation is linked to how various forms of government control and intervention are executed, which often requires massive data and human operations to filter, categorise and produce content, often based on keywords. This is particularly true for Chinese websites in mainland China (behind the Great Firewall, excluding Hong Kong and Macao), where private website companies execute these day-to-day operations under the directives and memos of various Chinese party and government agencies.
Of course there is an extra layer of challenges if researchers try to request content and traffic data from the major Chinese websites for research, especially regarding censorship. Nonetheless, since most Web content data is open, researchers such as Professor Fu at Hong Kong University manage to scrape data samples from Weibo, helping researchers like me to access the data more easily. These openly collected data can then be used to measure potential government control, as has been done in previous research on search engines (Jiang and Akhtar 2011; Zhu et al. 2011) and social media (Bamman et al. 2012; Fu et al. 2013; Fu and Chau 2013; King et al. 2012; Zhu et al. 2012).
It follows that the availability of data intermediaries and tools will become important for both academic and corporate research. Many new “public opinion monitoring” companies compete to provide better tools and datasets as data intermediaries, including the Online Public Opinion Monitoring and Measuring Unit (人民网舆情监测室) of the People’s Net (a Party press organ), with annual revenue near 200 million RMB. Hence, in addition to the ongoing considerations around big data and Web data research, we need to factor in how these private and public Web data intermediaries shape the Chinese Web data environment (Liao et al. 2013).
Given that the government’s control of information on the Chinese Web involves not only the marginalization (as opposed to traditional censorship) of “unwanted” messages and information, but also the prioritisation of propaganda or pro-government messages (including those made by paid commentators and “robots”), I would add that the new challenges for researchers include the detection of paid (and sometimes robot-generated) comments. Although these challenges are not exactly the same as data access, researchers need to consider them in data collection.
Ed: How much of the content and traffic is identifiable or geolocatable by region (eg mainland vs Hong Kong, Taiwan, abroad)?
Han-Teng: Identifying geographic information in Chinese Web data, like other Web data, can largely be done by geo-IP (a straightforward IP-to-geographic-location mapping service), domain names (.cn for China; .hk for Hong Kong; .tw for Taiwan), and language preferences (simplified Chinese is used by mainland Chinese users; traditional Chinese by users in Hong Kong and Taiwan). Again, as with the question of data access, the availability and quality of such geographic and linguistic information depend on the policies, openness, and availability of data intermediaries and tools.
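As a rough illustration of the kind of heuristics involved, the following sketch classifies a URL by its country-code TLD and guesses the script of a text snippet. The helper names are hypothetical, and the simplified/traditional character sets are a tiny illustrative sample, not a real classifier; production work would use a geo-IP database and full conversion tables.

```python
from urllib.parse import urlparse

# Country-code TLD to coarse region, per the mapping described above
CCTLD_REGION = {"cn": "mainland China", "hk": "Hong Kong", "tw": "Taiwan"}

def region_from_url(url: str) -> str:
    """Map a URL's country-code TLD to a coarse region."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return CCTLD_REGION.get(tld, "unknown")

# A tiny, illustrative sample of characters whose simplified and
# traditional forms differ; a real classifier would use full tables.
SIMPLIFIED = set("国发经对体")
TRADITIONAL = set("國發經對體")

def script_hint(text: str) -> str:
    """Guess simplified vs traditional script by counting marker characters."""
    s = sum(c in SIMPLIFIED for c in text)
    t = sum(c in TRADITIONAL for c in text)
    if s > t:
        return "simplified"
    if t > s:
        return "traditional"
    return "undetermined"

print(region_from_url("http://news.sina.com.cn/china/"))  # mainland China
print(script_hint("中国经济发展"))                          # simplified
print(script_hint("中國經濟發展"))                          # traditional
```

Combined, signals like these let a researcher sort a corpus into rough mainland / Hong Kong / Taiwan / overseas buckets before any finer analysis.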
Nonetheless, there exist research efforts on using geographic and/or linguistic information of Chinese Web data to assess the level and extent of convergence and separation of Chinese information and users around the world (Etling et al. 2009; Liao 2008; Taneja and Wu 2013). Etling and colleagues (2009) concluded their mapping of the Chinese blogosphere with the interpretation of five “attentive spaces” roughly corresponding to five clusters or zones in the network map: on one side, two clusters of “Pro-state” and “Business” bloggers, and on the other, two clusters of “Overseas” bloggers (including Hong Kong and Taiwan) and “Culture”. Situated between the three clusters of “Pro-state”, “Overseas” and “Culture” (and thus at the centre of the network map) is the remaining cluster they call the “critical discourse” cluster, which is at the intersection of the two sides (albeit more on the “blocked” side of the Great Firewall).
I myself found distinct geographic foci and linguistic preferences between the online citations in Baidu Baike and Chinese Wikipedia (Liao 2008). Other research, based on a sample of traffic data, shows the existence of a “Chinese” cluster as an instance of a “culturally defined market”, regardless of geographic and linguistic differences (Taneja and Wu 2013). Although I am not fully convinced by their argument that the Great Firewall has very limited impact on this single “Chinese” cluster, they demonstrate the possibility of extracting geographic and linguistic information from Chinese Web data to better understand the dynamics of Chinese online interactions, which are by no means limited to within China or behind the Great Firewall.
Ed: In terms of online monitoring of public opinion, is it possible to identify robots / “50 cent party” — that is, what proportion of the “opinion” actually has a government source?
Han-Teng: There are research efforts to identify robot comments by analysing the patterns and content of comments, and their profiles’ relationships with other accounts. It is more difficult to prove the direct footprint of government sources. Nonetheless, if researchers take another approach, such as narrative analysis for well-defined propaganda research (for example, pro- and anti-Falun Gong opinions), it might be easier to categorise and visualise the dynamics and then trace the dominant keywords and narratives back to their origins to identify the sources of loud messages. I personally think such research and analytical efforts require deep technical and cultural-political understanding of Chinese Web data, preferably with an integrated mixed-method research design that incorporates both the quantitative and qualitative methods required for the data question at hand.
Ed: In terms of censorship, ISPs operate within explicit governmental guidelines; do the public (who contribute content) also have explicit rules about what topics and content are ‘acceptable’, or do they have to work it out by seeing what gets deleted?
Han-Teng: As a general rule, online censorship works better when individual contributors are isolated. Most of the time, contributors experience technical difficulties when using Beijing’s unwanted keywords or undesired websites, triggering self-censorship to avoid such difficulties. I personally believe such tacit learning serves as the most relevant psychological and behavioural mechanism (rather than explicit rules). In a sense, the power of censorship and political discipline lies in the fact that the real rules of engagement are never explicit to users, thereby giving technocrats more power to exercise it in a more arbitrary fashion. I would describe the general situation as follows. Directives are given to both ISPs and ICPs about certain “hot terms”, some dynamic and some constant. Users “learn” them through encountering various forms of “technical difficulties”. Thus, while ISPs and ICPs may not enforce the same directives in the same fashion (some overshoot while others undershoot), general tacit knowledge about the “red line” is nonetheless delivered.
Nevertheless, there are some efforts whereby users share their experiences with one another, so that they develop a social understanding of what information, and which categories of users, are being disciplined. There are also constant efforts outside mainland China, especially at institutions in Hong Kong and Berkeley, to monitor what is being deleted. However, given that data is abundant for Chinese users, I have become more worried about the phenomenon of the “marginalization of information and/or narratives”. It should be noted that censorship or deletion is just one of the tools of propaganda technocrats, and that the Chinese Communist Party has had its share of historical lessons (and also victories) against its past opponents, such as the Chinese Nationalist Party and the United States during the Chinese Civil War and the Cold War. I strongly believe that as researchers we need better concepts and tools to assess the dynamics of information marginalization and prioritisation, treating censorship and data deletion as one mechanism of information marginalization in an age of data abundance and limited attention.
Ed: Has anyone tried to produce a map of censorship: ie mapping absence of discussion? For a researcher wanting to do this, how would they get hold of the deleted content?
Han-Teng: Mapping censorship has been done through experiments (MacKinnon 2008; Zhu et al. 2011) and by contrasting datasets (Fu et al. 2013; Liao 2013; Zhu et al. 2012). Here the availability of data intermediaries, such as WeiboScope at Hong Kong University, and of unblocked alternatives such as Chinese Wikipedia, provides direct and indirect points of comparison to see what is being, or is most likely to be, deleted. As I am more interested in mapping information marginalization (as opposed to prioritisation), I would say that we need more analytical and visualisation tools to map out the different levels and extents of information censorship and marginalization. The research challenges then shift to the questions of how and why certain content has been deleted inside mainland China, and thus kept or leaked outside China. As we begin to realise that the censorship regime can still achieve its desired political effects by “voicing down” the undesired messages and “voicing up” the desired ones, researchers do not necessarily have to get hold of the deleted content from the websites inside mainland China. They can simply reuse the plentiful Chinese Web data available outside the censorship and filtering regime to undertake experiments or comparative studies.
Ed: What other questions are people trying to explore or answer with data from the “Chinese Web”? And what are the difficulties? For instance, are there enough tools available for academics wanting to process Chinese text?
Han-Teng: As Chinese societies (including mainland China, Hong Kong, Taiwan and other overseas diaspora communities) go digital and networked, it is only a matter of time before Chinese Web data becomes the equivalent of English Web data. However, there are challenges in processing Chinese-language texts, although several of the major challenges become manageable as digital and network tools go multilingual. In fact, Chinese-language users and technologies have been major goals and actors of a multilingual Internet (Liao 2009a,b). While there is technical progress in basic tools, we as Chinese Internet researchers still lack data and tool intermediaries designed to process Chinese texts smoothly. For instance, many analytical tools depend on or require space characters as word boundaries, a condition that does not apply to Chinese texts.
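The point about space characters can be seen directly: a naive whitespace tokenizer that works for English treats an entire Chinese clause as a single “word”. (A minimal sketch; a real pipeline would insert a dedicated word-segmentation step, e.g. with a library such as jieba, before any word-level analysis.)

```python
# English text splits into words on whitespace...
text_en = "internet governance research"
print(text_en.split())   # ['internet', 'governance', 'research']

# ...but written Chinese has no spaces between words, so the same
# tokenizer returns the whole clause as one token. Any tool that
# assumes space-delimited words silently produces useless counts here.
text_zh = "互联网治理研究"   # "Internet governance research"
print(text_zh.split())   # ['互联网治理研究']
```

This is why word frequencies, keyword matching, and co-occurrence statistics computed by off-the-shelf English-oriented tools break down on unsegmented Chinese corpora.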
In addition, there are technical and interpretative challenges in analysing Chinese text datasets with mixed scripts (e.g. simplified and traditional Chinese) or with other foreign languages. Mandarin Chinese is not the only language inside China; there are indications that Cantonese and Shanghainese have a significant presence. Minority languages such as Tibetan, Mongolian and Uyghur are also still used by official Chinese websites to demonstrate the cultural inclusiveness of the Chinese authorities. Chinese official and semi-official diplomatic organs have also tried to tell “Chinese stories” in many of the world’s major languages, sometimes in direct competition with political opponents such as Falun Gong.
These areas of the “Chinese Web” data remain unexplored territory for systematic research, which will require more tools and methods that are similar to the toolkits of multi-lingual Internet researchers. Hence I would say the basic data and tool challenges are not particular to the “Chinese Web”, but are rather a general challenge to the “Web” that is becoming increasingly multilingual by the day. We Chinese Internet researchers do need more collaboration when it comes to sharing data and tools, and I am hopeful that we will have more trustworthy and independent data intermediaries, such as WeiboScope and others, for a better future of the Chinese Web data ecology.
Etling, B., Kelly, J., & Faris, R. (2009). Mapping Chinese Blogosphere. In 7th Annual Chinese Internet Research Conference (CIRC 2009). Annenberg School for Communication, University of Pennsylvania, Philadelphia, US.
Fu, K., Chan, C., & Chau, M. (2013). Assessing Censorship on Microblogs in China: Discriminatory Keyword Analysis and Impact Evaluation of the “Real Name Registration” Policy. IEEE Internet Computing, 17(3), 42–50.
Fu, K., & Chau, M. (2013). Reality Check for the Chinese Microblog Space: a random sampling approach. PLOS ONE, 8(3), e58356.
Jiang, M., & Akhtar, A. (2011). Peer into the Black Box of Chinese Search Engines: A Comparative Study of Baidu, Google, and Goso. Presented at the The 9th Chinese Internet Research Conference (CIRC 2011), Washington, D.C.: Institute for the Study of Diplomacy. Georgetown University.
Liao, H.-T. (2008). A webometric comparison of Chinese Wikipedia and Baidu Baike and its implications for understanding the Chinese-speaking Internet. In 9th annual Internet Research Conference: Rethinking Community, Rethinking Place. Copenhagen.
Liao, H.-T. (2009a). Are Chinese characters not modern enough? An essay on their role online. GLIMPSE: the art + science of seeing, 2(1), 16–24.
Liao, H.-T. (2009b). Conflict and Consensus in the Chinese version of Wikipedia. IEEE Technology and Society Magazine, 28(2), 49–56. doi:10.1109/MTS.2009.932799
Liao, H.-T. (2013, August 5). How do Baidu Baike and Chinese Wikipedia filter contribution? A case study of network gatekeeping. To be presented at the Wikisym 2013: The Joint International Symposium on Open Collaboration, Hong Kong.
Liao, H.-T., Fu, K., Jiang, M., & Wang, N. (2013, June 15). Chinese Web Data: Definition, Uses, and Scholarship. (Accepted). To be presented at the 11th Annual Chinese Internet Research Conference (CIRC 2013), Oxford, UK.
MacKinnon, R. (2008). Flatter world and thicker walls? Blogs, censorship and civic discourse in China. Public Choice, 134(1), 31–46. doi:10.1007/s11127-007-9199-0
Han-Teng was talking to blog editor David Sutcliffe.
Han-Teng Liao is an OII DPhil student whose research aims to reconsider the role of keywords (as in understanding “keyword advertising” using knowledge from sociolinguistics and information science) and hyperlinks (webometrics) in shaping the sense of “fellow users” in digital networked environments. Specifically, his DPhil project is a comparative study of two major user-contributed Chinese encyclopedias, Chinese Wikipedia and Baidu Baike.
Ed: How much work has been done on censorship of online news in China? What are the methodological challenges and important questions associated with this line of enquiry?
Sonya: Recent research is paying much attention to social media, aiming to quantify censorial practices and to discern common patterns in them. Among these empirical studies, Bamman et al.’s (2012) work claimed to be “the first large-scale analysis of political content censorship”, investigating messages deleted from Sina Weibo, a Chinese equivalent of Twitter. On an even larger scale, King et al. (2013) collected data from nearly 1,400 Chinese social media platforms and analyzed the deleted messages. Most studies on news censorship, however, are devoted to narratives of special cases, such as the closure of Freezing Point, an outspoken news and opinion journal, and the blocking of the New York Times after it disclosed the wealth possessed by the family of former Chinese premier Wen Jiabao.
The shortage of news censorship research can be attributed to several methodological challenges. First, it is tricky to detect censorship to begin with, given that the word 'censorship' is itself one of the first to be censored. Also, news websites will not simply let their readers hit a glaring "404 page not found". Instead, they will use a "soft 404", which returns a "success" code for a request for a deleted web page and takes readers to a (different) existing page. While humans can usually spot these soft 404s, it is much harder for computer programs (eg those run by researchers) to do so. Moreover, because different websites employ varying soft 404 techniques, considerable labor is required to survey them and to incorporate the acquired knowledge into a generic monitoring tool.
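One common way to approach the soft-404 problem described above is fingerprinting: request a path that certainly does not exist on the site, save the page the site serves in response, and flag any "successful" page whose body closely resembles that fingerprint. The sketch below shows only the comparison step; the function name, sample pages, and 0.9 threshold are illustrative assumptions, not part of the study's actual tooling.

```python
# Sketch of soft-404 detection by fingerprint similarity.
# Assumption: we have already fetched page bodies; the fingerprint would in
# practice come from requesting a random, guaranteed-nonexistent path on the
# same site, e.g. fetch("http://news.example.cn/" + random_token).
from difflib import SequenceMatcher

def is_soft_404(page_body: str, fingerprint_body: str, threshold: float = 0.9) -> bool:
    """Return True if page_body closely matches the site's soft-404 page."""
    ratio = SequenceMatcher(None, page_body, fingerprint_body).ratio()
    return ratio >= threshold

# Illustrative, made-up page bodies:
fingerprint = "<html><body>Sorry, the article you requested was not found.</body></html>"
deleted_page = "<html><body>Sorry, the article you requested was not found.</body></html>"
live_page = "<html><body>Premier visits flood-hit province; thousands evacuated...</body></html>"

print(is_soft_404(deleted_page, fingerprint))  # identical to fingerprint -> True
print(is_soft_404(live_page, fingerprint))     # real article, low similarity -> False
```

Because each site's soft-404 page differs, a per-site fingerprint must be collected first, which is exactly the survey labor the interview describes.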
Second, high computing power and bandwidth are required to handle the large volume of news publications and the slow network access to Chinese websites. For instance, NetEase alone publishes 8,000 – 10,000 news articles every day. Meanwhile, the connection between the Chinese cyberspace and the outside world is fairly slow, and it takes more than a second to check a single link because the Great Firewall inspects both incoming and outgoing traffic. These two factors translate into 2-3 hours for a single program to check one day's news publications on NetEase alone. If we fire up too many programs to accelerate the process, the database system and/or the network connection may buckle. In my case, even though I am using high-performance computers at Michigan State University to conduct this research, they are overwhelmed every now and then.
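As a quick sanity check, the figures quoted above are internally consistent: at roughly one second per link, a single sequential checker needs nearly three hours for NetEase's daily output. The numbers are taken from the interview; the variable names are mine.

```python
# Back-of-envelope check of the throughput figures quoted in the interview.
articles_per_day = 10_000   # upper end of NetEase's 8,000-10,000 articles/day
seconds_per_check = 1.0     # each link check takes about a second through the Great Firewall
hours = articles_per_day * seconds_per_check / 3600
print(f"{hours:.1f} hours") # 2.8 hours, within the quoted 2-3 hour range
```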
Despite all the difficulties, I believe it is of great importance to reveal censored news stories to the public, especially to the audience inside China who do not enjoy a free flow of information. Censored news is a special type of information: too inconvenient to exist in the authorities' eyes, yet deemed important to citizens' everyday lives. For example, the outbreak of SARS was censored from Chinese media, presumably to avoid spoiling the harmonious atmosphere created for the 16th National Congress of the Communist Party. This allowed the virus to develop into a worldwide epidemic. Like SARS, a variety of censored issues are not only inconvenient but also crucial, because the authorities would not otherwise allocate substantial resources to monitoring or eliminating them if they were merely trivial. Therefore, after censored news is detected, it is vital to seek effective and efficient channels to disclose it to the public, so as to counterbalance the potential damage that censorship may entail.
Ed: You found that party organs, ie news organizations tightly affiliated with the Chinese Communist Party, published a considerable amount of deleted news. Was this surprising?
Sonya: Yes, I was surprised when looking at the results the first time. To be exact, our finding is that commercial media experience a higher deletion rate, but party organs contribute the most deleted news by sheer volume, reflecting the fact that party organs possess more resources allocated by the central and local governments and therefore have the capacity to produce more news. Consequently, party organs have a higher chance of publishing controversial information that may be deleted later, especially when a news story becomes sensitive for some reason that is hard to foresee. For example, investigations of some government officials started when netizens spotted them in the news wearing various luxury watches and other expensive accessories. As such, even though party organs are obliged to write odes to the party, those odes may eventually backfire on the cadres if the beautiful words are discovered to be too far from reality.
Ed: How sensitive are citizens to the fact that some topics are actively avoided in the news media? And how easy is it for people to keep abreast of these topics (eg the “three Ts” of Tibet, Taiwan, and Tiananmen) from other information sources?
Sonya: This question highlights the distinction between pre-censorship and post-censorship. Our study looked at post-censorship, ie information that is published but subsequently deleted. By contrast, the topics that are “actively avoided” fall under the category of pre-censorship. I am fairly convinced that the current pre- and post-censorship practice is effective in terms of keeping the public from learning inconvenient facts and from mobilizing for collective action. If certain topics are consistently wiped from the mass media, how will citizens ever get to know about them?
The Tiananmen Square protest, for instance, has never been covered by Chinese mass media, leaving the entire generation that has grown up since 1989 ignorant of this historical event. As such, if younger Chinese citizens have never heard of the Tiananmen Square protest, how could they possibly start an inquiry into the incident? Or, if they have heard of it and attempt to learn about it from the Internet, they will soon realize that domestic search engines, social media, and news media all fail their requests, and foreign ones are blocked. Certainly, they could use circumvention tools to bypass the Great Firewall, but the sad truth is that probably under 1% of them have ever made such an effort, according to the Harvard Berkman Center's 2011 report.
Ed: Is censorship of domestic news (such as food scares) more geared towards “avoiding panics and maintaining social order”, or just avoiding political embarrassment? For example, do you see censorship of environmental issues and (avoidable) disasters?
Sonya: The government certainly tries to avoid political embarrassment in the case of food scares by manipulating news coverage, but maintaining social order, or so-called "social harmony", is also a priority. Exactly for this reason, Zhao Lianhai, the most outspoken parent of a victim of toxic milk powder, was charged with "inciting social disorder" and sentenced to two and a half years in prison. Distrustful of Chinese milk powder, Chinese tourists have been aggressively stocking up on it elsewhere, such as in Hong Kong and New Zealand, causing panics over shortages in those places.
After the earthquake in Sichuan, another group of grieving parents were arrested on similar charges when they questioned why their children were buried under crumbled schools whereas older buildings remained standing. The high death toll of this earthquake was among the avoidable disasters that the government attempts to mask and force the public to forget. Environmental issues, along with land acquisition, social unrest, and labor exploitation, are other frequently censored topics in the name of “stability maintenance”.
Ed: You plotted a map to show the geographic distribution of news deletion: what does the pattern show?
Sonya: We see an apparent geographic pattern in news deletion, with news about neighboring countries more likely to be deleted than news about distant ones. Border disputes between China and its neighbors may be one cause; for example with Japan over the Diaoyu-Senkaku Islands, with the Philippines over Huangyan Island (Scarborough Shoal), and with India over South Tibet. Another reason may be a concern over maintaining allies. Burma had the highest deletion rate of all countries, with the deleted news mostly covering its easing of censorship. Watching this shift, China might worry that media reform in Burma could lead to copycat attempts inside China.
On the other hand, China has given Burma diplomatic cover, considering it the "second coast" to the Indian Ocean and importing its natural resources (Howe & Knight, 2012). For these reasons, China may be compelled to censor news about Burma more heavily than news about other countries. Nonetheless, although oceans apart, the US topped the list by sheer number of news deletions, reflecting the bittersweet relations between the two nations.
Ed: What do you think explains the much higher levels of censorship reported by others for social media than for news media? How does geographic distribution of deletion differ between the two?
Sonya: The deletion rates for online news are apparently far lower than those for Sina Weibo posts. The overall deletion rates on NetEase and Sina Beijing were 0.05% and 0.17%, compared to 16.25% on the social media platform (Bamman et al., 2012). Several reasons may help explain this gap. First, social media platforms confront persistent spam that has to be cleaned up constantly, which is not a problem at all for professional news aggregators. Second, self-censorship practiced by news media plays an important role, because Chinese journalists are more obliged and prepared to self-censor sensitive information than ordinary Chinese citizens. Consequently, news media rarely mention the "crown prince party" or the "democracy movement", which were among the most frequently deleted terms on Sina Weibo.
Geographically, the deletion rates across China show distinct patterns for news media and social media. On Sina Weibo, deletion rates increase for messages published near the fringes or in the west, where the economy is less developed. On news websites, deletion rates rise toward the center and east, where the economy is better developed. In addition, the provinces surrounding Beijing also have more news deleted, suggesting that political concerns are a driving force behind content control.
Ed: Can you tell if the censorship process mostly relies on searching for sensitive keywords, or on more semantic analysis of the actual content? ie can you (or the censors..) distinguish sensitive “opinions” as well as sensitive topics?
Sonya: First, highly sensitive topics, such as the Tiananmen Square protest, will never survive pre-censorship or be published on news websites, although they may sneak onto social media with deliberate typos or other circumvention techniques. However, it is clear that censors use keywords to locate articles on sensitive topics. For instance, after the Fukushima earthquake in 2011, rumors spread in the Chinese cyberspace that radiation was leaking from the Japanese nuclear plant and that iodine would help protect against its harmful effects; this was followed by panic-buying of iodized salt. During this period, "nuclear defense", "iodized salt" and "radioactive iodine" – among other generally neutral terms – became politically charged overnight, and were censored in the Chinese web sphere. The taboo list of post-censorship keywords evolves continuously to handle breaking news. Beyond keywords, party organs and other online media are trying to automate sentiment analysis and discern more subtle context. People's Daily, for instance, has been working with elite Chinese universities in this field and has already developed a generic product that other institutes can use to monitor "public sentiment".
Another way to sort out sensitive information is to keep an eye on the most popular stories, because a popular story represents a greater "threat" to the existing political and social order. In our study, about 47% of the deleted stories were listed among the top 100 most read/discussed stories at some point. This indicates that the more readership a story gains, the more attention it draws from censors.
Although news websites self-censor (and therefore experience under 1% post-censorship), they are also required to monitor and "clean" the comments following each news article. According to my very conservative estimate – if a censor processes 100 comments per minute and works eight hours per day – reviewing the comments on Sina Beijing from 11-16 September 2012 would have required 336 censors working full time. In fact, Charles Cao, CEO of Sina, mentioned to Forbes that at least 100 censors were "devoted to monitoring content 24 hours a day". As new sensitive issues emerge and new circumvention techniques are developed continuously, it is an ongoing battle between the collective intelligence of Chinese netizens and the mechanical work conducted (and artificial intelligence implemented) by a small group of censors.
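The staffing estimate above can be reproduced with simple arithmetic. The interview gives only the result (336 censors); the total comment volume below is a hypothetical figure chosen purely to illustrate how such a number would be derived.

```python
# Reproducing the conservative censor-staffing estimate: one censor reviews
# 100 comments per minute for 8 hours a day over the 6-day window.
comments_per_minute = 100
minutes_per_day = 8 * 60
days = 6                                   # 11-16 September 2012
per_censor_total = comments_per_minute * minutes_per_day * days  # 288,000 comments

assumed_total_comments = 96_768_000        # hypothetical observed volume (illustrative)
censors_needed = assumed_total_comments / per_censor_total
print(round(censors_needed))               # 336
```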
Ed: It must be a cause of considerable anxiety for journalists and editors to have their material removed. Does censorship lead to sanctions? Or is the censorship more of an annoyance that must be negotiated?
Sonya: Censorship does indeed lead to sanctions. However, I don’t think “anxiety” would be the right word to describe their feelings, because if they are really anxious they could always choose self-censorship and avoid embarrassing the authorities. Considering it is fairly easy to predict whether a news report will please or irritate officials, I believe what fulfills the whistleblowers when they disclose inconvenient facts is a strong sense of justice and tremendous audacity. Moreover, I could barely discern any “negotiation” in the process of censorship. Negotiation is at least a two-way communication, whereas censorship follows continual orders sent from the authorities to the mass media, and similarly propaganda is a one-way communication from the authorities to the masses via the media. As such, it is common to see disobedient journalists threatened or punished for “defying” censorial orders.
Southern Metropolis Daily is one of China's most aggressive, and most punished, newspapers. In 2003, the newspaper broke the story of the SARS epidemic that local officials had wished to hide from the public. Soon after this report, it covered the case of a university graduate beaten to death in police custody because he carried no proper residency papers. Both cases received enormous attention from Chinese authorities and the international community, seriously embarrassing local officials. It is alleged and widely believed that some local officials demanded harsh penalties for the Daily; the director and the deputy editor were sentenced to 11 and 12 years in jail for "taking bribes" and "misappropriating state-owned assets", and the chief editor was dismissed.
Not only professional journalists but also (broadly defined) citizen journalists can face similar penalties. For instance, Xu Zhiyong, a lawyer who defended journalists on trial, and Ai Weiwei, an artist who tried to investigate collapsed schools after the Sichuan earthquake, have experienced similar reprisals: fines for tax evasion, physical attacks, house arrest, and secret detention; exactly the same censorship tactics that states carried out before the advent of the Internet, as described in Ilan Peleg's (1993) book Patterns of Censorship Around the World.
Ed: What do you think explains the lack of censorship in the overseas portal? (Could there be a certain value for the government in having some news items accessible to an external audience, but unavailable to the internal one?)
Sonya: It is more costly to control content by searching for and deleting individual news stories than by simply blocking a whole website. For this reason, when a website outside the Great Firewall carries content embarrassing to the Chinese government, Chinese censors will simply block the whole website rather than request deletions. Overseas branches of Chinese media may comply with a deletion request, but foreign media may simply ignore it.
Given online users' behavior, it is effective and efficient to strictly control domestic content. In general, there are two types of Chinese online users: those who only visit Chinese websites operating inside China, and those who also consume content from outside the country. Regarding the second type, it is really hard to dictate what they do and don't read, because they may be well equipped with circumvention tools and often obtain access to Chinese media published in Hong Kong and Taiwan but blocked in China. In addition, some Western media, such as the BBC, the New York Times, and Deutsche Welle, make media consumption easy for Chinese readers by publishing in Chinese. Of course, this type of Chinese user may also be well educated and able to read English and other foreign languages directly. Facing these people, Chinese authorities would see their efforts come to nothing if they tried to censor overseas branches of Chinese media, because, outside the Great Firewall, there are too many sources of information that lie beyond the reach of Chinese censors.
Chinese authorities are in fact strategically wise in putting their efforts into controlling domestic online media, because this first type of Chinese user accounts for 99.9% of the whole online population, according to Google’s 2010 estimate. In his 2013 book Rewire, Ethan Zuckerman summarizes this phenomenon: “none of the top ten nations [in terms of online population] looks at more than 7 percent international content in its fifty most popular news sites” (p. 56). Since the majority of the Chinese populace perceives the domestic Internet as “the entire cyberspace”, manipulating the content published inside the Great Firewall means that (according to Chinese censors) many of the time bombs will have been defused.
Sonya Y. Song led this study as a Google Policy Fellow in 2012. Currently, she is a Knight-Mozilla OpenNews Fellow and a Ph.D. candidate in media and information studies at Michigan State University. Sonya holds bachelor's and master's degrees in computer science from Tsinghua University in Beijing and a master of philosophy in journalism from the University of Hong Kong. She is also an avid photographer, a devotee of literature, and a film buff.
Sonya Yan Song was talking to blog editor David Sutcliffe.
David: For our research, we surveyed postgraduate students from all over China who had come to Shanghai to study. We asked them five questions, to which they provided mostly rather lengthy answers. Although they were young university students and very active online, their answers managed to surprise us. Notably, the young Chinese who took part in our research felt deeply ambivalent about the Internet and its supposed benefits for individuals in China. They appreciated the greater freedom the Internet offered when compared to offline China, but were very wary of others abusing this freedom to their detriment.
Ed: In your paper you note that the opinions of many young people closely mirrored those of the government’s statements about the Internet — in what way?
David: In 2010 the government published a White Paper on the Internet in China in which they argued that the main uses of the Internet were for obtaining information, and for communicating with others. In contrast to Euro-American discourses around the Internet as a ‘force for democracy’, the students’ answers to our questions agreed with the evaluation of the government and did not see the Internet as a place to begin organising politically. The main reason for this — in my opinion — is that young Chinese are not used to discussing ‘politics’, and are mostly focused on pursuing the ‘Chinese dream’: good job, large flat or house, nice car, suitable spouse; usually in that order.
Ed: The Chinese Internet has usually been discussed in the West as a ‘force for democracy’ — leading to the inevitable relinquishing of control by the Chinese Communist Party. Is this viewpoint hopelessly naive?
David: Not naive as such, but both deterministic and limited, as it assumes that the introduction of a technology can have only one 'built-in' outcome, thus ignoring human agency, and as it pretends that the Chinese Communist Party does not use technology at all. Given the intense involvement of Party and government offices, as well as of individual Party members and government officials, with the Internet, it makes little sense to talk about 'the Party' and 'the Internet' as unconnected entities. Compared to governments in Europe or America, the Chinese Communist Party and the Chinese government have embraced the Internet and treated it as a real and valid communication channel between citizens and government/Party at all levels.
Ed: Chinese citizens are being encouraged by the government to engage and complain online, eg to expose inefficiency and corruption. Is the Internet just a space to blow off steam, or is it really capable of ‘changing’ Chinese society, as many have assumed?
David: This is mostly a matter of perspective and expectations. The Internet has NOT changed the system in China, nor is it likely to. In all likelihood, the Internet is bolstering the legitimacy and the control of the Chinese Communist Party over China. However, in many specific instances of citizen unhappiness and unrest, the Internet has proved a powerful channel of communication for the people to achieve their goals, as the authorities have reacted to online protests and supported the demands of citizens. This is a genuine change and empowerment of the people, though episodic and local, not global.
Ed: Why do you think your respondents were so accepting (and welcoming) of government control of the Internet in China: is this mainly due to government efforts to manage online opinion, or something else?
David: I think this is a reflex response fairly similar to what has happened elsewhere as well. If, for example, children manage to access porn sites, or an adult manages to groom several children over the Internet, the mass media and the parents of the children call for 'government' to protect the children. This abrogation of power and shifting of responsibility to 'the government' by individuals (in the example by parents, in our study by young Chinese) is fairly widespread, if deplorable. Ultimately this demand for government 'protection' leads to what I would consider excessive government surveillance, control and regulation of online spaces in the name of 'protection', and to the public's acquiescence in the policing of cyberspace. In China, this takes the form of a widespread (resigned) acceptance of government censorship; in the UK it led to the acceptance of GCHQ's involvement in Prism, and to the sentencing of Deyka Ayan Hassan and of Liam Stacey, which has turned the UK into the only country in the world in which people have been arrested for posting single, offensive posts on microblogs.
Ed: How does the central Government manage and control opinion online?
David: There is no unified system of government control over the Internet in China. Instead, there are many groups and institutions at all levels, from central to local, with overlapping areas of responsibility, all exerting an influence on Chinese cyberspaces. There are direct posts by government or Party officials, posts by 'famous' people in support of government decisions or policies, and posts by paid, 'hidden' posters or even people genuinely sympathetic to the government. China's notorious online celebrity Han Han once pointed out that the term 'the Communist Party' really means a population group of over 300 million people connected to someone who is an actual Party member.
In addition to pro-government postings, there are many different forms of censorship that try to prevent unacceptable posts. The exact definition of 'unacceptable' changes from time to time, and even from location to location. In Beijing, around October 1, the Chinese National Day, many more websites are inaccessible than, for example, in Shenzhen during April. Different government or Party groups also add terms to (or remove them from) the list of 'unacceptable' topics, which contributes to the flexibility of the censorship system.
As a result of the often unpredictable ‘current’ limits of censorship, many Internet companies, forum and site managers, as well as individual Internet users add their own ‘self-censorship’ to the mix to ensure their own uninterrupted presence online. This ‘self-censorship’ is often stricter than existing government or Party regulations, so as not to even test the limits of the possible.
Ed: Despite the constant encouragement / admonishment of the government that citizens should report and discuss their problems online; do you think this is a clever (ie safe) thing for citizens to do? Are people pretty clever about negotiating their way online?
David: If it looks like a duck, moves like a duck, talks like a duck … is it a duck? There has been a lot of evidence over the years (and many academic articles) that demonstrate the government’s willingness to listen to criticism online without punishing the posters. People do get punished if they stray into ‘definitely illegal’ territory, e.g. promoting independence for parts of China, or questioning the right of the Communist Party to govern China, but so far people have been free to express their criticism of specific government actions online, and have received support from the authorities for their complaints.
Just to note briefly; one underlying issue here is the definition of ‘politics’ and ‘power’. Following Foucault, in Europe and America ‘everything’ is political, and ‘everything’ is a question of power. In China, there is a difference between ‘political’ issues, which are the responsibility of the Communist Party, and ‘social’ issues, which can be discussed (and complained about) by anybody. It might be worth exploring this difference of definitions without a priori acceptance of the Foucauldian position as ‘correct’.
Ed: There’s a lot of emphasis on using eg social media to expose corrupt officials and hold them to account; is there a similar emphasis on finding and rewarding ‘good’ officials? Or of officials using online public opinion to further their own reputations and careers? How cynical is the online public?
David: The online public is very cynical, and getting ever more so (which is seen as a problem by the government as well). The emphasis on ‘bad’ officials is fairly ‘normal’, though, as ‘good’ officials are not ‘newsworthy’. In the Chinese context there is the additional problem that socialist governments like to promote ‘model workers’, ‘model units’, etc. which would make the praising of individual ‘good’ officials by Internet users highly suspect. Other Internet users would simply assume the posters to be paid ‘hidden’ posters for the government or the Party.
Ed: Do you think (on balance) that the Internet has brought more benefits (and power) to the Chinese Government or new problems and worries?
David: I think the Internet has changed many things for many people worldwide. Limiting the debate on the Internet to the dichotomies of government vs Internet, empowered netizens vs disenfranchised Luddites, online power vs wasting time online, etc. is highly problematic. The open engagement with the Internet by government (and Party) authorities has been greater in China than elsewhere; in my view, the Chinese authorities have reacted much faster, and ‘better’ to the Internet than authorities elsewhere. As the so-called ‘revelations’ of the past few months have shown, governments everywhere have tried and are trying to control and use Internet technologies in pursuit of power.
Although I personally would prefer the Internet to be a ‘free’ and ‘independent’ place, I realise that this is a utopian dream given the political and economic benefits and possibilities of the Internet. Given the inevitability of government controls, though, I prefer the open control exercised by Chinese authorities to the hypocrisy of European and American governments, even if the Chinese controls (apparently) exceed those of other governments.
Dr David Herold is an Assistant Professor of Sociology at Hong Kong Polytechnic University, where he researches Chinese culture and contemporary PRC society, China's relationship with other countries, and Chinese cyberspace and online society. His paper Captive Artists: Chinese University Students Talk about the Internet was presented at "China and the New Internet World", International Communication Association (ICA) Preconference, Oxford Internet Institute, University of Oxford, June 2013.
David Herold was talking to blog editor David Sutcliffe.
Ed: European legislation introduced in 2011 requires Member States to ensure the prompt removal of child pornography websites hosted in their territory and to endeavour to obtain the removal of such websites hosted outside; leaving open the option to block access by users within their own territory. What is problematic about this blocking?
Authors: From a technical point of view, all possible blocking methods that could be used by Member States are ineffective as they can all be circumvented very easily. The use of widely available technologies (like encryption or proxy servers) or tiny changes in computer configurations (for instance the choice of DNS-server), that may also be used for better performance or the enhancement of security or privacy, enable circumvention of blocking methods. Another problem arises from the fact that this legislation only targets website content while offenders often use other technologies such as peer-to-peer systems, newsgroups or email.
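As a minimal illustration of the point about DNS configuration: DNS-based blocking relies on the ISP's resolver returning a wrong or empty answer, so comparing that answer against an independent resolver exposes the tampering, and switching resolvers (a tiny change in computer configuration) bypasses the block entirely. The function and the addresses below are hypothetical, for illustration only.

```python
# Hypothetical sketch: detect DNS-level blocking by comparing the answer from
# the ISP's resolver with that of an independent resolver. A mismatch (e.g. a
# bogus loopback address from the ISP) suggests tampering; a user who points
# their machine at the independent resolver sidesteps the block.
def dns_tampering_suspected(isp_answer: str, independent_answer: str) -> bool:
    """Flag likely DNS-based blocking when the two resolvers disagree."""
    return isp_answer != independent_answer

# Illustrative, made-up resolver answers for the same hostname:
print(dns_tampering_suspected("127.0.0.1", "93.184.216.34"))      # disagreement -> True
print(dns_tampering_suspected("93.184.216.34", "93.184.216.34"))  # consistent -> False
```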
Ed: Many of these blocking activities stem from European efforts to combat child pornography, but you suggest that child protection may be used as a way to add other types of content to lists of blocked sites – notably those that purportedly violate copyright. Can you explain how this “mission creep” is occurring, and what the risks are?
Authors: Combating child pornography and child abuse is a universal and legitimate concern, and there is a worldwide consensus that action must be taken to punish abusers and protect children. Blocking measures are usually advocated on the grounds that access to these images must be prevented, thus keeping users from stumbling upon child pornography inadvertently. Whereas this seems reasonable with regard to this particular type of content, in some countries governments increasingly use blocking mechanisms against other 'illegal' content, such as gambling or copyright-infringing material, often in a very non-transparent way, without clear or established procedures.
It is, in our view, especially important, at a time when governments do not hesitate to carry out secret online surveillance of citizens without any transparency or accountability, that any interference with online content be clearly prescribed by law, have a legitimate aim and, most importantly, be proportional and not go beyond what is necessary to achieve that aim. In addition, the role of private actors, such as ISPs, search engine companies or social networks, must be very carefully considered. It must be clear that decisions about which content or behaviours are illegal and/or harmful must be taken by, or at least overseen by, the judicial power in a democratic society.
Ed: You suggest that removal of websites at their source (mostly in the US and Canada) is a more effective means of stopping the distribution of child pornography — but that European law enforcement has often been insufficiently committed to such action. Why is this? And how easy are cross-jurisdictional efforts to tackle this sort of content?
Authors: The blocking of websites, although of questionable effectiveness as a method of making content inaccessible, is a quick way to be seen to take action against the appearance of unwanted material on the Internet. The removal of content, on the other hand, requires the identification not only of those responsible for hosting the content but, more importantly, of the actual perpetrators. This is of course a more intrusive and lengthy process, for which law enforcement agencies currently lack resources.
Moreover, these agencies may indeed run into obstacles related to territorial jurisdiction and difficult international cooperation. However, prioritising and investing in actual removal of content, even though not feasible in certain circumstances, will ensure that child sexual abuse images do not further circulate, and, hence, that the risk of repetitive re-victimization of abused children is reduced.