Articles

Wikipedia is often seen as a great equaliser. But it’s starting to look like global coverage on Wikipedia is far from equal.

Reposted from The Conversation. Wikipedia is often seen as a great equaliser. Every day, hundreds of thousands of people collaborate on a seemingly endless range of topics by writing, editing and discussing articles, and uploading images and video content. But it’s starting to look like global coverage on Wikipedia is far from equal. This now ubiquitous source of information offers everything you could want to know about the US and Europe but far less about any other parts of the world. This structural openness of Wikipedia is one of its biggest strengths. Academic and activist Lawrence Lessig even describes the online encyclopedia as “a technology to equalise the opportunity that people have to access and participate in the construction of knowledge and culture, regardless of their geographic placing”. But despite Wikipedia’s openness, there are fears that the platform is simply reproducing the most established worldviews. Knowledge created in the developed world appears to be growing at the expense of viewpoints coming from developing countries. Indeed, there are indications that global coverage in the encyclopedia is far from “equal”, with some parts of the world heavily represented on the platform, and others largely left out. For a start, if you look at articles published about specific places such as monuments, buildings, festivals, battlefields, countries, or mountains, the imbalance is striking. Europe and North America account for a staggering 84% of these “geotagged” articles. Almost all of Africa is poorly represented in the encyclopedia, too. In fact, there are more Wikipedia articles written about Antarctica (14,959) than any country in Africa. And while there are just over 94,000 geotagged articles related to Japan, there are only 88,342 on the entire Middle East and North Africa region. Total number of geotagged Wikipedia articles across 44 surveyed languages. Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. 
Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers (forthcoming). When…

So are young people completely unconcerned about their privacy online, gaily granting access to everything to everyone? Well, in a word, no.

A pretty good idea of what not to do on a social media site. Image by Sean MacEntee. Standing on a stage in San Francisco in early 2010, Facebook founder Mark Zuckerberg, partly responding to the site’s decision to change the privacy settings of its 350 million users, announced that as Internet users had become more comfortable sharing information online, privacy was no longer a “social norm”. Of course, he had an obvious commercial interest in relaxing norms surrounding online privacy, but this attitude has nevertheless been widely echoed in the popular media. Young people are supposed to be sharing their private lives online—and providing huge amounts of data for commercial and government entities—because they don’t fully understand the implications of the public nature of the Internet. There has actually been little systematic research on the privacy behaviour of different age groups in online settings. But there is certainly evidence of a growing (general) concern about online privacy (Marwick et al., 2010), with a 2013 Pew study finding that 50 percent of Internet users were worried about the information available about them online, up from 30 percent in 2009. Following the recent revelations about the NSA’s surveillance activities, a Washington Post-ABC poll reported 40 percent of its U.S. respondents as saying that it was more important to protect citizens’ privacy even if it limited the ability of the government to investigate terrorist threats. But what of young people, specifically? Do they really care less about their online privacy than older users? Privacy concerns an individual’s ability to control what personal information about them is disclosed, to whom, when, and under what circumstances. We present different versions of ourselves to different audiences, and the expectations and norms of the particular audience (or context) will determine what personal information is presented or kept hidden. 
This highlights a fundamental problem with privacy in some SNSs: that of ‘context collapse’ (Marwick and boyd 2011).…

Informing the global discussions on information control research and practice in the fields of censorship, circumvention, surveillance and adherence to human rights.

Jon Penney presenting on the US experience of Internet-related corporate transparency reporting.

根据相关法律法规和政策,部分搜索结果未予显示 could be a warning message we will see displayed more often on the Internet, though more likely in translation. In Chinese, this means “according to the relevant laws, regulations, and policies, a portion of search results have not been displayed.” The control of information flows on the Internet is becoming more commonplace, in authoritarian regimes as well as in liberal democracies, via either technical or regulatory means. Such information controls can be defined as “[…] actions conducted in or through information and communications technologies (ICTs), which seek to deny (such as web filtering), disrupt (such as denial-of-service attacks), shape (such as throttling), secure (such as through encryption or circumvention) or monitor (such as passive or targeted surveillance) information for political ends. Information controls can also be non-technical and can be implemented through legal and regulatory frameworks, including informal pressures placed on private companies. […]” Information controls are not intrinsically good or bad, but much remains to be explored and analysed about their use for political or commercial purposes. The University of Toronto’s Citizen Lab organised a one-week summer institute titled “Monitoring Internet Openness and Rights” to inform the global discussions on information control research and practice in the fields of censorship, circumvention, surveillance and adherence to human rights. A week full of presentations and workshops on the intersection of technical tools, social science research, ethical and legal reflections and policy implications was attended by a distinguished group of about 60 community members, amongst whom were two OII DPhil students: Jon Penney and Ben Zevenbergen. 
Conducting Internet measurements may be considered terra incognita in terms of methodology and data collection, but the relevance and impacts for Internet policy-making, geopolitics or network management are obvious and undisputed. The Citizen Lab prides itself on being a “hacker hothouse”, or an “intelligence agency for civil society” where security expertise, politics, and ethics intersect. Their research adds the much-needed geopolitical angle to…

Negotiating the wider politics of Wikipedia can be a daunting task, particularly when it comes to content about the MENA region.

Negotiating the wider politics of Wikipedia can be a daunting task, particularly when it comes to content about the MENA region. Image of the Dome of the Rock (Qubbat As-Sakhrah), Jerusalem, by 1yen

Wikipedia has famously been described as a project that “works great in practice and terrible in theory”. One of the ways in which it succeeds is through its extensive consensus-based governance structure. While this has led to spectacular success (over 4.5 million articles in the English Wikipedia alone), the governance structure is neither obvious nor immediately accessible, and can present a barrier for those seeking entry. Editing Wikipedia can be a tough challenge: an often draining and frustrating task, involving heated disputes and arguments where it is often the most tenacious, belligerent, or connected editor who wins out in the end. Broadband access and literacy are not the only pre-conditions for editing Wikipedia; ‘digital literacy’ is also crucial. This includes the ability to obtain and critically evaluate online sources, locate Wikipedia’s editorial and governance policies, master Wiki syntax, and confidently articulate and assert one’s views about an article or topic. Experienced editors know how to negotiate the rules, build a consensus with some editors to block others, and influence administrators during dispute resolution. This strict adherence to the word (if not the spirit) of Wikipedia’s ‘law’ can lead to marginalization or exclusion of particular content, particularly when editors are scared off by unruly mobs who ‘weaponise’ policies to fit a specific agenda. Governing such a vast collaborative platform as Wikipedia obviously presents a difficult balancing act between being open enough to attract a large volume of contributions, and moderated enough to ensure their quality. Many editors consider Wikipedia’s governance structure (which varies significantly between the different language versions) essential to ensuring the quality of its content, even if it means that certain editors can (for example) arbitrarily ban other users, lock down certain articles, and exclude moderate points of view. 
One of the editors we spoke to noted that: “A number of articles I have edited with quality sources, have been subjected to editors cutting information that doesn’t fit their ideas […]…

If we only undertake research on the nature or extent of risk, then it’s difficult to learn anything useful about who is harmed, and what this means for their lives.

The range of academic literature analysing the risks and opportunities of Internet use for children has grown substantially in the past decade, but there’s still surprisingly little empirical evidence on how perceived risks translate into actual harms. Image by Brad Flickinger

Child Internet safety is a topic that continues to gain a great deal of media coverage and policy attention. Recent UK policy initiatives such as Active Choice Plus, in which major UK broadband providers agreed to provide household-level filtering options, or the industry-led Internet Matters portal, reflect a public concern with the potential risks and harms of children’s Internet use. At the same time, the range of academic literature analysing the risks and opportunities of Internet use for children has grown substantially in the past decade, in large part due to the extensive international studies funded by the European Commission as part of the excellent EU Kids Online network. Whilst this has greatly helped us understand how children behave online, there’s still surprisingly little empirical evidence on how perceived risks translate into actual harms. This is problematic, first, because risks can only be identified if we understand what types of harms we wish to avoid, and second, because if we only undertake research on the nature or extent of risk, then it’s difficult to learn anything useful about who is harmed, and what this means for their lives. Of course, the focus on risk rather than harm is understandable from an ethical and methodological perspective. It wouldn’t be ethical, for example, to conduct a trial in which one group of children was deliberately exposed to very violent or sexual content to observe whether any harms resulted. Similarly, surveys can ask respondents to self-report harms experienced online, perhaps through the lens of upsetting images or experiences. But again, there are ethical concerns about adding to children’s distress by questioning them extensively on difficult experiences, and in a survey context it’s also difficult to avoid imposing adult conceptions of ‘harm’ through the wording of the questions. 
Despite these difficulties, there are many research projects that aim to measure and understand the relationship between various types of physical, emotional or psychological harm…

There are more Wikipedia articles in English than Arabic about almost every Arabic speaking country in the Middle East.

Image of rock paintings in the Tadrart Acacus region of Libya by Luca Galuzzi.

Wikipedia is often seen to be both an enabler and an equaliser. Every day hundreds of thousands of people collaborate on an (encyclopaedic) range of topics: writing, editing and discussing articles, and uploading images and video content. This structural openness combined with Wikipedia’s tremendous visibility has led some commentators to highlight it as “a technology to equalise the opportunity that people have to access and participate in the construction of knowledge and culture, regardless of their geographic placing” (Lessig 2003). However, despite Wikipedia’s openness, there are also fears that the platform is simply reproducing worldviews and knowledge created in the Global North at the expense of Southern viewpoints (Graham 2011; Ford 2011). Indeed, there are indications that global coverage in the encyclopaedia is far from ‘equal’, with some parts of the world heavily represented on the platform, and others largely left out (Hecht and Gergle 2009; Graham 2011, 2013, 2014). These second-generation digital divides are not merely divides of Internet access (much discussed in the late 1990s), but gaps in representation and participation (Hargittai and Walejko 2008). Whereas most Wikipedia articles about most European and East Asian countries are written in their dominant languages, for much of the Global South we see a dominance of articles written in English. These geographic differences in the coverage of different language versions of Wikipedia matter, because fundamentally different narratives can be (and are) created about places and topics in different languages (Graham and Zook 2013; Graham 2014). If we undertake a ‘global analysis’ of this pattern by examining the number of geocoded articles (i.e. about a specific place) across Wikipedia’s main language versions (Figure 1), the first thing we can observe is the incredible human effort that has gone into describing ‘place’ in Wikipedia. 
The second is the clear and highly uneven geography of information, with Europe and North America home to 84% of all geolocated articles. Almost all of Africa is…

Arabic is one of the least represented major world languages on Wikipedia: few languages have more speakers and fewer articles than Arabic.

Image of the Umayyad Mosque (Damascus) by Travel Aficionado

Wikipedia currently contains over 9 million articles in 272 languages, far surpassing any other publicly available information repository. Being the first point of contact for most general topics (and therefore an effective site for framing any subsequent representations), it is an important platform from which we can learn whether the Internet facilitates increased open participation across cultures, or reinforces existing global hierarchies and power dynamics. Because the underlying political, geographic and social structures of Wikipedia are hidden from users, and because there have not been any large-scale studies of the geography of these structures and their relationship to online participation, entire groups of people (and regions) may be marginalised without their knowledge. This process is important to understand, for the simple reason that Wikipedia content has begun to form a central part of services offered elsewhere on the Internet. When you look for information about a place on Facebook, the description of that place (including its geographic coordinates) comes from Wikipedia. If you want to “check in” to a museum in Doha to signify to your friends that you were there, the place you check in to was created with Wikipedia data. When you Google “House of Saud” you are presented not only with a list of links (with Wikipedia at the top) but also with a special ‘card’ summarising the House. This data comes from Wikipedia. When you look for people or places, Google now has these terms inside its ‘knowledge graph’, a network of related concepts with data coming directly from Wikipedia. Similarly, on Google Maps, Wikipedia descriptions for landmarks are presented as part of the default information. Ironically, Wikipedia editorship is actually on a slow and steady decline, even as its content and readership increase year on year. 
Since 2007 and the introduction of significant devolution of administrative powers to volunteers, Wikipedia has not been able to effectively retain newcomers, something which has been noted as a concern by…

Understanding these economies is therefore crucial to anyone who is interested in the social dynamics and power relations of digital media today.

Vili discusses his new book from MIT Press (with E. Castronova): Virtual Economies: Design and Analysis.

Digital gaming, once a stigmatised hobby, is now a mainstream cultural activity. According to the Oxford Internet Survey, more than half of British Internet users play games online; more, in fact, than watch films or pornography online. Most new games today contain some kind of virtual economy: that is, a set of processes for the production, allocation, and consumption of artificially scarce virtual goods. Often the virtual economy is very simple; sometimes, as in the massively multiplayer online game EVE Online, it starts to approach the scale and complexity of a small national economy. Just like national economies, virtual economies incentivise certain behaviours and discourage others; they ask people to make choices between mutually exclusive options; they ask people to coordinate. They can also propagate value systems setting out what modes of participation are considered valuable. These virtual economies are now built into many of the most popular areas of the Internet, including social media sites and knowledge commons, with their systems of artificially scarce likes, stars, votes, and badges. Understanding these economies is therefore crucial to anyone who is interested in the social dynamics and power relations of digital media today. But a question I am asked a lot is: what can ‘real’ economies and the economists who run them learn from these virtual economies? We might start by imagining how a textbook economist would approach the economy of an online game. In EVE Online, hundreds of thousands of players trade minerals, spaceship components and other virtual commodities on a number of regional marketplaces. These marketplaces are very sophisticated, resembling real commodity spot markets. Our economist would doubtless point out several ways their efficiency could be radically improved. For example, EVE players can only see prices quoted in their current region, likely missing a better deal available elsewhere. 
(In physical commodity markets, prices are instantly broadcast worldwide: you wouldn’t pay more for gold in Tokyo than you would in New…

What’s new about companies and academic researchers doing this kind of research to manipulate people’s behaviour?

Reports about the Facebook study ‘Experimental evidence of massive-scale emotional contagion through social networks’ have resulted in something of a media storm. Yet it can be predicted that ultimately this debate will result in the question: so what’s new about companies and academic researchers doing this kind of research to manipulate people’s behaviour? Isn’t that what a lot of advertising and marketing research does already: changing people’s minds about things? And don’t researchers sometimes deceive subjects in experiments about their behaviour? What’s new? This way of thinking about the study has a serious defect, because there are three issues raised by this research: The first is the legality of the study, which, as the authors correctly point out, is covered by the informed consent that Facebook users give when they sign up to the service. Laws or regulation may be required here to prevent this kind of manipulation, but may also be difficult, since it will be hard to draw a line between this experiment and other forms of manipulating people’s responses to media. However, Facebook may not want to lose users, for whom this way of manipulating them via their service may ‘cause anxiety’ (as the first author of the study, Adam Kramer, acknowledged in a blog post response to the outcry). In short, it may be bad for business, and hence Facebook may abandon this kind of research (but we’ll come back to this later). But this (companies using techniques that users don’t like, so they are forced to change course) is not new. The second issue is academic research ethics. This study was carried out by two academic researchers (the other two authors of the study). In retrospect, it is hard to see how this study would have received approval from an institutional review board (IRB), the boards through which academic institutions check the ethics of studies. Perhaps stricter guidelines are needed here since a) big data research is becoming much more prominent…

Without detailed information about small areas we can’t identify where would benefit most from policy intervention to encourage Internet use and improve access.

Britain has one of the largest Internet economies in the industrial world. The Internet contributes an estimated 8.3% to Britain’s GDP (Dean et al. 2012), and strongly supports domestic job and income growth by enabling access to new customers, markets and ideas. People benefit from better communications, and businesses are more likely to locate in areas with good digital access, thereby boosting local economies (Malecki & Moriset 2008). While the Internet brings clear benefits, there is also a marked inequality in its uptake and use (the so-called ‘digital divide’). We already know from the Oxford Internet Surveys (OxIS) that Internet use in Britain is strongly stratified by age, by income and by education; and yet we know almost nothing about local patterns of Internet use across the country. A problem with national sample surveys (the usual source of data about Internet use and non-use) is that the sample sizes become too small to allow accurate generalisation to smaller, sub-national areas. No one knows, for example, the proportion of Internet users in Glasgow, because national surveys simply won’t have enough respondents to make reliable city-level estimates. We know that Internet use is not evenly distributed at the regional level; Ofcom reports on broadband speeds and penetration at the county level (Ofcom 2011), and we know that London and the southeast are the most wired parts of the country (Dean et al. 2012). But given the importance of the Internet, the lack of knowledge about local patterns of access and use in Britain is surprising. This is a problem because without detailed information about small areas we can’t identify where would benefit most from policy intervention to encourage Internet use and improve access. We have begun to address this lack of information by combining two important but separate datasets (the 2011 national census and the 2013 OxIS surveys) using the technique of small area estimation. 
By definition, census data are available for very small…