Wikipedia

We might expect bot interactions to be relatively predictable and uneventful.

Wikipedia uses editing bots to clean articles: but what happens when their interactions go bad? Image of "Nomade", a sculpture in downtown Des Moines by Jason Mrachina (Flickr CC BY-NC-ND 2.0).

Recent years have seen a huge increase in the number of bots online, including search engine Web crawlers, online customer service chat bots, social media spambots, and content-editing bots in online collaborative communities like Wikipedia. (Bots are important contributors to Wikipedia, completing about 15% of all Wikipedia edits in 2014 overall, and more than 50% in certain language editions.) While the online world has turned into an ecosystem of bots (by which we mean computer scripts that automatically handle repetitive and mundane tasks), our knowledge of how these automated agents interact with each other is rather poor. But because bots are automata without capacity for emotions, meaning-making, creativity, or sociality, we might expect their interactions to be relatively predictable and uneventful. In their PLOS ONE article “Even good bots fight: The case of Wikipedia”, Milena Tsvetkova, Ruth García-Gavilanes, Luciano Floridi, and Taha Yasseri analyse the interactions between bots that edit articles on Wikipedia. They track the extent to which bots undid each other’s edits over the period 2001–2010, model how pairs of bots interact over time, and identify different types of interaction outcomes. Although Wikipedia bots are intended to support the encyclopaedia (identifying and undoing vandalism, enforcing bans, checking spelling, creating inter-language links, importing content automatically, mining data, identifying copyright violations, greeting newcomers, etc.), the authors find they often undid each other’s edits, with these sterile “fights” sometimes continuing for years. They suggest that even relatively “dumb” bots may give rise to complex interactions, carrying important implications for Artificial Intelligence research. Understanding these bot-bot interactions will be crucial for managing social media, providing adequate cyber-security, and designing autonomous vehicles (that don’t crash).
We caught up with Taha Yasseri and Luciano Floridi to discuss the implications of the findings:
Ed.: Is there any particular difference between the way individual bots interact (and maybe get bogged down in conflict), and lines of vast and complex code interacting badly, or having unforeseen results (e.g. flash-crashes in automated trading):…

Data show that the relative change in page views to the general Wikipedia page on the election can offer an estimate of the relative change in election turnout.

2016 presidential candidate Donald Trump in a residential backyard near Jordan Creek Parkway and Cody Drive in West Des Moines, Iowa, with lights and security cameras. Image by Tony Webster (Flickr).

As digital technologies become increasingly integrated into the fabric of social life, their ability to generate large amounts of information about the opinions and activities of the population increases. The opportunities in this area are enormous: predictions based on socially generated data are much cheaper than conventional opinion polling, offer the potential to avoid classic biases inherent in asking people to report their opinions and behaviour, and can deliver results much more quickly and be updated more rapidly. In their article published in EPJ Data Science, Taha Yasseri and Jonathan Bright develop a theoretically informed prediction of election results from socially generated data combined with an understanding of the social processes through which the data are generated. They can thereby explore the predictive power of socially generated data while enhancing theory about the relationship between socially generated data and real world outcomes. Their particular focus is on the readership statistics of politically relevant Wikipedia articles (such as those of individual political parties) in the time period just before an election. By applying these methods to a variety of different European countries in the context of the 2009 and 2014 European Parliament elections, they first show that the relative change in the number of page views to the general Wikipedia page on the election can offer a reasonable estimate of the relative change in election turnout at the country level. This supports the idea that increases in online information seeking at election time are driven by voters who are considering voting. Second, they show that a theoretically informed model based on previous national results, Wikipedia page views, news media mentions, and basic information about the political party in question can offer a good prediction of the overall vote share of the party in question.
Third, they present a model for predicting change in vote share (i.e., voters swinging towards and away from a party), showing that Wikipedia page-view data provide an important increase…
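The first finding above compares two relative changes: page views to the election article and turnout, each measured between two consecutive elections. As a rough illustration only, that quantity could be computed as below. The function name and all of the numbers are made up for this sketch and are not taken from the article.

```python
def relative_change(previous: float, current: float) -> float:
    """Return the change from `previous` to `current` as a fraction of `previous`."""
    if previous == 0:
        raise ValueError("relative change is undefined when the previous value is zero")
    return (current - previous) / previous


# Hypothetical figures for one country (NOT real data).
page_views_2009, page_views_2014 = 40_000, 50_000
turnout_2009, turnout_2014 = 40.0, 43.0  # per cent of eligible voters

views_change = relative_change(page_views_2009, page_views_2014)
turnout_change = relative_change(turnout_2009, turnout_2014)

print(f"page views: {views_change:+.1%}, turnout: {turnout_change:+.1%}")
```

With one such pair of values per country, the two series can then be compared across countries to see how closely they track each other.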

The geography of knowledge has always been uneven. Some people and places have always been more visible and had more voices than others.

Reposted from The Conversation.

The geography of knowledge has always been uneven. Some people and places have always been more visible and had more voices than others. But the internet seemed to promise something different: a greater diversity of voices, opinions and narratives from more places. Unfortunately, this has not come to pass in quite the manner some expected it to. Many parts of the world remain invisible or under-represented on important websites and services. All of this matters because as geographic information becomes increasingly integral to our lives, places that are not represented on platforms like Wikipedia will be absent from many of our understandings of, and interactions with, the world.

Mapping the differences

Until now, there has been no large-scale analysis of the factors that explain the wide geographical spread of online information. This is something we have aimed to address in our research project on the geography of Wikipedia. Our focus areas were the Middle East and North Africa. Using statistical models of geotagged Wikipedia data, we identified the necessary conditions to make countries “visible”. This allowed us to map the countries that fare considerably better or worse than expected. We found that a large part of the variation between countries could be explained by just three factors: population, availability of broadband internet, and the number of edits originating in that country.

Areas of Wikipedia hegemony and uneven geographic coverage. Oxford Internet Institute

While these three variables help to explain the sparse amount of content written about much of sub-Saharan Africa, most of the Middle East and North Africa have much less geographic information than might be expected. For example, despite high levels of wealth and connectivity, Qatar and the United Arab Emirates have far fewer articles than we might expect.

Constraints to creating content

These three factors matter independently, but they will also be subject to other constraints. A country’s population will probably affect the number of activities, places, and practices…

As geographic content and geospatial information become increasingly integral to our everyday lives, places that are left off the ‘map of knowledge’ will be absent from our understanding of the world.

The geographies of codified knowledge have always been uneven, affording some people and places greater voice and visibility than others. While the rise of the geosocial Web seemed to promise a greater diversity of voices, opinions, and narratives about places, many regions remain largely absent from the websites and services that represent them to the rest of the world. These highly uneven geographies of codified information matter because they shape what is known and what can be known. As geographic content and geospatial information become increasingly integral to our everyday lives, places that are left off the ‘map of knowledge’ will be absent from our understanding of, and interaction with, the world. We know that Wikipedia is important to the construction of geographical imaginations of place, and that it has immense power to augment our spatial understandings and interactions (Graham et al. 2013). In other words, the presences and absences in Wikipedia matter. If a person’s primary free source of information about the world is the Persian or Arabic or Hebrew Wikipedia, then the world will look fundamentally different from the world presented through the lens of the English Wikipedia. The capacity to represent oneself to outsiders is especially important in those parts of the world that are characterised by highly uneven power relationships: Brunn and Wilson (2013) and Graham and Zook (2013) have already demonstrated the power of geospatial content to reinforce power in a South African township and Jerusalem, respectively. Until now, there has been no large-scale empirical analysis of the factors that explain information geographies at the global scale; this is something we have aimed to address in this research project on Mapping and measuring local knowledge production and representation in the Middle East and North Africa.
Using regression models of geolocated Wikipedia data, we have identified what are likely to be the necessary conditions for representation at the country level, and have also identified the outliers,…

Wikipedia is often seen as a great equaliser. But it’s starting to look like global coverage on Wikipedia is far from equal.

Reposted from The Conversation.

Wikipedia is often seen as a great equaliser. Every day, hundreds of thousands of people collaborate on a seemingly endless range of topics by writing, editing and discussing articles, and uploading images and video content. But it’s starting to look like global coverage on Wikipedia is far from equal. This now ubiquitous source of information offers everything you could want to know about the US and Europe but far less about any other parts of the world. This structural openness of Wikipedia is one of its biggest strengths. Academic and activist Lawrence Lessig even describes the online encyclopedia as “a technology to equalise the opportunity that people have to access and participate in the construction of knowledge and culture, regardless of their geographic placing”. But despite Wikipedia’s openness, there are fears that the platform is simply reproducing the most established worldviews. Knowledge created in the developed world appears to be growing at the expense of viewpoints coming from developing countries. Indeed, there are indications that global coverage in the encyclopedia is far from “equal”, with some parts of the world heavily represented on the platform, and others largely left out. For a start, if you look at articles published about specific places such as monuments, buildings, festivals, battlefields, countries, or mountains, the imbalance is striking. Europe and North America account for a staggering 84% of these “geotagged” articles. Almost all of Africa is poorly represented in the encyclopedia, too. In fact, there are more Wikipedia articles written about Antarctica (14,959) than about any country in Africa. And while there are just over 94,000 geotagged articles related to Japan, there are only 88,342 on the entire Middle East and North Africa region.

Total number of geotagged Wikipedia articles across 44 surveyed languages. Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers (forthcoming).

When…

Negotiating the wider politics of Wikipedia can be a daunting task, particularly when it comes to content about the MENA region.

Negotiating the wider politics of Wikipedia can be a daunting task, particularly when it comes to content about the MENA region. Image of the Dome of the Rock (Qubbat As-Sakhrah), Jerusalem, by 1yen

Wikipedia has famously been described as a project that “works great in practice and terrible in theory”. One of the ways in which it succeeds is through its extensive consensus-based governance structure. While this has led to spectacular success (over 4.5 million articles in the English Wikipedia alone), the governance structure is neither obvious nor immediately accessible, and can present a barrier for those seeking entry. Editing Wikipedia can be a tough challenge: an often draining and frustrating task, involving heated disputes and arguments where it is often the most tenacious, belligerent, or connected editor who wins out in the end. Broadband access and literacy are not the only pre-conditions for editing Wikipedia; ‘digital literacy’ is also crucial. This includes the ability to obtain and critically evaluate online sources, locate Wikipedia’s editorial and governance policies, master Wiki syntax, and confidently articulate and assert one’s views about an article or topic. Experienced editors know how to negotiate the rules, build a consensus with some editors to block others, and influence administrators during dispute resolution. This strict adherence to the word (if not the spirit) of Wikipedia’s ‘law’ can lead to marginalization or exclusion of particular content, particularly when editors are scared off by unruly mobs who ‘weaponise’ policies to fit a specific agenda. Governing such a vast collaborative platform as Wikipedia obviously presents a difficult balancing act between being open enough to attract a volume of contributions, and moderated enough to ensure their quality. Many editors consider Wikipedia’s governance structure (which varies significantly between the different language versions) essential to ensuring the quality of its content, even if it means that certain editors can (for example) arbitrarily ban other users, lock down certain articles, and exclude moderate points of view.
One of the editors we spoke to noted that: “A number of articles I have edited with quality sources, have been subjected to editors cutting information that doesn’t fit their ideas […]…

There are more Wikipedia articles in English than Arabic about almost every Arabic speaking country in the Middle East.

Image of rock paintings in the Tadrart Acacus region of Libya by Luca Galuzzi.

Wikipedia is often seen to be both an enabler and an equaliser. Every day hundreds of thousands of people collaborate on an (encyclopaedic) range of topics: writing, editing and discussing articles, and uploading images and video content. This structural openness combined with Wikipedia’s tremendous visibility has led some commentators to highlight it as “a technology to equalise the opportunity that people have to access and participate in the construction of knowledge and culture, regardless of their geographic placing” (Lessig 2003). However, despite Wikipedia’s openness, there are also fears that the platform is simply reproducing worldviews and knowledge created in the Global North at the expense of Southern viewpoints (Graham 2011; Ford 2011). Indeed, there are indications that global coverage in the encyclopaedia is far from ‘equal’, with some parts of the world heavily represented on the platform, and others largely left out (Hecht and Gergle 2009; Graham 2011, 2013, 2014). These second-generation digital divides are not merely divides of Internet access (much discussed in the late 1990s), but gaps in representation and participation (Hargittai and Walejko 2008). Whereas most Wikipedia articles written about most European and East Asian countries are written in their dominant languages, for much of the Global South we see a dominance of articles written in English. These geographic differences in the coverage of different language versions of Wikipedia matter, because fundamentally different narratives can be (and are) created about places and topics in different languages (Graham and Zook 2013; Graham 2014). If we undertake a ‘global analysis’ of this pattern by examining the number of geocoded articles (i.e. about a specific place) across Wikipedia’s main language versions (Figure 1), the first thing we can observe is the incredible human effort that has gone into describing ‘place’ in Wikipedia.
The second is the clear and highly uneven geography of information, with Europe and North America home to 84% of all geolocated articles. Almost all of Africa is…

Arabic is one of the least represented major world languages on Wikipedia: few languages have more speakers and fewer articles than Arabic.

Image of the Umayyad Mosque (Damascus) by Travel Aficionado

Wikipedia currently contains over 9 million articles in 272 languages, far surpassing any other publicly available information repository. Being the first point of contact for most general topics (and therefore an effective site for framing any subsequent representations), it is an important platform from which we can learn whether the Internet facilitates increased open participation across cultures, or reinforces existing global hierarchies and power dynamics. Because the underlying political, geographic and social structures of Wikipedia are hidden from users, and because there have not been any large scale studies of the geography of these structures and their relationship to online participation, entire groups of people (and regions) may be marginalised without their knowledge. This process is important to understand, for the simple reason that Wikipedia content has begun to form a central part of services offered elsewhere on the Internet. When you look for information about a place on Facebook, the description of that place (including its geographic coordinates) comes from Wikipedia. If you want to “check in” to a museum in Doha to signify to your friends that you were there, the place you check in to was created with Wikipedia data. When you Google “House of Saud” you are presented not only with a list of links (with Wikipedia at the top) but also with a special ‘card’ summarising the House. This data comes from Wikipedia. When you look for people or places, Google now has these terms inside its ‘knowledge graph’, a network of related concepts with data coming directly from Wikipedia. Similarly, on Google Maps, Wikipedia descriptions for landmarks are presented as part of the default information. Ironically, Wikipedia editorship is actually on a slow and steady decline, even as its content and readership increase year on year.
Since 2007 and the introduction of significant devolution of administrative powers to volunteers, Wikipedia has not been able to effectively retain newcomers, something which has been noted as a concern by…

Although some topics are globally debated, like religion and politics, there are many topics which are controversial only in a single language edition. This reflects the local preferences and importance assigned to topics by different editorial communities.

Ed: How did you construct your quantitative measure of ‘conflict’? Did you go beyond just looking at content flagged by editors as controversial?
Taha: Yes we did. Actually, we have shown that controversy measures based on “controversial” flags are not inclusive at all, and although they might have high precision, they have very low recall. Instead, we constructed an automated algorithm to locate and quantify the editorial wars taking place on the Wikipedia platform. Our algorithm is based on reversions, i.e. when editors undo each other’s contributions. We focused specifically on mutual reverts between pairs of editors and we assigned a maturity score to each editor, based on the total volume of their previous contributions. While counting the mutual reverts, we gave more weight to those committed by or against editors with higher maturity scores, as a revert between two experienced editors indicates a more serious problem. We always validated our method and compared it with other methods, using human judgement on a random selection of articles.
Ed: Was there any discrepancy between the content deemed controversial by your own quantitative measure, and what the editors themselves had flagged?
Taha: We were able to capture all the flagged content, but not all the articles found to be controversial by our method are flagged. And when you check the editorial history of those articles, you soon realise that they are indeed controversial but for some reason have not been flagged. It’s worth mentioning that the flagging process is not very well implemented in smaller language editions of Wikipedia. Even if the controversy is detected and flagged in English Wikipedia, it might not be in the smaller language editions. Our model is of course independent of the size and editorial conventions of different language editions.
Ed: Were there any differences in the way conflicts arose/were resolved in the different language versions?
Taha: We found the main differences to be the topics of controversial…
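The revert-based measure described earlier in the interview (mutual reverts between pairs of editors, weighted by editor maturity) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the interview does not give the exact formula, so here maturity is assumed to be an editor's prior edit count and each mutually reverting pair is assumed to contribute the smaller of the two maturities. The function name and sample data are hypothetical.

```python
from collections import Counter


def controversy_score(reverts, edit_counts):
    """Sum a weight over every pair of editors who have reverted each other.

    reverts:     iterable of (reverter, reverted) editor-name pairs for one article
    edit_counts: dict mapping editor name -> number of prior edits
                 (ASSUMPTION: used here as the "maturity score")
    """
    directed = Counter(reverts)  # how often each editor reverted each other editor
    seen_pairs = set()
    score = 0
    for reverter, reverted in directed:
        pair = frozenset((reverter, reverted))
        if reverter == reverted or pair in seen_pairs:
            continue  # ignore self-reverts and pairs already handled
        seen_pairs.add(pair)
        if (reverted, reverter) in directed:  # the revert is mutual
            # ASSUMPTION: weight the pair by the less experienced editor, so a
            # clash between two experienced editors counts for the most.
            score += min(edit_counts.get(reverter, 0), edit_counts.get(reverted, 0))
    return score


# Hypothetical revert log and edit counts (NOT real data).
reverts = [
    ("alice", "bob"), ("bob", "alice"),   # mutual, both experienced
    ("carol", "alice"),                   # one-way only, so ignored
    ("dave", "erin"), ("erin", "dave"),   # mutual, both newcomers
]
edit_counts = {"alice": 500, "bob": 300, "carol": 10, "dave": 5, "erin": 2}

print(controversy_score(reverts, edit_counts))
```

Because the score relies only on the revert history, a sketch like this does not depend on whether a given language edition flags articles as controversial.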