The sum of (some) human knowledge: Wikipedia and representation in the Arab World

Arabic is one of the least represented major world languages on Wikipedia: few languages have more speakers and fewer articles than Arabic. Image of the Umayyad Mosque (Damascus) by Travel Aficionado

Wikipedia currently contains over 9 million articles in 272 languages, far surpassing any other publicly available information repository. Because it is the first point of contact for most general topics (and therefore an effective site for framing any subsequent representations), it is an important platform from which we can learn whether the Internet facilitates increased open participation across cultures, or instead reinforces existing global hierarchies and power dynamics. Because the underlying political, geographic and social structures of Wikipedia are hidden from users, and because there have been no large-scale studies of the geography of these structures and their relationship to online participation, entire groups of people (and regions) may be marginalized without their knowledge.

This process is important to understand for the simple reason that Wikipedia content has begun to form a central part of services offered elsewhere on the Internet. When you look for information about a place on Facebook, the description of that place (including its geographic coordinates) comes from Wikipedia. If you want to “check in” to a museum in Doha to show your friends you were there, the place you check in to was created with Wikipedia data. When you Google “House of Saud” you are presented not only with a list of links (with Wikipedia at the top) but also with a special ‘card’ summarising the House. This data comes from Wikipedia. When you look for people or places, Google now has these terms inside its ‘knowledge graph’, a network of related concepts with data coming directly from Wikipedia. Similarly, on Google Maps, Wikipedia descriptions for landmarks are presented as part of the default information.

Ironically, Wikipedia editorship is in slow and steady decline, even as its content and readership increase year on year. Since 2007, when significant administrative powers were devolved to volunteers, Wikipedia has not been able to retain newcomers effectively, something many at the Wikimedia Foundation have noted as a concern. Some think Wikipedia might be levelling off because there is only so much to write about. This could not be further from the truth: there are still substantial gaps in geographic content in English and overwhelming gaps in other languages. Wikipedia often brands itself as aspiring to contain “the sum of human knowledge”, but behind this mantra lie policy pitfalls, tedious editor debates and delicate sourcing issues that hamper greater representation of the Middle East and North Africa (MENA). Of course, these challenges form part of Wikipedia’s continuing evolution as the de facto source for online reference information, but they also (disturbingly) act to entrench particular ways of “knowing”, and of validating what is known.

There are over 260,000 articles in Arabic, receiving 240,000 views per hour. Relative to its number of speakers, this makes Arabic one of the least represented major world languages on Wikipedia: few languages have more speakers and fewer articles than Arabic. This relative lack of MENA voice and representation means that the tone and content of this globally useful resource are, in many cases, being determined by outsiders who may misunderstand the significance of local events, sites of interest and historical figures. In a region that has seen substantial social conflict and political upheaval, greater participation from local actors would help to ensure balance in content about contentious issues. Unfortunately, most research on MENA’s Internet presence has so far drawn on anecdotal evidence, and no comprehensive studies currently exist.

In this project we wanted to understand where place-based content comes from, to explain reasons for the relative lack of Wikipedia articles in Arabic and about the MENA region, and to understand which parts of the region are particularly underrepresented. We also wanted to understand the relationship between Wikipedia’s administrative structure and the treatment of new editors. In particular, we wanted to know whether editors from the MENA region have less of a voice than their counterparts from elsewhere, and whether the content they create is considered more or less legitimate, as measured by the number of reverts (i.e., the overriding of their work by other editors).

Our practical objectives involved consolidating a network of Middle Eastern Wikipedians through a number of workshops focusing on how to create more equitable and representative content, with the ultimate goal of making Wikipedia a more generative and productive site for reference information about the region. Capacity building among key Wikipedians can create greater understanding of barriers to participation and representation, and can offset much of the (often considerable) emotional labour required to sustain activity on the site in the face of intense arguments and ideological biases. Potential systematic structures of exclusion that could act as barriers to participation include competitive practices such as content deletion, indifference to content produced by MENA authors, and marginalization through bullying and dismissal.

However, a distinct lack of sources (owing both to a lack of legitimacy for MENA journalism and to a paucity of open-access government documents) is also inhibiting further growth of content about the region. When inclusion of a topic is contested by editors, it is typically because there is not enough external source material about it to establish “notability”. As Ford (2011) has discussed, notability is often culturally mediated. For example, a couple of years ago a story in Al Jazeera would not have been considered sufficient to establish notability. This has changed dramatically since the network’s central role in reporting on the Arab Spring.

Unfortunately, notability can create a feedback loop. If an area of the world is underreported, there are no sources. If there are no sources, then journalists do not always have enough information to report about that part of the world. ‘Correct’ sourcing trumps personal experience on Wikipedia: even if an author is from a place and is watching a building being destroyed, their Wikipedia edit will not be accepted by the community unless the event is discussed in another ‘official’ medium. Often the edit will be branded with a ‘citation needed’ tag, removed, or disputed on the talk page. Particularly aggressive editors and administrators will nominate the page for ‘speedy deletion’ (i.e., deletion without discussion), a practice that makes it difficult for an author to respond.

Why does any of this matter in practical terms? For the simple reason that biases, absences and contestations on Wikipedia spill over into numerous other domains that are in regular and everyday use (Graham and Zook, 2013). If a place is not on Wikipedia, this might have a chilling effect on business and stifle journalism; if a place is represented poorly on Wikipedia, this can lead to misunderstandings about the place. Wikipedia is not a legislative body. However, in the court of public opinion, Wikipedia represents one of the world’s strongest forces, as it quietly inserts itself into representations of place worldwide (Graham et al. 2013; Graham 2013).

Wikipedia is not merely a site of reference information, but is rapidly becoming the de facto site for representing the world to itself. We need to understand more about that representation.

Further Reading

Allagui, I., Graham, M., and Hogan, B. 2014. Wikipédia arabe et la construction collective du savoir. In Wikipédia, objet scientifique non identifié. eds. Barbe, L., and Merzeau, L. Paris: Presses Universitaires de Paris Ouest (in press).

Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers (forthcoming).

Graham, M. 2012. Die Welt in der Wikipedia als Politik der Exklusion: Palimpseste des Ortes und selektive Darstellung. In Wikipedia. eds. S. Lampe, and P. Bäumer. Bonn: Bundeszentrale für politische Bildung/bpb.

Graham, M. 2011. Wiki Space: Palimpsests and the Politics of Exclusion. In Critical Point of View: A Wikipedia Reader. Eds. Lovink, G. and Tkacz, N. Amsterdam: Institute of Network Cultures, 269-282.

Ford, H. 2011. The Missing Wikipedians. In Critical Point of View: A Wikipedia Reader. Eds. Lovink, G. and Tkacz, N. Amsterdam: Institute of Network Cultures. ISBN: 978-90-78146-13-1.

Graham, M., M. Zook, and A. Boulton. 2013. Augmented Reality in the Urban Environment: contested content and the duplicity of code. Transactions of the Institute of British Geographers 38(3), 464-479.

Graham, M. and M. Zook. 2013. Augmented Realities and Uneven Geographies: Exploring the Geo-linguistic Contours of the Web. Environment and Planning A 45(1), 77-99.

Graham, M. 2013. The Virtual Dimension. In Global City Challenges: debating a concept, improving the practice. eds. M. Acuto and W. Steele. London: Palgrave. 117-139.

Mark Graham is a Senior Research Fellow at the OII. His research focuses on Internet and information geographies, and the overlaps between ICTs and economic development.

Mapping the uneven geographies of information worldwide

Map of Flickr activity worldwide
Images are an important form of knowledge that allow us to develop understandings about our world; the global geographic distribution of geotagged images on Flickr reveals the density of visual representations and locally depicted knowledge of all places on our planet. Map by M. Graham, M. Stephens, S. Hale.

Information is the raw material for much of the work that goes on in the contemporary global economy, and visibility and voice in this information ecosystem are a prerequisite for influence and control. As Hand and Sandywell (2002: 199) have argued, “digitalised knowledge and its electronic media are indeed synonymous with power.” As such, it is important to understand who produces and reproduces information, who has access to it, and who and where are represented by it.

Traditionally, information and knowledge about the world have been geographically constrained. The transmission of information required either the movement of people or the availability of some other medium of communication. Moreover, until the late 20th century, almost all media of information – books, newspapers, academic journals, patents and the like – were characterised by huge geographic inequalities. The global north produced, consumed and controlled much of the world’s codified knowledge, while the global south was largely left out.

Today, the movement of information is, in theory, rarely constrained by distance. Very few parts of the world remain disconnected from the grid, and over 2 billion people are now online (most of them in the Global South). Unsurprisingly, many believe we now have the potential to access what Wikipedia’s founder Jimmy Wales refers to as “the sum of all human knowledge”. Theoretically, parts of the world that have been left out of flows and representations of knowledge can be quite literally put back on the map.

However, “potential” has too often been confused with actual practice, and stark digital divisions of labour are still evident in all open platforms that rely on user-generated content. Google Maps’ databases contain more indexed user-generated content about the Tokyo metropolitan region than about the entire continent of Africa. On Wikipedia, there is more written about Germany than about South America and Africa combined. These are massive inequalities that cannot be explained by uneven Internet penetration alone. A range of other physical, social, political and economic barriers are reinforcing this digital divide, amplifying the informational power of the already powerful and visible.

That’s not to say that the Internet doesn’t have important implications for the developing world. People use it not just to connect with friends and family, but to learn, share information, trade, and represent their communities. However, it’s important to be aware of the Internet’s highly uneven geographies of information. These inequalities matter to the global south, because connectivity – despite being a clear prerequisite for access to most 21st-century platforms of knowledge sharing – by no means guarantees knowledge production and digital participation.

How do we move towards encouraging participation from (and about) parts of the world that are currently left out of virtual representations? The first step is to allow people to see what is, and isn’t, represented, which is what we are planning to do with this project. After that, there’s also a clear need for plans like Kenya’s strategy to boost local digital content, or Wikimedia’s Arabic Catalyst project, which aims to encourage the creation of content in Arabic and provide information about the Middle East.

It remains to be seen how effective such strategies will be in changing the highly uneven digital division of labour. As we rely increasingly on user-generated platforms, there is a real possibility that we will see the widening of divides between “digital cores” and “peripheries”. It’s therefore crucial to keep asking where visibility, voice and power reside in our increasingly networked world.


Graham, M. and M. Zook. 2013. Augmented Realities and Uneven Geographies: Exploring the Geo-linguistic Contours of the Web. Environment and Planning A 45(1) 77-99.

Graham, M. 2013. The Virtual Dimension. In Global City Challenges: debating a concept, improving the practice. eds. M. Acuto and W. Steele. London: Palgrave.

Graham, M., M. Zook, and A. Boulton. 2012. Augmented Reality in the Urban Environment: contested content and the duplicity of code. Transactions of the Institute of British Geographers. DOI: 10.1111/j.1475-5661.2012.00539.x

Graham, M. 2013. The Knowledge Based Economy and Digital Divisions of Labour. In Companion to Development Studies, 3rd edition, eds V. Desai, and R. Potter. Hodder.

Hand, M. and B. Sandywell. 2002. E-topia as Cosmopolis or Citadel: On the Democratizing and De-democratizing Logics of the Internet, or, Toward a Critique of the New Technological Fetishism. Theory, Culture & Society.

Mark Graham’s research focuses on Internet and information geographies, and the overlaps between ICTs and economic development. His work on the geographies of the Internet examines how people and places are ever more defined by, and made visible through, not only their traditional physical locations and properties, but also their virtual attributes and digital shadows.