What explains the worldwide patterns in user-generated geographical content?

The geographies of codified knowledge have always been uneven, affording some people and places greater voice and visibility than others. While the rise of the geosocial Web seemed to promise a greater diversity of voices, opinions, and narratives about places, many regions remain largely absent from the websites and services that represent them to the rest of the world. These highly uneven geographies of codified information matter because they shape what is known and what can be known. As geographic content and geospatial information becomes increasingly integral to our everyday lives, places that are left off the ‘map of knowledge’ will be absent from our understanding of, and interaction with, the world.

We know that Wikipedia is important to the construction of geographical imaginations of place, and that it has immense power to augment our spatial understandings and interactions (Graham et al. 2013). In other words, the presences and absences in Wikipedia matter. If a person’s primary free source of information about the world is the Persian or Arabic or Hebrew Wikipedia, then the world will look fundamentally different from the world presented through the lens of the English Wikipedia. The capacity to represent oneself to outsiders is especially important in those parts of the world that are characterized by highly uneven power relationships: Brunn and Wilson (2013) and Graham and Zook (2013) have already demonstrated the power of geospatial content to reinforce power in a South African township and Jerusalem, respectively.

Until now, there has been no large-scale empirical analysis of the factors that explain information geographies at the global scale; this is something we have aimed to address in this research project on Mapping and measuring local knowledge production and representation in the Middle East and North Africa. Using regression models of geolocated Wikipedia data we have identified what are likely to be the necessary conditions for representation at the country level, and have also identified the outliers, i.e. those countries that fare considerably better or worse than expected. We found that a large part of the variation could be explained by just three factors: namely, (1) country population, (2) availability of broadband Internet, and (3) the number of edits originating in that country. [See the full paper for an explanation of the data and the regression models.]

But how do we explain the significant inequalities in the geography of user-generated information that remain after adjusting for differing conditions using our regression model? While these three variables help to explain the sparse amount of content written about much of Sub-Saharan Africa, most of the Middle East and North Africa have quantities of geographic information below their expected values. For example, despite high levels of wealth and connectivity, Qatar and the United Arab Emirates have far fewer articles than we might expect from the model.

These three factors independently matter, but they will also be subject to a number of constraints. A country’s population will probably affect the number of human sites, activities, and practices of interest; ie the number of things one might want to write about. The size of the potential audience might also be influential, encouraging editors in denser-populated regions and those writing in major languages. However, societal attitudes towards learning and information sharing will probably also affect the propensity of people in some places to contribute content. Factors discouraging the number of edits to local content might include a lack of local Wikimedia chapters, the attractiveness of writing content about other (better-represented) places, or contentious disputes in local editing communities that divert time into edit wars and away from content generation.

We might also be seeing a principle of increasing informational poverty. Not only is a broader base of traditional source material (such as books, maps, and images) needed for the generation of any Wikipedia article, but it is likely that the very presence of content itself is a generative factor behind the production of further content. This makes information produced about information-sparse regions most useful for people in informational cores — who are used to integrating digital information into their everyday practices — rather than those in informational peripheries.

Various practices and procedures of Wikipedia editing likely amplify this effect. There are strict guidelines on how knowledge can be created and represented in Wikipedia, including a ban on original research, and the need to source key assertions. Editing incentives and constraints probably also encourage work around existing content (which is relatively straightforward to edit) rather than creation of entirely new material. In other words, the very policies and norms that govern the encyclopedia’s structure make it difficult to populate the white space with new geographic content. In addressing these patterns of increasing informational poverty, we need to recognize that no one of these three conditions can ever be sufficient for the generation of geographic knowledge. As well as highlighting the presences and absences in user-generated content, we also need to ask what factors encourage or limit production of that content.

In interpreting our model, we have come to a stark conclusion: increasing representation doesn’t occur in a linear fashion, but it accelerates in a virtuous cycle, benefitting those with strong editing cultures in local languages. For example, Britain, Sweden, Japan and Germany are extensively georeferenced on Wikipedia, whereas much of the MENA region has not kept pace, even accounting for their levels of connectivity, population, and editors. Thus, while some countries are experiencing the virtuous cycle of more edits and broadband begetting more georeferenced content, those on the periphery of these information geographies might fail to reach a critical mass of editors, or even dismiss Wikipedia as a legitimate site for user-generated geographic content: a problem that will need to be addressed if Wikipedia is indeed to be considered as the “sum of all human knowledge”.

Read the full paper: Graham, M., Hogan, B., Straumann, R.K., and Medhat, A. (2014) Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers.


Brunn S. D., and M. W. Wilson. 2013. Cape Town’s million plus black township of Khayelitsha: Terrae incognitae and the geographies and cartographies of silence, Habitat International. 39 284-294.

Graham M., and M. Zook. (2013) Augmented Realities and Uneven Geographies: Exploring the Geolinguistic Contours of the Web. Environment and Planning A 45(1): 77–99.

Graham M, M. Zook, and A. Boulton. 2013. Augmented Reality in the Urban Environment: Contested Content and the Duplicity of Code. Transactions of the Institute of British Geographers. 38(3) 464-479.

Mark Graham is a Senior Research Fellow at the OII. His research focuses on Internet and information geographies, and the overlaps between ICTs and economic development.

What is stopping greater representation of the MENA region?

Negotiating the wider politics of Wikipedia can be a daunting task, particularly when in it comes to content about the MENA region. Image of the Dome of the Rock (Qubbat As-Sakhrah), Jerusalem, by 1yen

Wikipedia has famously been described as a project that “ works great in practice and terrible in theory”. One of the ways in which it succeeds is through its extensive consensus-based governance structure. While this has led to spectacular success –over 4.5 million articles in the English Wikipedia alone — the governance structure is neither obvious nor immediately accessible, and can present a barrier for those seeking entry. Editing Wikipedia can be a tough challenge – an often draining and frustrating task, involving heated disputes and arguments where it is often the most tenacious, belligerent, or connected editor who wins out in the end.

Broadband access and literacy are not the only pre-conditions for editing Wikipedia; ‘digital literacy’ is also crucial. This includes the ability to obtain and critically evaluate online sources, locate Wikipedia’s editorial and governance policies, master Wiki syntax, and confidently articulate and assert one’s views about an article or topic. Experienced editors know how to negotiate the rules, build a consensus with some editors to block others, and how to influence administrators during dispute resolution. This strict adherence to the word (if not the spirit) of Wikipedia’s ‘law’ can lead to marginalization or exclusion of particular content, particularly when editors are scared off by unruly mobs who ‘weaponize’ policies to fit a specific agenda.

Governing such a vast collaborative platform as Wikipedia obviously presents a difficult balancing act between being open enough to attract volume of contributions, and moderated enough to ensure their quality. Many editors consider Wikipedia’s governance structure (which varies significantly between the different language versions) essential to ensuring the quality of its content, even if it means that certain editors can (for example) arbitrarily ban other users, lock down certain articles, and exclude moderate points of view. One of the editors we spoke to noted that: “A number of articles I have edited with quality sources, have been subjected to editors cutting information that doesn’t fit their ideas […] I spend a lot of time going back to reinstate information. Today’s examples are in the ‘Battle of Nablus (1918)’ and the ‘Third Transjordan attack’ articles. Bullying does occur from time to time […] Having tried the disputes process I wouldn’t recommend it.” Community building might help support MENA editors faced with discouragement or direct opposition as they try to build content about the region, but easily locatable translations of governance materials would also help. Few of the extensive Wikipedia policy discussions have been translated into Arabic, leading to replication of discussions or ambiguity surrounding correct dispute resolution.

Beyond arguments with fractious editors over minutiae (something that comes with the platform), negotiating the wider politics of Wikipedia can be a daunting task, particularly when in it comes to content about the MENA region. It would be an understatement to say that the Middle East is a politically sensitive region, with more than its fair share of apparently unresolvable disputes, competing ideologies (it’s the birthplace of three world religions…), repressive governments, and ongoing and bloody conflicts. Editors shared stories with us about meddling from state actors (eg Tunisia, Iran) and a lack of trust with a platform that is generally considered to be a foreign, and sometimes explicitly American, tool. Rumors abound that several states (eg Israel, Iran) have concerted efforts to work on Wikipedia content, creating a chilling effect for new editors who might feel that editing certain pages might prove dangerous, or simply frustrating or impossible. Some editors spoke of being asked by Syrian government officials for advice on how to remove critical content, or how to identify the editors responsible for putting it there. Again: the effect is chilling.

A lack of locally produced and edited content about the region clearly can’t be blamed entirely on ‘outsiders’. Many editors in the Arabic Wikipedia have felt snubbed by the creation of an explicitly “Egyptian Arabic” Wikipedia, which has not only forked the content and editorial effort, but also stymied any ‘pan-Arab’ identity on the platform. There is a culture of administrators deleting articles they do not think are locally appropriate; often relating to politically (or culturally) sensitive topics. Due to Arabic Wikipedia’s often vicious edit wars, it is heavily moderated (unlike for example the English version), and anonymous edits do not appear instantly.

Some editors at the workshops noted other systemic and cultural issues, for example complaining of an education system that encourages rote learning, reinforcing the notion that only experts should edit (or moderate) a topic, rather than amateurs with local familiarity. Editors also noted the notable gender disparities on the site; a longstanding issue for other Wikipedia versions as well. None of these discouragements are helped by what some editors noted as a larger ‘image problem’ with editing in the Arabic Wikipedia, given it would always be overshadowed by the dominant English Wikipedia, one editor commenting that: “the English Wikipedia is vastly larger than its Arabic counterpart, so it is not unthinkable that there is more content, even about Arab-world subjects, in English. From my (unscientific) observation, many times, content in Arabic about a place or a tribe is not very encyclopedic, but promotional, and lacks citations”. Translating articles into Arabic might be seen as menial and unrewarding work, when the exciting debates about an article are happening elsewhere.

When we consider the coming-together of all of these barriers, it might be surprising that Wikipedia is actually as large as it is. However, the editors we spoke with were generally optimistic about the site, considering it an important activity that serves the greater good. Wikipedia is without doubt one of the most significant cultural and political forces on the Internet. Wikipedians are remarkably generous with their time, and it’s their efforts that are helping to document, record, and represent much of the world – including places where documentation is scarce. Most of the editors at our workshop ultimately considered Wikipedia a path to a more just society; through not just consensus, voting, and an aspiration to record certain truths — seeing it not just as a site of conflict, but also a site of regional (and local) pride. When asked why he writes geographic content, one editor simply replied: “It’s my own town”.

Mark Graham is a Senior Research Fellow at the OII. His research focuses on Internet and information geographies, and the overlaps between ICTs and economic development.

How well represented is the MENA region in Wikipedia?

There are more Wikipedia articles in English than Arabic about almost every Arabic speaking country in the Middle East. Image of rock paintings in the Tadrart Acacus region of Libya by Luca Galuzzi.
There are more Wikipedia articles in English than Arabic about almost every Arabic speaking country in the Middle East. Image of rock paintings in the Tadrart Acacus region of Libya by Luca Galuzzi.
Wikipedia is often seen to be both an enabler and an equalizer. Every day hundreds of thousands of people collaborate on an (encyclopaedic) range of topics; writing, editing and discussing articles, and uploading images and video content. This structural openness combined with Wikipedia’s tremendous visibility has led some commentators to highlight it as “a technology to equalize the opportunity that people have to access and participate in the construction of knowledge and culture, regardless of their geographic placing” (Lessig 2003). However, despite Wikipedia’s openness, there are also fears that the platform is simply reproducing worldviews and knowledge created in the Global North at the expense of Southern viewpoints (Graham 2011; Ford 2011). Indeed, there are indications that global coverage in the encyclopaedia is far from ‘equal’, with some parts of the world heavily represented on the platform, and others largely left out (Hecht and Gergle 2009; Graham 2011, 2013, 2014).

These second-generation digital divides are not merely divides of Internet access (so discussed in the late 1990s), but gaps in representation and participation (Hargittai and Walejko 2008). Whereas most Wikipedia articles written about most European and East Asian countries are written in their dominant languages, for much of the Global South we see a dominance of articles written in English. These geographic differences in the coverage of different language versions of Wikipedia matter, because fundamentally different narratives can be (and are) created about places and topics in different languages (Graham and Zook 2013; Graham 2014).

If we undertake a ‘global analysis’ of this pattern by examining the number of geocoded articles (ie about a specific place) across Wikipedia’s main language versions (Figure 1), the first thing we can observe is the incredible human effort that has gone into describing ‘place’ in Wikipedia. The second is the clear and highly uneven geography of information, with Europe and North America home to 84% of all geolocated articles. Almost all of Africa is poorly represented in the encyclopaedia — remarkably, there are more Wikipedia articles written about Antarctica (14,959) than any country in Africa, and more geotagged articles relating to Japan (94,022) than the entire MENA region (88,342). In Figure 2 it is even more obvious that Europe and North America lead in terms of representation on Wikipedia.

Figure 1. Total number of geotagged Wikipedia articles across all 44 surveyed languages.
Figure 1. Total number of geotagged Wikipedia articles across all 44 surveyed languages.
Figure 2. Number of regional geotagged articles and population.
Figure 2. Number of regional geotagged articles and population.

Knowing how many articles describe a place only tells a part of the ‘representation story’. Figure 3 adds the linguistic element, showing the dominant language of Wikipedia articles per country. The broad pattern is that some countries largely define themselves in their own languages, and others appear to be largely defined from outside. For instance, almost all European countries have more articles about themselves in their dominant language; that is, most articles about the Czech Republic are written in Czech. Most articles about Germany are written in German (not English).

Figure 3. Language with the most geocoded articles by country (across 44 top languages on Wikipedia).
Figure 3. Language with the most geocoded articles by country (across 44 top languages on Wikipedia).

We do not see this pattern across much of the South, where English dominates across much of Africa, the Middle East, South and East Asia, and even parts of South and Central America. French dominates in five African countries, and German is dominant in one former German colony (Namibia) and a few other countries (e.g. Uruguay, Bolivia, East Timor).

The scale of these differences is striking. Not only are there more Wikipedia articles in English than Arabic about almost every Arabic speaking country in the Middle East, but there are more English articles about North Korea than there are Arabic articles about Saudi Arabia, Libya, and the UAE. Not only do we see most of the world’s content written about global cores, but it is largely dominated by a relatively few languages.

Figure 4 shows the total number of geotagged Wikipedia articles in English per country. The sheer density of this layer of information over some parts of the world is astounding (with 928,542 articles about places in English), nonetheless, in this layer of geotagged English content, only 3.23% of the articles are about Africa, and 1.67% are about the MENA region.

Figure 4. Number of geotagged articles in the English Wikipedia by country.
Figure 4. Number of geotagged articles in the English Wikipedia by country.

We see a somewhat different pattern when looking at the global geography of the 22,548 geotagged articles of the Arabic Wikipedia (Figure 5). Algeria and Syria are both defined by a relatively high number of articles in Arabic (as are the US, Italy, Spain, Russia and Greece). These information densities are substantially greater than what we see for many other MENA countries in which Arabic is an official language (such as Egypt, Morocco, and Saudi Arabia). This is even more surprising when we realise that the Italian and Spanish populations are smaller than the Egyptian, but there are nonetheless far more geotagged articles in Arabic about Italy (2,428) and Spain (1,988) than about Egypt (433).

Figure 5. Total number of geotagged articles in the Arabic Wikipedia by country.
Figure 5. Total number of geotagged articles in the Arabic Wikipedia by country.

By mapping the geography of Wikipedia articles in both global and regional languages, we can begin to examine the layers of representation that ‘augment’ the world we live in. We have seen that, notable exceptions aside (e.g. ‘Iran’ in Farsi and ‘Israel’ in Hebrew) the MENA region tends to be massively underrepresented — not just in major world languages, but also in its own: Arabic. Clearly, much is being left unsaid about that part of the world. Although we entered the project anticipating that the MENA region would be under-represented in English, we did not anticipate the degree to which it is under-represented in Arabic.


Ford, H. (2011) The Missing Wikipedians. In Critical Point of View: A Wikipedia Reader, ed. G. Lovink and N. Tkacz, 258-268. Amsterdam: Institute of Network Cultures.

Graham, M. (2014) The Knowledge Based Economy and Digital Divisions of Labour. In Companion to Development Studies, 3rd edition, eds v. Desai, and R. Potter. Hodder, pp. 189-195.

Graham, M. (2013) The Virtual Dimension. In Global City Challenges: Debating a Concept, Improving the Practice. Eds. Acuto, M. and Steele, W. London: Palgrave.

Graham, M. (2011) Wiki Space: Palimpsests and the Politics of Exclusion. In Critical Point of View: A Wikipedia Reader. Eds. Lovink, G. and Tkacz, N. Amsterdam: Institute of Network Cultures, pp. 269-282.

Graham M., and M. Zook (2013) Augmented Realities and Uneven Geographies: Exploring the Geolinguistic Contours of the Web. Environment and Planning A 45 (1) 77–99.

Hargittai, E. and G. Walejko (2008) The Participation Divide: Content Creation and Sharing in the Digital Age. Information, Communication and Society 11 (2) 239–256.

Hecht B., and D. Gergle (2009) Measuring self-focus bias in community-maintained knowledge repositories. In Proceedings of the 4th International Conference on Communities and Technologies, Penn State University, 2009, pp. 11–20. New York: ACM.

Lessig, L. (2003) An Information Society: Free or Feudal. Talk given at the World Summit on the Information Society, Geneva, 2003.

Mark Graham is a Senior Research Fellow at the OII. His research focuses on Internet and information geographies, and the overlaps between ICTs and economic development.

The sum of (some) human knowledge: Wikipedia and representation in the Arab World

Arabic is one of the least represented major world languages on Wikipedia: few languages have more speakers and fewer articles than Arabic. Image of the Umayyad Mosque (Damascus) by Travel Aficionado

Wikipedia currently contains over 9 million articles in 272 languages, far surpassing any other publicly available information repository. Being the first point of contact for most general topics (therefore an effective site for framing any subsequent representations) it is an important platform from which we can learn whether the Internet facilitates increased open participation across cultures — or reinforces existing global hierarchies and power dynamics. Because the underlying political, geographic and social structures of Wikipedia are hidden from users, and because there have not been any large scale studies of the geography of these structures and their relationship to online participation, entire groups of people (and regions) may be marginalized without their knowledge.

This process is important to understand, for the simple reason that Wikipedia content has begun to form a central part of services offered elsewhere on the Internet. When you look for information about a place on Facebook, the description of that place (including its geographic coordinates) comes from Wikipedia. If you want to “check in” to a museum in Doha to signify you were there to their friends, the place you check in to was created with Wikipedia data. When you Google “House of Saud” you are presented not only with a list of links (with Wikipedia at the top) but also with a special ‘card’ summarising the House. This data comes from Wikipedia. When you look for people or places, Google now has these terms inside its ‘knowledge graph’, a network of related concepts with data coming directly from Wikipedia. Similarly, on Google maps, Wikipedia descriptions for landmarks are presented as part of the default information.

Ironically, Wikipedia editorship is actually on a slow and steady decline, even as its content and readership increases year on year. Since 2007 and the introduction of significant devolution of administrative powers to volunteers, Wikipedia has not been able to effectively retain newcomers, something which has been noted as a concern by many at the Wikimedia Foundation. Some think Wikipedia might be levelling off because there’s only so much to write about. This is extremely far from the truth; there are still substantial gaps in geographic content in English and overwhelming gaps in other languages. Wikipedia often brands itself as aspiring to contain “the sum of human knowledge”, but behind this mantra lie policy pitfalls, tedious editor debates and delicate sourcing issues that hamper greater representation of the region. Of course these challenges form part of Wikipedia’s continuing evolution as the de facto source for online reference information, but they also (disturbingly) act to entrench particular ways of “knowing” — and ways of validating what is known.

There are over 260,000 articles in Arabic, receiving 240,000 views per hour. This actually translates as one of the least represented major world languages on Wikipedia: few languages have more speakers and fewer articles than Arabic. This relative lack of MENA voice and representation means that the tone and content of this globally useful resource, in many cases, is being determined by outsiders with a potential misunderstanding of the significance of local events, sites of interest and historical figures. In an area that has seen substantial social conflict and political upheaval, greater participation from local actors would help to ensure balance in content about contentious issues. Unfortunately, most research on MENA’s Internet presence has so far been drawn from anecdotal evidence, and no comprehensive studies currently exist.

In this project we wanted to understand where place-based content comes from, to explain reasons for the relative lack of Wikipedia articles in Arabic and about the MENA region, and to understand which parts of the region are particularly underrepresented. We also wanted to understand the relationship between Wikipedia’s administrative structure and the treatment of new editors; in particular, we wanted to know whether editors from the MENA region have less of a voice than their counterparts from elsewhere, and whether the content they create is considered more or less legitimate, as measured through the number of reverts; ie the overriding of their work by other editors.

Our practical objectives involved a consolidation of Middle Eastern Wikipedians though a number of workshops focusing on how to create more equitable and representative content, with the ultimate goal of making Wikipedia a more generative and productive site for reference information about the region. Capacity building among key Wikipedians can create greater understanding of barriers to participation and representation and offset much of the (often considerable) emotional labour required to sustain activity on the site in the face of intense arguments and ideological biases. Potential systematic structures of exclusion that could be a barrier to participation include such competitive practices as content deletion, indifference to content produced by MENA authors, and marginalization through bullying and dismissal.

However, a distinct lack of sources — owing both to a lack of legitimacy for MENA journalism and a paucity of open access government documents — is also inhibiting further growth of content about the region. When inclusion of a topic is contested by editors it is typically because there is not enough external source material about it to establish “notability”. As Ford (2011) has already discussed, notability is often culturally mediated. For example, a story in Al Jazeera would not have been considered a sufficient criterion of notability a couple of years ago. However, this has changed dramatically since its central role in reporting on the Arab Spring.

Unfortunately, notability can create a feedback loop. If an area of the world is underreported, there are no sources. If there are no sources, then journalists do not always have enough information to report about that part of the world. ‘Correct’ sourcing trumps personal experience on Wikipedia; even if an author is from a place, and is watching a building being destroyed, their Wikipedia edit will not be accepted by the community unless the event is discussed in another ‘official’ medium. Often the edit will either be branded with a ‘citation needed’ tag, eliminated, or discussed in the talk page. Particularly aggressive editors and administrators will nominate the page for ‘speedy deletion’ (ie deletion without discussion), a practice that makes responses from an author difficult

Why does any of this matter in practical terms? For the simple reason that biases, absences and contestations on Wikipedia spill over into numerous other domains that are in regular and everyday use (Graham and Zook, 2013). If a place is not on Wikipedia, this might have a chilling effect on business and stifle journalism; if a place is represented poorly on Wikipedia this can lead to misunderstandings about the place. Wikipedia is not a legislative body. However, in the court of public opinion, Wikipedia represents one of the world’s strongest forces, as it quietly inserts itself into representations of place worldwide (Graham et. al 2013; Graham 2013).

Wikipedia is not merely a site of reference information, but is rapidly becoming the de facto site for representing the world to itself. We need to understand more about that representation.

Further Reading

Allagui, I., Graham, M., and Hogan, B. 2014. Wikipedia Arabe et la Construction Collective du Savoir In Wikipedia, objet scientifique non identifie. eds. Barbe, L., and Merzeau, L. Paris: Presses Universitaries du Paris Ouest (in press).

Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers (forthcoming).

Graham, M. 2012. Die Welt in Der Wikipedia Als Politik der Exklusion: Palimpseste des Ortes und selective Darstellung. In Wikipedia. eds. S. Lampe, and P. Bäumer. Bundeszentrale für politische Bildung/bpb, Bonn.

Graham, M. 2011. Wiki Space: Palimpsests and the Politics of Exclusion. In Critical Point of View: A Wikipedia Reader. Eds. Lovink, G. and Tkacz, N. Amsterdam: Institute of Network Cultures, 269-282.


Ford, H. (2011) The Missing Wikipedians. In Geert Lovink and Nathaniel Tkacz (eds), Critical Point of View: A Wikipedia Reader, Amsterdam: Institute of Network Cultures, 2011. ISBN: 978-90-78146-13-1.

Graham, M., M. Zook., and A. Boulton. 2013. Augmented Reality in the Urban Environment: contested content and the duplicity of code. Transactions of the Institute of British Geographers. 38(3), 464-479.

Graham, M and M. Zook. 2013. Augmented Realities and Uneven Geographies: Exploring the Geo-linguistic Contours of the Web. Environment and Planning A 45(1) 77-99.

Graham, M. 2013. The Virtual Dimension. In Global City Challenges: debating a concept, improving the practice. eds. M. Acuto and W. Steele. London: Palgrave. 117-139.

Mark Graham is a Senior Research Fellow at the OII. His research focuses on Internet and information geographies, and the overlaps between ICTs and economic development.

Mapping the Local Geographies of Digital Inequality in Britain

Britain has one of the largest Internet economies in the industrial world. The Internet contributes an estimated 8.3% to Britain’s GDP (Dean et al. 2012), and strongly supports domestic job and income growth by enabling access to new customers, markets and ideas. People benefit from better communications, and businesses are more likely to locate in areas with good digital access, thereby boosting local economies (Malecki & Moriset 2008). While the Internet brings clear benefits, there is also a marked inequality in its uptake and use (the so-called ‘digital divide’). We already know from the Oxford Internet Surveys (OxIS) that Internet use in Britain is strongly stratified by age, by income and by education; and yet we know almost nothing about local patterns of Internet use across the country.

A problem with national sample surveys (the usual source of data about Internet use and non-use), is that the sample sizes become too small to allow accurate generalization at smaller, sub-national areas. No one knows, for example, the proportion of Internet users in Glasgow, because national surveys simply won’t have enough respondents to make reliable city-level estimates. We know that Internet use is not evenly distributed at the regional level; Ofcom reports on broadband speeds and penetration at the county level (Ofcom 2011), and we know that London and the southeast are the most wired part of the country (Dean et al. 2012). But given the importance of the Internet, the lack of knowledge about local patterns of access and use in Britain is surprising. This is a problem because without detailed information about small areas we can’t identify where would benefit most from policy intervention to encourage Internet use and improve access.

We have begun to address this lack of information by combining two important but separate datasets — the 2011 national census, and the 2013 OxIS surveys — using the technique of small area estimation. By definition, census data are available for very small areas, and because it reaches (basically) everyone, there will be no sampling issues. Unfortunately, it is extremely expensive to collect this data, so it doesn’t collect many variables (it has no data on Internet use, for example). The second dataset, the OII’s Oxford Internet Survey (OxIS), is a very rich dataset of all kinds of Internet activity, measured with a random sample of more than 2,000 individuals across Britain. Because OxIS is unable to survey everyone in Britain, it is based on a random sample of people living in geographical ‘Output Areas’ (OAs). These areas (generally of 40-250 households) represent the fundamental building block of the national census, being the smallest geographical area for which it reports data.

Because OxIS and the census (happily) use the same OAs, we can combine national-level data on Internet use (from OxIS) with local-level demographic information (from the census) to map estimated Internet use across Britain for the first time. We can do this because we can estimate from OxIS the likelihood of an individual using the Internet just from basic demographic data (age, income, education etc.). And because the census records these demographics for everyone in each OA, we can go on to estimate the likely proportion of Internet users in each of these areas. By combining the richness of OxIS survey data with the comprehensive small area coverage of the census we can use the strengths of one to offset the gaps in the other.

Of course, this procedure assumes that people in small areas will generally match national patterns of Internet use; ie that those who are better educated, employed, and young, are more likely to use the Internet. We assume that this pattern isn’t affected by cultural or social factors (e.g. ‘Northerners just like the Internet more’), or by anything unusual about a particular group of households that makes it buck national trends (eg ‘the young people of Wytham Street, Oxford just prefer not to use the Internet’).

So what do we see when we combine the two datasets? What are the local-level patterns of Internet use across Britain? We can see from the figure that the highest estimated Internet use (88-89%) is concentrated in the south east, with London dominating. Bristol, Southampton, and Nottingham also have high levels of use, as well as the rest of the south (interestingly, including rural Cornwall) with estimated usage levels of 78-83%. Leeds, York and Manchester are also in this category. In the lowest category (59-70% use) we find the entire North East region. Cities show much the same pattern, with southern cities having the highest estimated Internet use, and Newcastle and Middlesbrough having the lowest.

There isn’t room in this post to explore and discuss all the patterns (or to speculate on the underlying reasons), but there are clear policy implications from this work. The Internet has made an enormous difference in our social life, culture, and economy; this is why it is important to bring people online, to encourage them all to participate and benefit. However, despite the importance of the Internet in Britain today, we still know very little about who is, and isn’t connected. We hope this approach (and this data) can help pinpoint the areas of greatest need. For example, the North East is striking — even the cities don’t seem to stand out from the surrounding rural areas. Allocating resources to improve use in the North East would probably be valuable, with rural areas as a secondary priority. Interestingly, Cornwall (despite being very rural) is actually above average in terms of likely Internet users, and is also the recipient of a major European Regional Development Fund effort to extend their broadband.

Actually getting access via fibre-optic cable is just one part of the story of Internet use (and one we don’t cover in this post); but this is the first time we have been estimate the likely use at a local level, based on the known characteristics of the people who live there. Using these small area estimation techniques opens a whole new area for social media research and policy-making around local patterns of digital participation. Going forward, we intend to expand the model to include urban-rural differences, the index of multiple deprivation, occupation, and socio-economic status. But there’s already much more we can do with these data.


Dean, D., DiGrande, S., Field, D., Lundmark, A., O’Day, J., Pineda, J., Zwillenberg, P. (2012) The connected world: The Internet economy in the G-20. Boston: Boston Consulting Group.

Malecki, E.J. & Moriset, B. (2008) The digital economy: Business organization, production processes and regional developments. London: Routledge.

Ofcom (2011) Communications infrastructure report: Fixed broadband data. [accessed on 23/9/2013 from http://stakeholders.ofcom.org.uk/binaries/research/broadband-research/Fixed_Broadband_June_2011.pdf ]

Read the full paper: Blank, G., Graham, M., and Calvino, C. (2014) Mapping the Local Geographies of Digital Inequality. [contact the authors for the paper and citation details]

Grant Blank is a Survey Research Fellow at the OII. He is a sociologist who studies the social and cultural impact of the Internet and other new communication media. He is principal investigator on the OII’s Geography of Digital Inequality project, which combines OxIS and census data to produce the first detailed geographic estimates of Internet use across the UK.

Edit wars! Measuring and mapping society’s most controversial topics

Ed: How did you construct your quantitative measure of ‘conflict’? Did you go beyond just looking at content flagged by editors as controversial?

Taha: Yes we did … actually, we have shown that controversy measures based on “controversial” flags are not inclusive at all and although they might have high precision, they have very low recall. Instead, we constructed an automated algorithm to locate and quantify the editorial wars taking place on the Wikipedia platform. Our algorithm is based on reversions, i.e. when editors undo each other’s contributions. We focused specifically on mutual reverts between pairs of editors and we assigned a maturity score to each editor, based on the total volume of their previous contributions. While counting the mutual reverts, we used more weight for those ones committed by/on editors with higher maturity scores; as a revert between two experienced editors indicates a more serious problem. We always validated our method and compared it with other methods, using human judgement on a random selection of articles.

Ed: Was there any discrepancy between the content deemed controversial by your own quantitative measure, and what the editors themselves had flagged?

Taha: We were able to capture all the flagged content, but not all the articles found to be controversial by our method are flagged. And when you check the editorial history of those articles, you soon realise that they are indeed controversial but for some reason have not been flagged. It’s worth mentioning that the flagging process is not very well implemented in smaller language editions of Wikipedia. Even if the controversy is detected and flagged in English Wikipedia, it might not be in the smaller language editions. Our model is of course independent of the size and editorial conventions of different language editions.

Ed: Were there any differences in the way conflicts arose / were resolved in the different language versions?

Taha: We found the main differences to be the topics of controversial articles. Although some topics are globally debated, like religion and politics, there are many topics which are controversial only in a single language edition. This reflects the local preferences and importances assigned to topics by different editorial communities. And then the way editorial wars initiate and more importantly fade to consensus is also different in different language editions. In some languages moderators interfere very soon, while in others the war might go on for a long time without any moderation.

Ed: In general, what were the most controversial topics in each language? And overall?

Taha: Generally, religion, politics, and geographical places like countries and cities (sometimes even villages) are the topics of debates. But each language edition has also its own focus, for example football in Spanish and Portuguese, animations and TV series in Chinese and Japanese, sex and gender-related topics in Czech, and Science and Technology related topics in French Wikipedia are very often behind editing wars.

Ed: What other quantitative studies of this sort of conflict -ie over knowledge and points of view- are there?

Taha: My favourite work is one by researchers from Barcelona Media Lab. In their paper Jointly They Edit: Examining the Impact of Community Identification on Political Interaction in Wikipedia they provide quantitative evidence that editors interested in political topics identify themselves more significantly as Wikipedians than as political activists, even though they try hard to reflect their opinions and political orientations in the articles they contribute to. And I think that’s the key issue here. While there are lots of debates and editorial wars between editors, at the end what really counts for most of them is Wikipedia as a whole project, and the concept of shared knowledge. It might explain how Wikipedia really works despite all the diversity among its editors.

Ed: How would you like to extend this work?

Taha: Of course some of the controversial topics change over time. While Jesus might stay a controversial figure for a long time, I’m sure the article on President (W) Bush will soon reach a consensus and most likely disappear from the list of the most controversial articles. In the current study we examined the aggregated data from the inception of each Wikipedia-edition up to March 2010. One possible extension that we are working on now is to study the dynamics of these controversy-lists and the positions of topics in them.

Read the full paper: Yasseri, T., Spoerri, A., Graham, M. and Kertész, J. (2014) The most controversial topics in Wikipedia: A multilingual and geographical analysis. In: P.Fichman and N.Hara (eds) Global Wikipedia: International and cross-cultural issues in online collaboration. Scarecrow Press.

Taha was talking to blog editor David Sutcliffe.

Taha Yasseri is the Big Data Research Officer at the OII. Prior to coming to the OII, he spent two years as a Postdoctoral Researcher at the Budapest University of Technology and Economics, working on the socio-physical aspects of the community of Wikipedia editors, focusing on conflict and editorial wars, along with Big Data analysis to understand human dynamics, language complexity, and popularity spread. He has interests in analysis of Big Data to understand human dynamics, government-society interactions, mass collaboration, and opinion dynamics.

Who represents the Arab world online?

Editors from all over the world have played some part in writing about Egypt; in fact, only 13% of all edits actually originate in the country (38% are from the US). More: Who edits Wikipedia? by Mark Graham.

Ed: In basic terms, what patterns of ‘information geography’ are you seeing in the region?

Mark: The first pattern that we see is that the Middle East and North Africa are relatively under-represented in Wikipedia. Even after accounting for factors like population, Internet access, and literacy, we still see less contact than would be expected. Second, of the content that exists, a lot of it is in European and French rather than in Arabic (or Farsi or Hebrew). In other words, there is even less in local languages.

And finally, if we look at contributions (or edits), not only do we also see a relatively small number of edits originating in the region, but many of those edits are being used to write about other parts of the word rather than their own region. What this broadly seems to suggest is that the participatory potentials of Wikipedia aren’t yet being harnessed in order to even out the differences between the world’s informational cores and peripheries.

Ed: How closely do these online patterns in representation correlate with regional (offline) patterns in income, education, language, access to technology (etc.) Can you map one to the other?

Mark: Population and broadband availability alone explain a lot of the variance that we see. Other factors like income and education also play a role, but it is population and broadband that have the greatest explanatory power here. Interestingly, it is most countries in the MENA region that fail to fit well to those predictors.

Ed: How much do you think these patterns result from the systematic imposition of a particular view point – such as official editorial policies – as opposed to the (emergent) outcome of lots of users and editors acting independently?

Mark: Particular modes of governance in Wikipedia likely do play a factor here. The Arabic Wikipedia, for instance, to combat vandalism has a feature whereby changes to articles need to be reviewed before being made public. This alone seems to put off some potential contributors. Guidelines around sourcing in places where there are few secondary sources also likely play a role.

Ed: How much discussion (in the region) is there around this issue? Is this even acknowledged as a fact or problem?

Mark: I think it certainly is recognised as an issue now. But there are few viable alternatives to Wikipedia. Our goal is hopefully to identify problems that lead to solutions, rather than simply discouraging people from even using the platform.

Ed: This work has been covered by the Guardian, Wired, the Huffington Post (etc.) How much interest has there been from the non-Western press or bloggers in the region?

Mark: There has been a lot of coverage from the non-Western press, particularly in Latin America and Asia. However, I haven’t actually seen that much coverage from the MENA region.

Ed: As an academic, do you feel at all personally invested in this, or do you see your role to be simply about the objective documentation and analysis of these patterns?

Mark: I don’t believe there is any such thing as ‘objective documentation.’ All research has particular effects in and on the world, and I think it is important to be aware of the debates, processes, and practices surrounding any research project. Personally, I think Wikipedia is one of humanity’s greatest achievements. No previous single platform or repository of knowledge has ever even come close to Wikipedia in terms of its scale or reach. However, that is all the more reason to critically investigate what exactly is, and isn’t, contained within this fantastic resource. By revealing some of the biases and imbalances in Wikipedia, I hope that we’re doing our bit to improving it.

Ed: What factors do you think would lead to greater representation in the region? For example: is this a matter of voices being actively (or indirectly) excluded, or are they maybe just not all that bothered?

Mark: This is certainly a complicated question. I think the most important step would be to encourage participation from the region, rather than just representation of the region. Some of this involves increasing some of the enabling factors that are the prerequisites for participation; factors like: increasing broadband access, increasing literacy, encouraging more participation from women and minority groups.

Some of it is then changing perceptions around Wikipedia. For instance, many people that we spoke to in the region framed Wikipedia as an American our outside project rather than something that is locally created. Unfortunately we seem to be currently stuck in a vicious cycle in which few people from the region participate, therefore fulfilling the very reason why some people think that they shouldn’t participate. There is also the issue of sources. Not only does Wikipedia require all assertions to be properly sourced, but secondary sources themselves can be a great source of raw informational material for Wikipedia articles. However, if few sources about a place exist, then it adds an additional burden to creating content about that place. Again, a vicious cycle of geographic representation.

My hope is that by both working on some of the necessary conditions to participation, and engaging in a diverse range of initiatives to encourage content generation, we can start to break out of some of these vicious cycles.

Ed: The final moonshot question: How would you like to extend this work; time and money being no object?

Mark: Ideally, I’d like us to better understand the geographies of representation and participation outside of just the MENA region. This would involve mixed-methods (large scale big data approaches combined with in-depth qualitative studies) work focusing on multiple parts of the world. More broadly, I’m trying to build a research program that maintains a focus on a wide range of Internet and information geographies. The goal here is to understand participation and representation through a diverse range of online and offline platforms and practices and to share that work through a range of publicly accessible media: for instance the ‘Atlas of the Internet’ that we’re putting together.

Mark Graham was talking to blog editor David Sutcliffe.

Mark Graham is a Senior Research Fellow at the OII. His research focuses on Internet and information geographies, and the overlaps between ICTs and economic development.

Mapping the uneven geographies of information worldwide

Map of Flickr activity worldwide
Images are an important form of knowledge that allow us to develop understandings about our world; the global geographic distribution of geotagged images on Flickr reveals the density of visual representations and locally depicted knowledge of all places on our planet. Map by M.Graham, M.Stephens, S.Hale.

Information is the raw material for much of the work that goes on in the contemporary global economy, and visibility and voice in this information ecosystem is a prerequisite for influence and control. As Hand and Sandywell (2002: 199) have argued, “digitalised knowledge and its electronic media are indeed synonymous with power.” As such, it is important to understand who produces and reproduces information, who has access to it, and who and where are represented by it.

Traditionally, information and knowledge about the world have been geographically constrained. The transmission of information required either the movement of people or the availability of some other medium of communication. However, up until the late 20th century, almost all mediums of information – books, newspapers, academic journals, patents and the like – were characterised by huge geographic inequalities. The global north produced, consumed and controlled much of the world’s codified knowledge, while the global south was largely left out.

Today, the movement of information is, in theory, rarely constrained by distance. Very few parts of the world remain disconnected from the grid, and over 2 billion people are now online (most of them in the Global South). Unsurprisingly, many believe we now have the potential to access what Wikipedia’s founder Jimmy Wales refers to as “the sum of all human knowledge”. Theoretically, parts of the world that have been left out of flows and representations of knowledge can be quite literally put back on the map.

However, “potential” has too often been confused with actual practice, and stark digital divisions of labour are still evident in all open platforms that rely on user-generated content. Google Map’s databases contain more indexed user-generated content about the Tokyo metropolitan region than the entire continent of Africa. On Wikipedia, there is more written about Germany than about South America and Africa combined. In other words, there are massive inequalities that cannot simply be explained by uneven Internet penetration. A range of other physical, social, political and economic barriers are reinforcing this digital divide, amplifying the informational power of the already powerful and visible.

That’s not to say that the Internet doesn’t have important implications for the developing world. People use it not just to connect with friends and family, but to learn, share information, trade, and represent their communities. However, it’s important to be aware of the Internet’s highly uneven geographies of information. These inequalities matter to the south, because connectivity – despite being a clear prerequisite for access to most 21st-century platforms of knowledge sharing – by no means guarantees knowledge production and digital participation.

How do we move towards encouraging participation from (and about) parts of the world that are currently left out of virtual representations? The first step is to allow people to see what is, and isn’t, represented; something we are planning with this project. After that, there’s also a clear need for plans like Kenya’s strategy to boost local digital content, or Wikimedia’s Arabic Catalyst project, which aims to encourage the creation of content in Arabic and provide information about the Middle East.

It remains to be seen how effective such strategies will be in changing the highly uneven digital division of labour. As we rely increasingly on user-generated platforms, there is a real possibility that we will see the widening of divides between “digital cores” and “peripheries”. It’s therefore crucial to keep asking where visibility, voice and power reside in our increasingly networked world.


Graham, M. and M. Zook. 2013. Augmented Realities and Uneven Geographies: Exploring the Geo-linguistic Contours of the Web. Environment and Planning A 45(1) 77-99.

Graham, M. 2013. The Virtual Dimension. In Global City Challenges: debating a concept, improving the practice. eds. M. Acuto and W. Steele. London: Palgrave.

Graham, M., M. Zook., and A. Boulton. 2012. Augmented Reality in the Urban Environment: contested content and the duplicity of code. Transactions of the Institute of British Geographers. DOI: 10.1111/j.1475-5661.2012.00539.x

Graham, M. 2013. The Knowledge Based Economy and Digital Divisions of Labour. In Companion to Development Studies, 3rd edition, eds V. Desai, and R. Potter. Hodder.

Hand, M. and B. Sandywell. 2002. E-topia as Cosmopolis or Citadel On the Democratizing and De-democratizing Logics of the Internet, or, Toward a Critique of the New Technological Fetishism. Theory, Culture & Society

Mark Graham‘s research focuses on Internet and information geographies, and the overlaps between ICTs and economic development. His work on the geographies of the Internet examines how people and places are ever more defined by, and made visible through, not only their traditional physical locations and properties, but also their virtual attributes and digital shadows.

Read Mark’s blog.

Crowdsourcing translation during crisis situations: are ‘real voices’ being excluded from the decisions and policies it supports?

As revolution spread across North Africa and the Middle East in 2011, participants and observers of the events were keen to engage via social media. However, saturation by Arab-language content demanded a new translation strategy for those outside the region to follow the information flows — and for those inside to reach beyond their domestic audience. Crowdsourcing was seen as the most efficient strategy in terms of cost and time to meet the demand, and translation applications that harnessed volunteers across the internet were integrated with nearly every type of ICT project. For example, as Steve Stottlemyre has already mentioned on this blog, translation played a part in tools like the Libya Crisis Map, and was essential for harnessing tweets from the region’s ‘voices on the ground.’

If you have ever worried about media bias then you should really worry about the impact of translation. Before the revolutions, the translation software for Egyptian Arabic was almost non-existent. Few translation applications were able to handle the different Arabic dialects or supply coding labor and capital to build something that could contend with internet blackouts. Google’s Speak to Tweet became the dominant application used in the Egyptian uprisings, delivering one homogenized source of information that fed the other sources. In 2011, this collaboration helped circumvent the problem of Internet connectivity in Egypt by allowing cellphone users to call their tweet into a voicemail to be transcribed and translated. A crowd of volunteers working for Twitter enhanced translation of Egyptian Arabic after the Tweets were first transcribed by a Mechanical Turk application trained from an initial 10 hours of speech.

Continue reading “Crowdsourcing translation during crisis situations: are ‘real voices’ being excluded from the decisions and policies it supports?”

Did Libyan crisis mapping create usable military intelligence?

The Middle East has recently witnessed a series of popular uprisings against autocratic rulers. In mid-January 2011, Tunisian President Zine El Abidine Ben Ali fled his country, and just four weeks later, protesters overthrew the regime of Egyptian President Hosni Mubarak. Yemen’s government was also overthrown in 2011, and Morocco, Jordan, and Oman saw significant governmental reforms leading, if only modestly, toward the implementation of additional civil liberties.

Protesters in Libya called for their own ‘day of rage’ on February 17, 2011, marked by violent protests in several major cities, including the capitol Tripoli. As they transformed from ‘protestors’ to ‘Opposition forces’ they began pushing information onto Twitter, Facebook, and YouTube, reporting their firsthand experiences of what had turned into a civil war virtually overnight. The evolving humanitarian crisis prompted the United Nations to request the creation of the Libya Crisis Map, which was made public on March 6, 2011. Other, more focused crisis maps followed, and were widely distributed on Twitter.

While the map was initially populated with humanitarian information pulled from the media and online social networks, as the imposition of an internationally enforced No Fly Zone (NFZ) over Libya became imminent, information began to appear on it that appeared to be of a tactical military nature. While many people continued to contribute conventional humanitarian information to the map, the sudden shift toward information that could aid international military intervention was unmistakable.

How useful was this information, though? Agencies in the U.S. Intelligence Community convert raw data into useable information (incorporated into finished intelligence) by utilizing some form of the Intelligence Process. As outlined in the U.S. military’s joint intelligence manual, this consists of six interrelated steps all centered on a specific mission. It is interesting that many Twitter users, though perhaps unaware of the intelligence process, replicated each step during the Libyan civil war; producing finished intelligence adequate for consumption by NATO commanders and rebel leadership.

It was clear from the beginning of the Libyan civil war that very few people knew exactly what was happening on the ground. Even NATO, according to one of the organization’s spokesmen, lacked the ground-level informants necessary to get a full picture of the situation in Libya. There is no public information about the extent to which military commanders used information from crisis maps during the Libyan civil war. According to one NATO official, “Any military campaign relies on something that we call ‘fused information’. So we will take information from every source we can… We’ll get information from open source on the internet, we’ll get Twitter, you name any source of media and our fusion centre will deliver all of that into useable intelligence.”

The data in these crisis maps came from a variety of sources, including journalists, official press releases, and civilians on the ground who updated blogs and/or maintaining telephone contact. The @feb17voices Twitter feed (translated into English and used to support the creation of The Guardian’s and the UN’s Libya Crisis Map) included accounts of live phone calls from people on the ground in areas where the Internet was blocked, and where there was little or no media coverage. Twitter users began compiling data and information; they tweeted and retweeted data they collected, information they filtered and processed, and their own requests for specific data and clarifications.

Information from various Twitter feeds was then published in detailed maps of major events that contained information pertinent to military and humanitarian operations. For example, as fighting intensified, @LibyaMap’s updates began to provide a general picture of the battlefield, including specific, sourced intelligence about the progress of fighting, humanitarian and supply needs, and the success of some NATO missions. Although it did not explicitly state its purpose as spreading mission-relevant intelligence, the nature of the information renders alternative motivations highly unlikely.

Interestingly, the Twitter users featured in a June 2011 article by the Guardian had already explicitly expressed their intention of affecting military outcomes in Libya by providing NATO forces with specific geographical coordinates to target Qadhafi regime forces. We could speculate at this point about the extent to which the Intelligence Community might have guided Twitter users to participate in the intelligence process; while NATO and the Libyan Opposition issued no explicit intelligence requirements to the public, they tweeted stories about social network users trying to help NATO, likely leading their online supporters to draw their own conclusions.

It appears from similar maps created during the ongoing uprisings in Syria that the creation of finished intelligence products by crisis mappers may become a regular occurrence. Future study should focus on determining the motivations of mappers for collecting, processing, and distributing intelligence, particularly as a better understanding of their motivations could inform research on the ethics of crisis mapping. It is reasonable to believe that some (or possibly many) crisis mappers would be averse to their efforts being used by military commanders to target “enemy” forces and infrastructure.

Indeed, some are already questioning the direction of crisis mapping in the absence of professional oversight (Global Brief 2011): “[If] crisis mappers do not develop a set of best practices and shared ethical standards, they will not only lose the trust of the populations that they seek to serve and the policymakers that they seek to influence, but (…) they could unwittingly increase the number of civilians being hurt, arrested or even killed without knowing that they are in fact doing so.”

Read the full paper: Stottlemyre, S., and Stottlemyre, S. (2012) Crisis Mapping Intelligence Information During the Libyan Civil War: An Exploratory Case Study. Policy and Internet 4 (3-4).