data science

As Africa goes digital, the challenge for policymakers becomes moving from digitisation to managing and curating digital data in ways that keep people’s identities and activities secure.

Africa is in the midst of a technological revolution, and the current wave of digitisation has the potential to make the continent’s citizens a rich mine of data. Intersection in Zomba, Malawi. Image by john.duffell.

After the last decade’s exponential rise in ICT use, Africa is fast becoming a source of big data. Africans are increasingly emitting digital information with their mobile phone calls, internet use and various forms of digitised transactions, while at the state level e-government is becoming a reality. As Africa goes digital, the challenge for policymakers becomes what the WRR, a Dutch policy organisation, has identified as ‘i-government’: moving from digitisation to managing and curating digital data in ways that keep people’s identities and activities secure. On one level, this is an important development for African policymakers, given that accurate information on their populations has been notoriously hard to come by and, where it exists, has not been shared. On another, however, it represents a tremendous challenge. The WRR has pointed out that European governments, which have been digitising for decades, are unprepared for the age of i-government. How are African policymakers, as relative newcomers to digital data, supposed to respond? There are two possible scenarios. One is that systems will develop for the release and curation of Africans’ data by corporations and governments, and that it will become possible, in the words of the UN’s Global Pulse initiative, to use it as a ‘public good’, an invaluable tool for development policies and crisis response. The other is that there will be a new scramble for Africa: a digital resource grab that may have implications as great as the original scramble amongst the colonial powers in the late 19th century. We know that African data is not only valuable to Africans. 
The current wave of digitisation has the potential to make the continent’s citizens a rich mine of data about health interventions, human mobility, conflict and violence, technology adoption, communication dynamics and financial behaviour, with the default mode being for this to happen without their consent or involvement, and without ethical and normative frameworks to ensure data protection or to weigh…

The new networks of political protest, which harness these new online technologies, are often described in theoretical terms as being ‘fluid’ and ‘horizontal’, in contrast to the rigid and hierarchical structure of earlier protest organisation.

How have online technologies reconfigured collective action? It is often assumed that the rise of social networking tools, accompanied by the mass adoption of mobile devices, has strengthened the impact and broadened the reach of today’s political protests. Enabling mass self-communication allows protesters to write their own interpretation of events, free from mass media often seen as adversarial, and emerging protests may also benefit from the cheaper, faster transmission of information and more effective mobilisation made possible by online tools such as Twitter. The new networks of political protest, which harness these new online technologies, are often described in theoretical terms as being ‘fluid’ and ‘horizontal’, in contrast to the rigid and hierarchical structure of earlier protest organisation. Yet such theoretical assumptions have seldom been tested empirically. This new language of networks may be useful as a shorthand to describe protest dynamics, but does it accurately reflect how protest networks mediate communication and coordinate support? The global protests against austerity and inequality, which took place on May 12, 2012, provide an interesting case study to test the structure and strength of a transnational online protest movement. The ‘indignados’ movement emerged as a response to the Spanish government’s austerity policies in the aftermath of the global financial crisis. The movement flared in May 2011, when hundreds of thousands of protesters marched in Spanish cities, and many set up camps ahead of municipal elections a week later. These protests contributed to the emergence of the worldwide Occupy movement. After the original plan to occupy New York City’s financial district mobilised thousands of protesters in September 2011, the movement spread to other cities in the US and worldwide, including London and Frankfurt, before winding down as the camp sites were dismantled weeks later. 
Interest in these movements was revived, however, as the first anniversary of the ‘indignados’ protests approached in May 2012. To test whether the fluidity, horizontality and connectivity often claimed for…
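The excerpt does not say how the authors measured horizontality. One standard measure is Freeman's degree centralization, shown here only as an illustration and not as the study's method. It is 1 for a star, where a single hub connects everyone. It is 0 when every participant has the same number of ties. The function name and the tie lists below are hypothetical stand-ins for, say, retweet ties between accounts.

```python
from collections import defaultdict


def degree_centralization(edges):
    """Freeman degree centralization of an undirected, unweighted network.

    Returns 1.0 for a perfect star (one hub linked to everyone else, the
    most 'hierarchical' shape) and 0.0 when every node has the same degree
    (the most 'horizontal' shape). Only nodes that appear in an edge count.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 3:
        raise ValueError("need at least three connected nodes")
    degrees = [len(nbrs) for nbrs in neighbours.values()]
    max_degree = max(degrees)
    # Observed spread around the top node, divided by the largest possible
    # spread, which a star on n nodes attains: (n - 1) * (n - 2).
    return sum(max_degree - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical tie lists (e.g. who retweeted whom).
star = [("hub", f"user{i}") for i in range(5)]
ring = [(f"user{i}", f"user{(i + 1) % 6}") for i in range(6)]
print(degree_centralization(star))  # 1.0
print(degree_centralization(ring))  # 0.0
```

Real protest networks fall between these two extremes. A single score like this is one way to turn "fluid" and "horizontal" into something testable.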

Government agencies are rarely completely transparent, often do not provide clear instructions for accessing the information they store, seldom use standardised norms, and can overlook user needs.

A view inside the House chamber of the Utah State Legislature. Image by deltaMike.

Public demands for transparency in the political process have long been a central feature of American democracy, and recent technological improvements have considerably facilitated the ability of state governments to respond to such public pressures. With online legislative archives, state legislatures can make available a large number of public documents. In addition to meeting the demands of interest groups, activists, and the public at large, these websites enable researchers to conduct single-state studies, cross-state comparisons, and longitudinal analysis. While online legislative archives are, in theory, rich sources of information that save researchers valuable time as they gather data across the states, in practice, government agencies are rarely completely transparent, often do not provide clear instructions for accessing the information they store, seldom use standardised norms, and can overlook user needs. These obstacles to state politics research are longstanding: almost three decades ago, Malcolm Jewell noted the need for “a much more comprehensive and systematic collection and analysis of comparative state political data.” While the growing availability of online legislative resources helps to address the problem of collection, the limitations of search and retrieval functions remind us that systematic analysis remains a challenge. The fifty state legislative websites differ considerably; few of them are intuitive or adequately transparent, and there is no standardised or systematic process to retrieve data. For many states, it is not possible to identify issue-specific bills that are introduced and/or passed during a specific period of time, let alone their sponsors or committees, without reading the full text of each bill. For researchers who are interested in certain time periods, policy areas, committees, or sponsors, the inability to set filters or immediately see relevant results limits their ability to efficiently collect data. 
Frustrated by the obstacles we faced in undertaking a study of state-level immigration legislation before and after September 11, 2001, we decided instead to evaluate each state legislative website, a “state of the states” analysis, to…
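Once bill records are available in structured form, the filtered query the authors could not run on most sites is easy to express. The sketch below assumes a hypothetical `Bill` record with introduction date, topic tags, sponsors, committee and outcome. No state website is claimed to expose these fields. The sample bills are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Bill:
    # Hypothetical record layout; real legislative sites expose different fields.
    number: str
    introduced: date
    topics: set[str]
    sponsors: list[str]
    committee: str
    passed: bool


def filter_bills(bills, start, end, topic=None, sponsor=None,
                 committee=None, passed=None):
    """Return bills introduced between start and end (inclusive) that match
    every filter supplied; filters left as None are ignored."""
    matches = []
    for bill in bills:
        if not start <= bill.introduced <= end:
            continue
        if topic is not None and topic not in bill.topics:
            continue
        if sponsor is not None and sponsor not in bill.sponsors:
            continue
        if committee is not None and bill.committee != committee:
            continue
        if passed is not None and bill.passed != passed:
            continue
        matches.append(bill)
    return matches


# Invented sample records, for illustration only.
bills = [
    Bill("HB 101", date(2001, 3, 2), {"immigration"}, ["Rep. A"], "Judiciary", True),
    Bill("HB 202", date(2002, 1, 15), {"immigration", "licensing"}, ["Rep. B"],
         "Transportation", False),
    Bill("SB 303", date(2002, 2, 9), {"education"}, ["Sen. C"], "Education", True),
]

post_911 = filter_bills(bills, date(2001, 9, 12), date(2002, 12, 31),
                        topic="immigration")
print([b.number for b in post_911])  # ['HB 202']
```

The hard part in practice is getting the records in the first place. Filtering itself is trivial once the data are structured.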

The platform aims to create long-lasting scientific value with minimal technical entry barriers—it is valuable to have a global resource that combines photographs generated by Project Pressure in less documented areas.

Ed: Project Pressure has created a platform for crowdsourcing glacier imagery, often photographs taken by climbers and trekkers. Why are scientists interested in these images? And what’s the scientific value of the data set that’s being gathered by the platform?
Klaus: Comparative photography using historical photographs allows year-on-year comparisons to document glacier change. The platform aims to create long-lasting scientific value with minimal technical entry barriers. It is valuable to have a global resource that combines photographs generated by Project Pressure in less documented areas with crowdsourced images taken, for example, by climbers and trekkers, and with archival pictures. The platform is future-focused and will hopefully allow an up-to-date view of glaciers across the planet. The other ways for scientists to monitor glaciers take a lot of time and effort; direct measurement of snowfall is a complicated, resource-intensive and time-consuming process. And while glacier outlines can be traced from satellite imagery, this still needs to be done manually. Also, you can’t measure the thickness, images can be obscured by debris and cloud cover, and some areas just don’t have very many satellite fly-bys.
Ed: There are estimates that the glaciers of Montana’s Glacier National Park will likely be gone by 2020 and the Ugandan glaciers by 2025, and the Alps are rapidly turning into a region of lakes. These are the famous and very visible examples of glacier loss; what’s the scale of the missing data globally?
Klaus: There’s a lot of great research being conducted in this area; however, there are approximately 300,000 glaciers worldwide, with huge data gaps in South America and the Himalayas, for instance. Sharing of Himalayan data between Indian and Chinese scientists has been a sensitive issue, given that glacier meltwater is an important strategic resource in the region. 
But this is a popular trekking route, and it is relatively easy to gather open-source data from the public. Furthermore, there are also…
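Once a glacier outline has been traced, the shoelace formula gives its area. Two outlines from different years then give a change in extent, though, as Klaus notes, not in thickness. This minimal sketch assumes the outline vertices are already in a projected coordinate system in metres. The rectangular outlines are invented toy values, not measurements.

```python
def polygon_area(vertices):
    """Area of a simple (non-self-intersecting) polygon by the shoelace formula.

    Assumes `vertices` are (x, y) points in a projected coordinate system
    measured in metres; latitude/longitude must be projected first.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical outlines of the same glacier traced from two images.
outline_early = [(0, 0), (2000, 0), (2000, 1000), (0, 1000)]
outline_late = [(0, 0), (1500, 0), (1500, 800), (0, 800)]
early = polygon_area(outline_early) / 1e6  # square kilometres
late = polygon_area(outline_late) / 1e6
print(f"{early:.2f} km2 -> {late:.2f} km2 ({(late - early) / early:.0%})")
# 2.00 km2 -> 1.20 km2 (-40%)
```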

As the cost and size of devices fall and network access becomes ubiquitous, it is evident that not only major industries but whole areas of consumption, public service and domestic life will be capable of being transformed.

The 2nd Annual Internet of Things Europe 2010: A Roadmap for Europe, 2010. Image by Pierre Metivier.

On 17 April 2013, the US Federal Trade Commission published a call for inputs on the ‘consumer privacy and security issues posed by the growing connectivity of consumer devices, such as cars, appliances, and medical devices’, in other words, on the impact of the Internet of Things (IoT) on the everyday lives of citizens. The call is in large part a request for information to establish the current state of technology development and how it will develop, but it also seeks views on how privacy risks should be weighed against potential societal benefits. There’s a lot that’s not very new about the IoT. Embedded computing, sensor networks and machine-to-machine communications have been around a long time. Mark Weiser was developing the concept of ubiquitous computing (and prototyping it) at Xerox PARC in 1990. Many of the big ideas in the IoT (smart cars, smart homes, wearable computing) are already envisaged in works such as Nicholas Negroponte’s Being Digital, which was published in 1995, before the mass popularisation of the internet itself. The term ‘Internet of Things’ has been around since at least 1999. What is new is the speed with which technological change has made these ideas implementable on a societal scale. The FTC’s interest reflects a growing awareness of the potential significance of the IoT, and the need for public debate about its adoption. As the cost and size of devices fall and network access becomes ubiquitous, it is evident that not only major industries but whole areas of consumption, public service and domestic life will be capable of being transformed. The number of connected devices is likely to grow rapidly in the next few years. The Organisation for Economic Co-operation and Development (OECD) estimates that while a family with two teenagers may have 10 devices connected to the internet, by 2022 this may well grow to 50 or more. Across the OECD area the number of…
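The OECD figure implies a steep compound growth rate. The excerpt does not say which year the 10-device baseline refers to, so let $n$ be the number of years between that baseline and 2022. The annual growth rate $r$ is then:

$$
r = \left(\frac{50}{10}\right)^{1/n} - 1 = 5^{1/n} - 1
$$

Suppose, purely for illustration, that the baseline were 2013, the year of the FTC call (an assumption, not a figure from the OECD). Then $n = 9$ and $r = 5^{1/9} - 1 \approx 0.196$. That is roughly 20% more connected devices per household each year.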

Mobilisation paths are difficult to predict because they depend on the right alignment of conditions on different levels.

The communication technologies once used by rebels and protesters to gain global visibility now look burdensome and dated: much separates the once-futuristic-looking image of Subcomandante Marcos posing in the Chiapas jungle draped in electronic gear (1994) from the uprisings of the 2011 Egyptian revolution. While organisations once provided the only practical platform for amplifying a message, the rise of the Internet means that cross-national networks are now reachable by individuals, who are able to bypass organisations, ditch membership dues, and embrace self-organisation. As social media and mobile applications increasingly blur the distinction between public and private, ordinary citizens are becoming crucial nodes in the contemporary protest network. Because personal networks are the main channels of information flow on sites such as Facebook, Twitter and LinkedIn, we don’t need to actively seek out particular information; it can be served to us with no more effort than that of maintaining a connection with our contacts. News, opinions, and calls for justice are now shared and forwarded by our friends (and their friends) in a constant churn of information, all attached to familiar names and faces. Given that we are more likely to pass on information if the source belongs to our social circle, this has had an important impact on the information environment within which protest movements are initiated and develop. Mobile connectivity is also important for understanding contemporary protest, given that the ubiquitous streams of synchronous information we access anywhere are shortening our reaction times. This is important, as the evolution of mass recruitments, whether they result in flash mobilisations, slow burns, or simply damp squibs, can only be properly understood if we have a handle on the distribution of reaction times within a population. 
The increasing integration of the mainstream media into our personal networks is also important, given that online networks (and independent platforms like Indymedia) are not the clear-cut alternative to corporate media they once were. We can now…
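To see why the distribution of reaction times matters, consider a deliberately simple model of our own, not one taken from the article. Each person joins a fixed reaction time after the first of their contacts joins. Earliest joining times are then shortest paths with delays attached to people, which a Dijkstra-style priority queue computes. The contact network and reaction times below are hypothetical. The model captures only timing: everyone reachable eventually joins, so it cannot produce a damp squib.

```python
import heapq


def join_times(contacts, reaction, seeds):
    """Earliest joining time for everyone reachable from the seeds.

    Assumes a person joins reaction[person] time units after the first of
    their contacts joins; seeds join at time 0. Because delays are
    non-negative, the first time a person is popped from the heap is their
    earliest possible joining time (as in Dijkstra's algorithm).
    """
    times = {}
    heap = [(0.0, s) for s in seeds]
    heapq.heapify(heap)
    while heap:
        t, person = heapq.heappop(heap)
        if person in times:
            continue  # already joined at an earlier time
        times[person] = t
        for friend in contacts.get(person, ()):
            if friend not in times:
                heapq.heappush(heap, (t + reaction[friend], friend))
    return times


def joined_by(times, horizon):
    """Number of people who have joined by time `horizon`."""
    return sum(1 for t in times.values() if t <= horizon)


# Hypothetical chain of ten people, each linked to their neighbours.
people = range(10)
contacts = {p: [q for q in (p - 1, p + 1) if 0 <= q < 10] for p in people}
fast = {p: 0.5 for p in people}  # hours to react, e.g. always-on mobile users
slow = {p: 5.0 for p in people}  # hours to react, e.g. occasional checkers

print(joined_by(join_times(contacts, fast, [0]), 3))  # 7
print(joined_by(join_times(contacts, slow, [0]), 3))  # 1
```

The network is identical in both runs. Only the reaction times differ, yet after three hours one population has largely mobilised and the other has barely started.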

Despite large investments of law enforcement resources, online child exploitation is nowhere near under control, and while there are numerous technological products to aid this, they still require substantial human intervention.

The Internet has provided the social, individual, and technological circumstances needed for child pornography to flourish. Sex offenders have been able to use the Internet to disseminate child pornographic content, to network socially with other pedophiles through chatrooms and newsgroups, and to communicate sexually with children. A 2009 United Nations estimate puts the number of websites containing child pornography at more than four million, with 35 percent of them depicting serious sexual assault [1]. Even if this report or others exaggerate the true prevalence of those websites by a wide margin, the fact remains that such websites are pervasive on the World Wide Web. Despite large investments of law enforcement resources, online child exploitation is nowhere near under control, and while there are numerous technological products to aid in finding child pornography online, they still require substantial human intervention. Nevertheless, steps can be taken to automate more of these searches, reducing the amount of content police officers have to examine and increasing the time they can spend on investigating individuals. While law enforcement agencies will aim for maximum disruption of online child exploitation networks by targeting the most connected players, there is a general lack of research on the structural nature of these networks. We aimed to address this in our study by developing a method to extract child exploitation networks, map their structure, and analyse their content. Our custom-written Child Exploitation Network Extractor (CENE) automatically crawls the Web from a user-specified seed page, collecting information about the pages it visits by recursively following the links out of the page; the result of the crawl is a network structure containing information about the content of the websites and the linkages between them [2]. 
We chose ten websites as starting points for the crawls; four were selected from a list of known child pornography websites while the other six were selected and…
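The crawl described above is essentially a breadth-first traversal that records pages as nodes and hyperlinks as edges. The sketch below is a generic illustration, not CENE's code. The hypothetical `fetch` function stands in for downloading and parsing a page, and a toy link graph replaces the live Web. The stored feature, a word count, is a placeholder for whatever content information a real tool would collect. A queue replaces literal recursion, which avoids deep call stacks on large crawls. `max_pages` bounds the crawl.

```python
from collections import deque


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting from `seed`.

    `fetch(url)` is a hypothetical stand-in that returns (text, links) for a
    page. Returns (nodes, edges): nodes maps each visited URL to simple
    content features; edges is the set of (source, target) hyperlinks seen.
    Edges may point to pages left unvisited once max_pages is reached.
    """
    nodes, edges = {}, set()
    queue, seen = deque([seed]), {seed}
    while queue and len(nodes) < max_pages:
        url = queue.popleft()
        text, links = fetch(url)
        nodes[url] = {"words": len(text.split())}  # placeholder feature
        for link in links:
            edges.add((url, link))
            if link not in seen:  # queue each page at most once
                seen.add(link)
                queue.append(link)
    return nodes, edges


# Toy link graph standing in for the live Web.
TOY_WEB = {
    "a.example": ("seed page text", ["b.example", "c.example"]),
    "b.example": ("page b", ["c.example"]),
    "c.example": ("page c links back", ["a.example"]),
}

nodes, edges = crawl("a.example", lambda url: TOY_WEB.get(url, ("", [])))
print(sorted(nodes))  # ['a.example', 'b.example', 'c.example']
print(len(edges))     # 4
```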

Big data generation and analysis requires expertise and skills which can be a particular challenge to governmental organisations.

Recent years have seen an increasing buzz around how ‘Big Data’ can uncover patterns of human behaviour and help predict social trends. Most social activities today leave digital imprints that can be collected and stored in the form of large datasets of transactional data. Access to this data presents powerful and often unanticipated opportunities for researchers and policy makers to generate new, precise, and rapid insights into economic, social and political practices and processes, as well as to tackle longstanding problems that have hitherto been impossible to address, such as how political movements like the ‘Arab Spring’ and Occupy originate and spread.
Opening comments from convenor, Helen Margetts.
While big data can allow the design of efficient and realistic policy and administrative change, it also brings ethical challenges (for example, when it is used for probabilistic policy-making), raising issues of justice, equity and privacy. It also presents clear methodological and technical challenges: big data generation and analysis requires expertise and skills which can be a particular challenge for governmental organisations, given their dubious record on the guardianship of large-scale datasets, the management of large technology-based projects, and their capacity to innovate. It is these opportunities and challenges that were addressed by the recent conference “Internet, Politics, Policy 2012: Big Data, Big Challenges?” organised by the Oxford Internet Institute (University of Oxford) on behalf of the OII-edited academic journal Policy and Internet. Over two days of paper and poster presentations and discussion, it explored the new research frontiers opened up by big data as well as its limitations, serving as a forum to encourage discussion across disciplinary boundaries on how to exploit this data to inform policy debates and advance social science research. 
Duncan Watts (keynote speaker).
The conference was organised along three tracks: “Policy,” “Politics,” and “Data+Methods” (see the programme), with panels focusing on the impact of big data on (for example) political campaigning, collective action and political dissent, sentiment…

Small changes in individual actions can have large effects at the aggregate level; this opens up the potential for drawing incorrect conclusions about generative mechanisms when only aggregated patterns are analysed.

One of the big social science questions is how our individual actions aggregate into collective patterns of behaviour (think crowds, riots, and revolutions). This question has so far been difficult to tackle due to a lack of appropriate data, and the complexity of the relationship between the individual and the collective. Digital trails are allowing social scientists to understand this relationship better. Small changes in individual actions can have large effects at the aggregate level; this opens up the potential for drawing incorrect conclusions about generative mechanisms when only aggregated patterns are analysed, as Schelling aimed to show in his classic example of racial segregation. Part of the reason why it has been so difficult to explore this connection between the individual and the collective (and the unintended consequences that arise from that connection) is the lack of proper empirical data, particularly around the structure of interdependence that links individual actions. This relational information is what digital data now provide; however, such data present some new challenges to social scientists, particularly those who are used to working with smaller, cross-sectional datasets. Suddenly, we can track and analyse the interactions of thousands (if not millions) of people with a time resolution that can go down to the second. The question is how best to aggregate that data and deal with the time dimension. Interactions take place in continuous time; however, most digital interactions are recorded as events (e.g. sending or receiving messages), and different network structures emerge when those events are aggregated according to different windows (e.g. days, weeks or months). We still don’t have systematic knowledge of how transforming continuous data into discrete observation windows affects the networks of interaction we analyse. 
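A small example makes the windowing problem concrete. The sketch below groups timestamped events into consecutive windows and returns the undirected edges observed in each; the function name and events are hypothetical. With daily windows, the B-C and A-B contacts fall into separate networks. With a weekly window they merge into an apparent path A-B-C. Yet B contacted C before A contacted B, so nothing A passed to B could have travelled on to C through that contact.

```python
from collections import defaultdict


def windowed_networks(events, window):
    """Aggregate (sender, receiver, t) events into consecutive windows of
    length `window` (same units as t). Returns {window_index: sorted list of
    undirected edges observed in that window}."""
    networks = defaultdict(set)
    for sender, receiver, t in events:
        edge = tuple(sorted((sender, receiver)))  # treat contact as undirected
        networks[int(t // window)].add(edge)
    return {k: sorted(v) for k, v in sorted(networks.items())}


DAY = 24 * 3600  # seconds
events = [
    ("B", "C", 0.5 * DAY),  # B messages C first...
    ("A", "B", 2.5 * DAY),  # ...and A messages B two days later
    ("C", "D", 9.0 * DAY),
]

print(windowed_networks(events, DAY))
# {0: [('B', 'C')], 2: [('A', 'B')], 9: [('C', 'D')]}
print(windowed_networks(events, 7 * DAY))
# {0: [('A', 'B'), ('B', 'C')], 1: [('C', 'D')]}
```

Neither window size is "correct". The choice determines which paths appear in the aggregated network, which is exactly the gap in systematic knowledge the text describes.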
Reconstructing interpersonal networks (particularly longitudinal network data) used to be extremely time-consuming and difficult; now it is relatively easy to obtain that sort of network data, but modelling and analysing such data is still a challenge. Another problem faced by social…