Social Data Science

The US accounts for almost 40% of the global darknet Fentanyl trade, with Canada and Australia at 15% and 12%, respectively.

My colleagues Joss Wright, Martin Dittus and I have been scraping the world's largest darknet marketplaces over the last few months, as part of our darknet mapping project. The data we collected allow us to explore a wide range of trading activities, including the trade in the synthetic opioid Fentanyl, one of the drugs blamed for the rapid rise in overdose deaths and widespread opioid addiction in the US.

The map shows the global distribution of the Fentanyl trade on the darknet. The US accounts for almost 40% of the global darknet Fentanyl trade, with Canada and Australia at 15% and 12%, respectively. The UK and Germany are the largest sellers in Europe, with 9% and 5% of sales. While China is often mentioned as an important source of the drug, it accounts for only 4% of darknet sales. However, this does not necessarily mean that China is not the ultimate site of production: many of the sellers in places like the US, Canada, and Western Europe are likely intermediaries rather than producers themselves.

In the next few months we'll be sharing more visualisations of the economic geographies of products on the darknet. In the meantime, you can find out more about our work in Exploring the Darknet in Five Easy Questions. Follow the project here: https://www.oii.ox.ac.uk/research/projects/economic-geog-darknet/ and on Twitter: @OiiDarknet
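For readers curious about the mechanics behind a map like this, here is a minimal sketch of the kind of aggregation involved: group scraped listings by seller country and compute each country's share of observed trade. The records, country codes, and revenue figures below are invented for illustration; they are not our dataset or pipeline.

```python
# Hypothetical sketch: each country's share of observed darknet trade,
# computed from scraped listings. All data here are invented.
from collections import defaultdict

# (seller_country, revenue_usd) pairs, as might be scraped from listings
listings = [
    ("US", 120.0), ("US", 80.0), ("CA", 75.0), ("AU", 60.0),
    ("GB", 45.0), ("DE", 25.0), ("CN", 20.0),
]

# Sum revenue per seller country
revenue_by_country = defaultdict(float)
for country, revenue in listings:
    revenue_by_country[country] += revenue

# Express each country's total as a share of all observed trade
total = sum(revenue_by_country.values())
for country, revenue in sorted(revenue_by_country.items(),
                               key=lambda kv: kv[1], reverse=True):
    print(f"{country}: {revenue / total:.1%} of observed trade")
```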

We might expect bot interactions to be relatively predictable and uneventful.

Wikipedia uses editing bots to clean articles, but what happens when their interactions go bad? Image of "Nomade", a sculpture in downtown Des Moines, by Jason Mrachina (Flickr CC BY-NC-ND 2.0).

Recent years have seen a huge increase in the number of bots online, including search engine Web crawlers, online customer service chat bots, social media spambots, and content-editing bots in online collaborative communities like Wikipedia. (Bots are important contributors to Wikipedia, completing about 15% of all Wikipedia edits in 2014, and more than 50% in certain language editions.) While the online world has turned into an ecosystem of bots (by which we mean computer scripts that automatically handle repetitive and mundane tasks), our knowledge of how these automated agents interact with each other is rather poor. Since bots are automata without the capacity for emotions, meaning-making, creativity, or sociality, we might expect their interactions to be relatively predictable and uneventful.

In their PLOS ONE article "Even good bots fight: The case of Wikipedia", Milena Tsvetkova, Ruth García-Gavilanes, Luciano Floridi, and Taha Yasseri analyse the interactions between bots that edit articles on Wikipedia. They track the extent to which bots undid each other's edits over the period 2001–2010, model how pairs of bots interact over time, and identify different types of interaction outcomes. Although Wikipedia bots are intended to support the encyclopaedia (identifying and undoing vandalism, enforcing bans, checking spelling, creating inter-language links, importing content automatically, mining data, identifying copyright violations, greeting newcomers, etc.), the authors find they often undid each other's edits, with these sterile "fights" sometimes continuing for years. They suggest that even relatively "dumb" bots may give rise to complex interactions, carrying important implications for Artificial Intelligence research. Understanding these bot-bot interactions will be crucial for managing social media, providing adequate cyber-security, and designing autonomous vehicles (that don't crash).

We caught up with Taha Yasseri and Luciano Floridi to discuss the implications of the findings:

Ed.: Is there any particular difference between the way individual bots interact (and maybe get bogged down in conflict), and lines of vast and complex code interacting badly, or having unforeseen results (e.g. flash-crashes in automated trading)…
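To make the revert-tracking idea concrete: one common way to find reverts in a revision history is to checksum each revision's content and flag any revision that restores an earlier state. The sketch below, with invented revisions and bot names, shows how bot-on-bot reverts could then be counted; the paper's actual identification of reverts and bot accounts is more involved than this.

```python
# Illustrative sketch: detect reverts in an article's revision history by
# checksumming content, then count reverts where both editors are bots.
# Revisions and bot names are invented for this example.
import hashlib
from collections import Counter

def sha1(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

BOTS = {"AlphaBot", "BetaBot"}  # hypothetical bot account names

# (editor, article text after the edit), oldest revision first
revisions = [
    ("AlphaBot", "fact A. interlanguage link v1"),
    ("BetaBot",  "fact A. interlanguage link v2"),
    ("AlphaBot", "fact A. interlanguage link v1"),  # restores an earlier state
    ("BetaBot",  "fact A. interlanguage link v2"),  # restores an earlier state
]

seen = set()                 # checksums of content states already observed
bot_bot_reverts = Counter()  # (reverting bot, reverted bot) -> count
for i, (editor, text) in enumerate(revisions):
    digest = sha1(text)
    if digest in seen:       # content matches an earlier revision: a revert
        reverted = revisions[i - 1][0]  # editor whose change was undone
        if editor in BOTS and reverted in BOTS:
            bot_bot_reverts[(editor, reverted)] += 1
    seen.add(digest)

print(bot_bot_reverts)
# Counter({('AlphaBot', 'BetaBot'): 1, ('BetaBot', 'AlphaBot'): 1})
```

Run over years of history, pair counts like these are what reveal the long, sterile "fights" the authors describe.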

What happens when we turn our everyday experience, in particular our health and wellness-related experience, into data?

Benjamin Franklin used to keep charts of his time spent and virtues lived up to. Today, we use technology to self-track: our hours slept, steps taken, calories consumed, medications administered. But what happens when we turn our everyday experience, in particular our health and wellness-related experience, into data?

"Self-Tracking" (MIT Press) by Gina Neff and Dawn Nafus examines how people record, analyse, and reflect on this data, looking at the tools they use and the communities they become part of, and offering an introduction to the essential ideas and key challenges of using these technologies. In considering self-tracking as a social and cultural phenomenon, they describe not only the use of data as a kind of mirror of the self but also how this enables people to connect to, and learn from, others. They also consider what's at stake: who wants our data and why, the practices of serious self-tracking enthusiasts, the design of commercial self-tracking technology, and how people are turning to self-tracking to fill gaps in the healthcare system. None of us can lead an entirely untracked life today, but in their book Gina and Dawn show us how to use our data in a way that empowers and educates us.

We caught up with Gina to explore the self-tracking movement:

Ed.: Over one hundred million wearable sensors were shipped last year to help us gather data about our lives. Is the trend and market for personal health-monitoring devices ever-increasing, or are we seeing saturation of the device market and the things people might conceivably want to (pay to) monitor about themselves?

Gina: By focusing on direct-to-consumer wearables and mobile apps for health and wellness in the US, we see a lot of tech developed with very little focus on impact or efficacy. I think to some extent we've hit the trough in the 'hype' cycle, where the initial excitement over digital self-tracking is giving way to the hard and serious work…

Britain has one of the largest Internet economies in the developed world, and the Internet contributes an estimated 8.3 percent to Britain’s GDP.

Despite the huge importance of the Internet in everyday life, we know surprisingly little about the geography of Internet use and participation at sub-national scales. A new article on Local Geographies of Digital Inequality by Grant Blank, Mark Graham, and Claudio Calvino, published in Social Science Computer Review, proposes a novel method to calculate the local geographies of Internet usage, employing Britain as an initial case study. In the first attempt to estimate Internet use at any small-scale level, they combine data from a sample survey, the 2013 Oxford Internet Survey (OxIS), with the 2011 UK census, using small area estimation to model Internet use in small geographies in Britain. (Read the paper for more on this method, and for discussion of why there has been little work on the geography of digital inequality.)

There are two major reasons to suspect that geographic differences in Internet use may be important: apparent regional differences and the urban-rural divide. The authors do indeed find a regional difference: the area with the least Internet use is in the North East, followed by central Wales; the highest is in London and the South East. Interestingly, however, geographic differences become non-significant after controlling for demographic variables (age, education, income, etc.). That is, demographics matter more than where you live in determining the likelihood that you're an Internet user.

Britain has one of the largest Internet economies in the developed world, and the Internet contributes an estimated 8.3 percent to Britain's GDP. By reducing a range of geographic frictions and allowing access to new customers, markets and ideas, it strongly supports domestic job and income growth. There are also personal benefits to Internet use. However, these advantages are denied to people who are not online, leading to a stream of research on the so-called digital divide.

We caught up with Grant Blank to discuss the policy implications of this marked disparity in (estimated) Internet use across…
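As a rough illustration of the small area estimation logic (not the authors' actual model): fit an individual-level model of Internet use on demographics from the survey, then apply it to census counts of the same demographic groups in each small area and take the population-weighted average. The toy sketch below assumes invented survey responses and census cells.

```python
# Toy sketch of synthetic small area estimation: a survey-trained model of
# Internet use applied to census demographic cells in one small area.
# All numbers are invented stand-ins for OxIS and census data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy survey: columns = [age_decade, years_of_education]; label = uses Internet
X_survey = np.array([[2, 16], [3, 12], [6, 10], [7, 9], [4, 14], [8, 8]])
y_survey = np.array([1, 1, 0, 0, 1, 0])
model = LogisticRegression().fit(X_survey, y_survey)

# Toy census table for one small area: demographic cell -> population count
area_cells = {(2, 16): 120, (5, 11): 300, (8, 8): 90}

# Synthetic estimate: population-weighted average of predicted use rates
cells = np.array(list(area_cells.keys()))
counts = np.array(list(area_cells.values()))
p_use = model.predict_proba(cells)[:, 1]
area_rate = np.average(p_use, weights=counts)
print(f"Estimated share of Internet users in this area: {area_rate:.1%}")
```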

We draw on the rich history of gender studies in the social sciences, coupled with emerging computational methods for topic modelling, to better understand the content of reports to the Everyday Sexism Project.

The Everyday Sexism Project catalogues instances of sexism experienced by women on a day-to-day basis. We will be using computational techniques to extract the most commonly occurring sexism-related topics.

As Laura Bates, founder of the Everyday Sexism Project, has recently highlighted, "it seems to be increasingly difficult to talk about sexism, equality, and women's rights" (Everyday Sexism Project, 2015). With many theorists suggesting that we have entered a so-called "post-feminist" era in which gender equality has been achieved (cf. McRobbie, 2008; Modleski, 1991), to complain about sexism not only risks being labelled as "uptight", "prudish", or a "militant feminist", but also exposes those who speak out to sustained, and at times vicious, personal attacks (Everyday Sexism Project, 2015). Despite this, thousands of women are speaking out, through Bates' project, about their experiences of everyday sexism. Our research seeks to draw on the rich history of gender studies in the social sciences, coupling it with emerging computational methods for topic modelling, to better understand the content of reports to the Everyday Sexism Project and the lived experiences of those who post them. Here, we outline the literature which contextualises our study.

Studies on sexism are far from new. Indeed, particularly amongst feminist theorists and sociologists, the analysis (and deconstruction) of "inequality based on sex or gender categorisation" (Harper, 2008) has formed a central tenet of both academic inquiry and a radical politics of female emancipation for several decades (De Beauvoir, 1949; Friedan, 1963; Rubin, 1975; Millett, 1971). Reflecting its feminist origins, historical research on sexism has broadly focused on defining sexist interactions (cf. Glick and Fiske, 1997) and on highlighting the problematic, biologically rooted "gender roles" that form the foundation of inequality between men and women (Millett, 1971; Renzetti and Curran, 1992; Chodorow, 1995).

More recent studies, particularly in the field of psychology, have shifted the focus away from whether and how sexism exists, towards an examination of the psychological, personal, and social implications that sexist incidents have for the women who experience them. As such, theorists such as Matteson and Moradi (2005), Swim et al. (2001) and Jost and…
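To give a taste of the topic modelling approach, here is a minimal sketch using latent Dirichlet allocation via scikit-learn. The handful of example reports is invented and far too small for real analysis; our study works with the full corpus of project submissions.

```python
# Minimal LDA sketch: extract recurring topics from short testimonies.
# The example documents are invented; real analysis needs a large corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "catcalled walking home from work tonight",
    "manager talked over me again in the meeting",
    "shouted at from a car while walking to the shop",
    "passed over for promotion, told to smile more at work",
]

# Bag-of-words representation, dropping common English stop words
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

# Fit a two-topic model (the topic count is a modelling choice)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Print the most heavily weighted terms per topic
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:4]]
    print(f"Topic {k}: {', '.join(top)}")
```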

Leading policy makers, data scientists and academics came together to discuss how the ATI and government could work together to develop data science for the public good.

The benefits of big data and data science for the private sector are well recognised. So far, considerably less attention has been paid to the power and potential of the growing field of data science for policy-making and public services. On Monday 14th March 2016 the Oxford Internet Institute (OII) and the Alan Turing Institute (ATI) hosted a Summit on Data Science for Government and Policy-Making, funded by the EPSRC. Leading policy makers, data scientists and academics came together to discuss how the ATI and government could work together to develop data science for the public good. The convenors of the Summit, Professors Helen Margetts (OII) and Tom Melham (Computer Science), report on the day's proceedings.

The Alan Turing Institute will build on the UK's existing academic strengths in the analysis and application of big data and algorithm research to place the UK at the forefront of world-wide research in data science. The University of Oxford is one of five university partners, and the OII is the only partnering department in the social sciences.

The aim of the Summit on Data Science for Government and Policy-Making was to understand how government can make better use of big data and the ATI, with the academic partners in listening mode. We hoped that the participants would bring forward their own stories, hopes and fears regarding data science for the public good. Crucially, we wanted to work out a roadmap for how different stakeholders can work together on the distinct challenges facing government, as opposed to commercial organisations. At the same time, data science research and development has much to gain from the policy-making community. Some of the things that government does (collect tax from the whole population, give money away at scale, or possess the legitimate use of force) it does by virtue of being government. So the sources of data and some of the data science challenges that public agencies face are…