Politics & Government, Social Data Science

In a world of “connective action” — what makes an influential Twitter user?

by David Sutcliffe 10/06/2018

The new level of connectivity (particularly of social media) raises important questions about its role in the political process.

What is the difference between popularity and reputation on the process of social influence? Image: kris krüg (Flickr CC BY-NC-ND 2.0).

A significant part of political deliberation now takes place on online forums and social networking sites, leading to the idea that collective action might be evolving into “connective action”. The new level of connectivity (particularly of social media) raises important questions about its role in the political process. But understanding important phenomena, such as social influence, social forces, and digital divides, requires analysis of very large social systems, which traditionally has been a challenging task in the social sciences. In their Policy & Internet article “Understanding Popularity, Reputation, and Social Influence in the Twitter Society”, David Garcia, Pavlin Mavrodiev, Daniele Casati, and Frank Schweitzer examine popularity, reputation, and social influence on Twitter using network information on more than 40 million users. They integrate measurements of popularity, reputation, and social influence to evaluate what keeps users active, what makes them more popular, and what determines their influence in the network. Popularity in the Twitter social network is often quantified as the number of followers of a user. This implies that it doesn’t matter why a user follows you, or how important she is: your popularity only measures the size of your audience. Reputation, on the other hand, is a more complicated concept associated with centrality. Being followed by a highly reputed user has a stronger effect on one’s reputation than being followed by someone with low reputation. Thus, the simple number of followers does not capture the recursive nature of reputation. In their article, the authors examine how popularity and reputation differ in their effect on the process of social influence. They find that there is a range of values in which the risk of a user becoming inactive grows with popularity and reputation.
Popularity in Twitter resembles a proportional growth process that is faster in its strongly connected component, and that can be accelerated by reputation when users are already popular. They find that social influence on Twitter is mainly related to…
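To make the distinction between popularity and reputation concrete, here is a minimal sketch on a toy follower network. It is not the authors' method: the excerpt only says that reputation is "associated with centrality", so the sketch swaps in a PageRank-style score as one common recursive centrality measure. The user names, the damping factor of 0.85, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

def popularity(follows):
    """Follower count per user: the size of each user's audience."""
    counts = defaultdict(int)
    for follower, followed in follows:
        counts[followed] += 1
        counts[follower] += 0  # make sure users with no followers appear too
    return dict(counts)

def reputation(follows, damping=0.85, iterations=100):
    """PageRank-style score (an assumed stand-in for 'reputation'):
    being followed by a high-scoring user is worth more."""
    users = sorted({u for edge in follows for u in edge})
    out_links = defaultdict(list)
    for follower, followed in follows:
        out_links[follower].append(followed)
    n = len(users)
    score = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Users who follow nobody spread their score evenly over everyone.
        dangling = sum(score[u] for u in users if not out_links[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in users}
        for u in users:
            for v in out_links[u]:
                new[v] += damping * score[u] / len(out_links[u])
        score = new
    return score

# Hypothetical network: (follower, followed) pairs.
follows = [
    ("a", "celeb"), ("b", "celeb"), ("c", "celeb"), ("d", "celeb"),
    ("celeb", "x"),  # x's single follower is widely followed
    ("e", "y"),      # y's single follower has no followers at all
]
pop = popularity(follows)
rep = reputation(follows)
print(pop["x"], pop["y"])    # equal audience sizes
print(rep["x"] > rep["y"])   # but x scores higher on the recursive measure
```

In this toy network x and y each have exactly one follower, so a follower count cannot tell them apart, while the recursive score ranks x above y because x's follower is itself widely followed.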

Economics, Governance & Security, Mapping, Social Data Science

Mapping Fentanyl Trades on the Darknet

by Mark Graham 16/10/2017

The US accounts for almost 40% of global darknet trade, with Canada and Australia at 15% and 12%, respectively.

My colleagues Joss Wright, Martin Dittus and I have been scraping the world’s largest darknet marketplaces over the last few months, as part of our darknet mapping project. The data we collected allow us to explore a wide range of trading activities, including the trade in the synthetic opioid Fentanyl, one of the drugs blamed for the rapid rise in overdose deaths and widespread opioid addiction in the US. The map shows the global distribution of the Fentanyl trade on the darknet. The US accounts for almost 40% of global darknet trade, with Canada and Australia at 15% and 12%, respectively. The UK and Germany are the largest sellers in Europe with 9% and 5% of sales. While China is often mentioned as an important source of the drug, it accounts for only 4% of darknet sales. However, this does not necessarily mean that China is not the ultimate site of production. Many of the sellers in places like the US, Canada, and Western Europe are likely intermediaries rather than producers themselves. In the next few months, we’ll be sharing more visualisations of the economic geographies of products on the darknet. In the meantime you can find out more about our work by Exploring the Darknet in Five Easy Questions. Follow the project here: https://www.oii.ox.ac.uk/research/projects/economic-geog-darknet/ Twitter: @OiiDarknet

Ethics, Interviews, Social Data Science

Our knowledge of how automated agents interact is rather poor (and that could be a problem)

by David Sutcliffe 14/06/2017

We might expect bot interactions to be relatively predictable and uneventful.

Wikipedia uses editing bots to clean articles: but what happens when their interactions go bad? Image of "Nomade", a sculpture in downtown Des Moines by Jason Mrachina (Flickr CC BY-NC-ND 2.0).

Recent years have seen a huge increase in the number of bots online, including search engine Web crawlers, online customer service chat bots, social media spambots, and content-editing bots in online collaborative communities like Wikipedia. (Bots are important contributors to Wikipedia, completing about 15% of all Wikipedia edits in 2014 overall, and more than 50% in certain language editions.) While the online world has turned into an ecosystem of bots (by which we mean computer scripts that automatically handle repetitive and mundane tasks), our knowledge of how these automated agents interact with each other is rather poor. But because bots are automata without capacity for emotions, meaning-making, creativity, or sociality, we might expect their interactions to be relatively predictable and uneventful. In their PLOS ONE article “Even good bots fight: The case of Wikipedia”, Milena Tsvetkova, Ruth García-Gavilanes, Luciano Floridi, and Taha Yasseri analyse the interactions between bots that edit articles on Wikipedia. They track the extent to which bots undid each other’s edits over the period 2001–2010, model how pairs of bots interact over time, and identify different types of interaction outcomes. Although Wikipedia bots are intended to support the encyclopaedia (identifying and undoing vandalism, enforcing bans, checking spelling, creating inter-language links, importing content automatically, mining data, identifying copyright violations, greeting newcomers, etc.), the authors find they often undid each other’s edits, with these sterile “fights” sometimes continuing for years. They suggest that even relatively “dumb” bots may give rise to complex interactions, carrying important implications for Artificial Intelligence research. Understanding these bot-bot interactions will be crucial for managing social media, providing adequate cyber-security, and designing autonomous vehicles (that don’t crash).
We caught up with Taha Yasseri and Luciano Floridi to discuss the implications of the findings: Ed.: Is there any particular difference between the way individual bots interact (and maybe get bogged down in conflict), and lines of vast and complex code interacting badly, or having unforeseen results (e.g. flash-crashes in automated trading):…

Interviews, Politics & Government, Social Data Science

Did you consider Twitter’s (lack of) representativeness before doing that predictive study?

by David Sutcliffe 10/04/2017

Do Twitter users share identical characteristics with the population of interest? For what populations are Twitter data actually appropriate?

Twitter data have many qualities that appeal to researchers, but are probably not suitable for research where representativeness is important. Image: Bernard Goldbach (Flickr).

Twitter data have many qualities that appeal to researchers. They are extraordinarily easy to collect. They are available in very large quantities. And with a simple 140-character text limit they are easy to analyse. As a result of these attractive qualities, over 1,400 papers have been published using Twitter data, including many attempts to predict disease outbreaks, election results, film box office gross, and stock market movements solely from the content of tweets. Easy availability of Twitter data links nicely to a key goal of computational social science. If researchers can find ways to impute user characteristics from social media, then the capabilities of computational social science would be greatly extended. However, few papers consider the digital divide among Twitter users. But the question of who uses Twitter has major implications for research attempts to use the content of tweets for inference about population behaviour. Do Twitter users share identical characteristics with the population of interest? For what populations are Twitter data actually appropriate? A new article by Grant Blank published in Social Science Computer Review provides a multivariate empirical analysis of the digital divide among Twitter users, comparing Twitter users and nonusers with respect to their characteristic patterns of Internet activity and to certain key attitudes. It thereby fills a gap in our knowledge about an important social media platform, and it joins a surprisingly small number of studies that describe the population that uses social media. Comparing British (OxIS survey) and US (Pew) data, Grant finds that generally, British Twitter users are younger, wealthier, and better educated than other Internet users, who in turn are younger, wealthier, and better educated than the offline British population. American Twitter users are also younger and wealthier than the rest of the population, but they are not better educated.
Twitter users are disproportionately members of elites in both countries. Twitter users also differ from other groups in their online activities and their attitudes.…

Books, Social Data Science, Wellbeing

Exploring the world of self-tracking: who wants our data and why?

by Gina Neff 07/04/2017

What happens when we turn our everyday experience—in particular, health and wellness-related experience—into data?

Benjamin Franklin used to keep charts of his time spent and virtues lived up to. Today, we use technology to self-track: our hours slept, steps taken, calories consumed, medications administered. But what happens when we turn our everyday experience—in particular, health and wellness-related experience—into data? “Self-Tracking” (MIT Press) by Gina Neff and Dawn Nafus examines how people record, analyse, and reflect on this data—looking at the tools they use and the communities they become part of, and offering an introduction to the essential ideas and key challenges of using these technologies. In considering self-tracking as a social and cultural phenomenon, they describe not only the use of data as a kind of mirror of the self but also how this enables people to connect to, and learn from, others. They also consider what’s at stake: who wants our data and why, the practices of serious self-tracking enthusiasts, the design of commercial self-tracking technology, and how people are turning to self-tracking to fill gaps in the healthcare system. None of us can lead an entirely untracked life today, but in their book, Gina and Dawn show us how to use our data in a way that empowers and educates us. We caught up with Gina to explore the self-tracking movement: Ed.: Over one hundred million wearable sensors were shipped last year to help us gather data about our lives. Is the trend and market for personal health-monitoring devices ever-increasing, or are we seeing saturation of the device market and the things people might conceivably want to (pay to) monitor about themselves? Gina: By focusing on direct-to-consumer wearables and mobile apps for health and wellness in the US we see a lot of tech developed with very little focus on impact or efficacy. I think to some extent we’ve hit the trough in the ‘hype’ cycle, where the initial excitement over digital self-tracking is giving way to the hard and serious work…

Interviews, Social Data Science, Wellbeing

Estimating the Local Geographies of Digital Inequality in Britain: London and the South East Show Highest Internet Use—But Why?

by David Sutcliffe 01/03/2017

Britain has one of the largest Internet economies in the developed world, and the Internet contributes an estimated 8.3 percent to Britain’s GDP.

Despite the huge importance of the Internet in everyday life, we know surprisingly little about the geography of Internet use and participation at sub-national scales. A new article on Local Geographies of Digital Inequality by Grant Blank, Mark Graham, and Claudio Calvino published in Social Science Computer Review proposes a novel method to calculate the local geographies of Internet usage, employing Britain as an initial case study. In the first attempt to estimate Internet use at any small-scale level, they combine data from a sample survey, the 2013 Oxford Internet Survey (OxIS), with the 2011 UK census, employing small area estimation to estimate Internet use in small geographies in Britain. (Read the paper for more on this method, and discussion of why there has been little work on the geography of digital inequality.) There are two major reasons to suspect that geographic differences in Internet use may be important: apparent regional differences and the urban-rural divide. The authors do indeed find a regional difference: the area with least Internet use is in the North East, followed by central Wales; the highest is in London and the South East. But interestingly, geographic differences become non-significant after controlling for demographic variables (age, education, income etc.). That is, demographics matter more than simply where you live, in terms of the likelihood that you’re an Internet user. Britain has one of the largest Internet economies in the developed world, and the Internet contributes an estimated 8.3 percent to Britain’s GDP. By reducing a range of geographic frictions and allowing access to new customers, markets and ideas it strongly supports domestic job and income growth. There are also personal benefits to Internet use. However, these advantages are denied to people who are not online, leading to a stream of research on the so-called digital divide. 
We caught up with Grant Blank to discuss the policy implications of this marked disparity in (estimated) Internet use across…

Social Data Science

Edit wars! Examining networks of negative social interaction

by Taha Yasseri 04/11/2016

While these interactions are less common, they strongly affect people’s psychological well-being, physical health, and work performance.

Network of all reverts done in the English language Wikipedia within one day (January 15, 2010). Read the full article for details.

While network science has significantly advanced our understanding of the structure and dynamics of the human social fabric, much of the research has focused on positive relations and interactions such as friendship and collaboration. Considerably less is known about networks of negative social interactions such as distrust, disapproval, and disagreement. While these interactions are less common, they strongly affect people’s psychological well-being, physical health, and work performance. Negative interactions are also rarely explicitly declared and recorded, making them hard for scientists to study. In their new article on the structural and temporal features of negative interactions in the community, Milena Tsvetkova, Ruth García-Gavilanes and Taha Yasseri use complex network methods to analyse patterns in the timing and configuration of reverts of article edits to Wikipedia. In large online collaboration communities like Wikipedia, users sometimes undo or downrate contributions made by other users; most often to maintain and improve the collaborative project. However, it is also possible that these actions are social in nature, with previous research acknowledging that they could also imply negative social interactions. The authors find evidence that Wikipedia editors systematically revert the same person, revert back their reverter, and come to defend a reverted editor. However, they don’t find evidence that editors “pay forward” a revert, coordinate with others to revert an editor, or revert different editors serially. These interactions can be related to the status of the editors. Even though the individual reverts might not necessarily be negative social interactions, their analysis points to the existence of certain patterns of negative social dynamics within the editorial community. 
Some of these patterns have not been previously explored and certainly carry implications for Wikipedia’s own knowledge collection practices—and can also be applied to other large-scale collaboration networks to identify the existence of negative social interactions. Read the full article: Milena Tsvetkova, Ruth García-Gavilanes and Taha Yasseri (2016) Dynamics of Disagreement: Large-Scale Temporal Network Analysis Reveals Negative Interactions in…
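Two of the patterns mentioned above, reverting the same person again and reverting back one's reverter, can be counted directly from a time-ordered list of reverts. The sketch below is only an illustration of that counting step under assumed inputs: the editor names and the function name are hypothetical, and it does not attempt the temporal network analysis the authors use to decide whether such patterns occur more often than expected.

```python
from collections import Counter

def revert_patterns(reverts):
    """Count two simple patterns in a time-ordered list of
    (reverter, reverted) pairs."""
    seen = Counter()   # how often each (reverter, reverted) pair has occurred so far
    repeated = 0       # A reverts B after A has already reverted B
    retaliation = 0    # A reverts B after B has already reverted A
    for reverter, reverted in reverts:
        if seen[(reverter, reverted)] > 0:
            repeated += 1
        if seen[(reverted, reverter)] > 0:
            retaliation += 1
        seen[(reverter, reverted)] += 1
    return {"repeated": repeated, "retaliation": retaliation}

# Hypothetical revert log, oldest first.
events = [
    ("ann", "bob"),
    ("bob", "ann"),  # bob reverts back his reverter
    ("ann", "bob"),  # ann reverts bob again, and this also answers bob's revert
    ("cat", "dan"),
]
counts = revert_patterns(events)
print(counts)  # {'repeated': 1, 'retaliation': 2}
```

Raw counts like these say nothing on their own about whether a revert was a negative social interaction; they would need to be compared against a baseline of what chance alone produces.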

Social Data Science

Sexism Typology: Literature Review

by Taha Yasseri 31/05/2016

Drawing on the rich history of gender studies in the social sciences, coupling it with emerging computational methods for topic modelling, to better understand the content of reports to the Everyday Sexism Project.

The Everyday Sexism Project catalogues instances of sexism experienced by women on a day to day basis. We will be using computational techniques to extract the most commonly occurring sexism-related topics.

As Laura Bates, founder of the Everyday Sexism project, has recently highlighted, “it seems to be increasingly difficult to talk about sexism, equality, and women’s rights” (Everyday Sexism Project, 2015). With many theorists suggesting that we have entered a so-called “post-feminist” era in which gender equality has been achieved (cf. McRobbie, 2008; Modleski, 1991), to complain about sexism not only risks being labelled as “uptight”, “prudish”, or a “militant feminist”, but also exposes those who speak out to sustained, and at times vicious, personal attacks (Everyday Sexism Project, 2015). Despite this, thousands of women are speaking out, through Bates’ project, about their experiences of everyday sexism. Our research seeks to draw on the rich history of gender studies in the social sciences, coupling it with emerging computational methods for topic modelling, to better understand the content of reports to the Everyday Sexism Project and the lived experiences of those who post them. Here, we outline the literature which contextualises our study. Studies on sexism are far from new. Indeed, particularly amongst feminist theorists and sociologists, the analysis (and deconstruction) of “inequality based on sex or gender categorisation” (Harper, 2008) has formed a central tenet of both academic inquiry and a radical politics of female emancipation for several decades (De Beauvoir, 1949; Friedan, 1963; Rubin, 1975; Millett, 1971). Reflecting its feminist origins, historical research on sexism has broadly focused on defining sexist interactions (cf. Glick and Fiske, 1997) and on highlighting the problematic, biologically rooted ‘gender roles’ that form the foundation of inequality between men and women (Millett, 1971; Renzetti and Curran, 1992; Chodorow, 1995). 
More recent studies, particularly in the field of psychology, have shifted the focus away from whether and how sexism exists, towards an examination of the psychological, personal, and social implications that sexist incidents have for the women who experience them. As such, theorists such as Matteson and Moradi (2005), Swim et al (2001) and Jost and…

Ethics, Politics & Government, Social Data Science

Alan Turing Institute and OII: Summit on Data Science for Government and Policy Making

by Helen Margetts 31/05/2016

Leading policy makers, data scientists and academics came together to discuss how the ATI and government could work together to develop data science for the public good.

The benefits of big data and data science for the private sector are well recognised. So far, considerably less attention has been paid to the power and potential of the growing field of data science for policy-making and public services. On Monday 14th March 2016 the Oxford Internet Institute (OII) and the Alan Turing Institute (ATI) hosted a Summit on Data Science for Government and Policy Making, funded by the EPSRC. Leading policy makers, data scientists and academics came together to discuss how the ATI and government could work together to develop data science for the public good. The convenors of the Summit, Professors Helen Margetts (OII) and Tom Melham (Computer Science), report on the day’s proceedings. The Alan Turing Institute will build on the UK’s existing academic strengths in the analysis and application of big data and algorithm research to place the UK at the forefront of world-wide research in data science. The University of Oxford is one of five university partners, and the OII is the only partnering department in the social sciences. The aim of the summit on Data Science for Government and Policy-Making was to understand how government can make better use of big data and the ATI—with the academic partners in listening mode. We hoped that the participants would bring forward their own stories, hopes and fears regarding data science for the public good. Crucially, we wanted to work out a roadmap for how different stakeholders can work together on the distinct challenges facing government, as opposed to commercial organisations. At the same time, data science research and development has much to gain from the policy-making community. Some of the things that government does—collect tax from the whole population, or give money away at scale, or possess the legitimate use of force—it does by virtue of being government. So the sources of data and some of the data science challenges that public agencies face are…

Methods, Social Data Science, Tools

P-values are widely used in the social sciences, but often misunderstood: and that’s a problem.

by Taha Yasseri 07/03/2016

We need to make standards for interpreting p-values more stringent, and also improve transparency in the academic reporting process.

P-values are widely used in the social sciences, especially ‘big data’ studies, to calculate statistical significance. Yet they are widely criticised for being easily hacked, and for not telling us what we want to know. Many have argued that, as a result, research is wrong far more often than we realize. In their recent article P-values: Misunderstood and Misused OII Research Fellow Taha Yasseri and doctoral student Bertie Vidgen argue that we need to make standards for interpreting p-values more stringent, and also improve transparency in the academic reporting process, if we are to maximise the value of statistical analysis. “Significant”: an illustration of selective reporting and statistical significance from XKCD. Available online at http://xkcd.com/882/ In an unprecedented move, the American Statistical Association recently released a statement (March 7 2016) warning against how p-values are currently used. This reflects a growing concern in academic circles that whilst a lot of attention is paid to the huge impact of big data and algorithmic decision-making, there is considerably less focus on the crucial role played by statistics in enabling effective analysis of big data sets, and making sense of the complex relationships contained within them. Much as datafication has created huge social opportunities, it has also brought to the fore many problems and limitations with current statistical practices. In particular, the deluge of data has made it crucial that we can work out whether studies are ‘significant’. In our paper, published three days before the ASA’s statement, we argued that the most commonly used tool in the social sciences for calculating significance, the p-value, is misused, misunderstood and, most importantly, doesn’t tell us what we want to know. The basic problem of ‘significance’ is simple: it is impractical to repeat an experiment an infinite number of times to make sure that what we observe is “universal”.
The same applies to our sample size: we are often unable to analyse a “whole population” sample and so have to…
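One way to see why selective reporting undermines a significance threshold is to work out how often at least one of several tests comes up "significant" when there is no real effect at all. The sketch below uses illustrative numbers (20 independent tests at a 0.05 threshold) that are assumptions for the example, not figures taken from the paper. It also assumes that, when the null hypothesis is true, each p-value is uniformly distributed between 0 and 1, which holds for continuous test statistics.

```python
import random

def familywise_error(alpha, n_tests):
    """Chance that at least one of n independent true-null tests
    falls below the threshold alpha."""
    return 1 - (1 - alpha) ** n_tests

def simulate(alpha, n_tests, trials=20000, seed=0):
    """Monte Carlo check: draw uniform p-values (assumed null behaviour)
    and count the trials in which any test looks 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / trials

analytic = familywise_error(0.05, 20)
simulated = simulate(0.05, 20)
print(round(analytic, 3))  # 0.642
print(simulated)           # close to the analytic value
```

Under these assumptions, a study that runs 20 such tests and reports only the one that crossed the threshold would find something "significant" roughly two times out of three, even though nothing is there.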