Mapping Fentanyl Trades on the Darknet

My colleagues Joss Wright, Martin Dittus and I have been scraping the world’s largest darknet marketplaces over the last few months, as part of our darknet mapping project. The data we collected allow us to explore a wide range of trading activities, including the trade in the synthetic opioid Fentanyl, one of the drugs blamed for the rapid rise in overdose deaths and widespread opioid addiction in the US.

The above map shows the global distribution of the Fentanyl trade on the darknet. The US accounts for almost 40% of global darknet Fentanyl sales, with Canada and Australia at 15% and 12%, respectively. The UK and Germany are the largest sellers in Europe, with 9% and 5% of sales. While China is often mentioned as an important source of the drug, it accounts for only 4% of darknet sales. However, this does not necessarily mean that China is not the ultimate site of production: many of the sellers in places like the US, Canada, and Western Europe are likely intermediaries rather than producers themselves.

In the next few months, we’ll be sharing more visualisations of the economic geographies of products on the darknet. In the meantime, you can find out more about our work by reading Exploring the Darknet in Five Easy Questions.

Follow the project here: https://www.oii.ox.ac.uk/research/projects/economic-geog-darknet/

Twitter: @OiiDarknet

Our knowledge of how automated agents interact is rather poor (and that could be a problem)

Recent years have seen a huge increase in the number of bots online, including search engine Web crawlers, online customer service chat bots, social media spambots, and content-editing bots in online collaborative communities like Wikipedia. (Bots are important contributors to Wikipedia, completing about 15% of all Wikipedia edits in 2014 overall, and more than 50% in certain language editions.)

While the online world has turned into an ecosystem of bots (by which we mean computer scripts that automatically handle repetitive and mundane tasks), our knowledge of how these automated agents interact with each other is rather poor. But since bots are automata without the capacity for emotions, meaning-making, creativity, or sociality, we might expect their interactions to be relatively predictable and uneventful.

In their PLOS ONE article “Even good bots fight: The case of Wikipedia”, Milena Tsvetkova, Ruth García-Gavilanes, Luciano Floridi, and Taha Yasseri analyze the interactions between bots that edit articles on Wikipedia. They track the extent to which bots undid each other’s edits over the period 2001–2010, model how pairs of bots interact over time, and identify different types of interaction outcomes. Although Wikipedia bots are intended to support the encyclopaedia (identifying and undoing vandalism, enforcing bans, checking spelling, creating inter-language links, importing content automatically, mining data, identifying copyright violations, greeting newcomers, etc.), the authors find that they often undid each other’s edits, with these sterile “fights” sometimes continuing for years.
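To make the idea of "tracking bots undoing each other’s edits" concrete, here is a minimal sketch of the kind of bookkeeping involved. It assumes a hypothetical list of revert events, each recording which bot undid which other bot’s edit and in what year; the `Revert` record, the helper names, and the sample data are all illustrative and are not taken from the paper’s own pipeline.

```python
from collections import Counter
from typing import NamedTuple


class Revert(NamedTuple):
    """One hypothetical revert event: `reverter` undid an edit made by `reverted`."""
    reverter: str
    reverted: str
    year: int


def pairwise_reverts(events):
    """Count reverts for each unordered pair of distinct bots."""
    counts = Counter()
    for e in events:
        if e.reverter == e.reverted:
            continue  # a bot undoing its own edit is not a bot-bot interaction
        pair = tuple(sorted((e.reverter, e.reverted)))
        counts[pair] += 1
    return counts


def pair_span_years(events, a, b):
    """Calendar years spanned by reverts between distinct bots a and b (0 if none)."""
    years = [e.year for e in events
             if {e.reverter, e.reverted} == {a, b} and e.reverter != e.reverted]
    return max(years) - min(years) + 1 if years else 0


# Hypothetical sample data: bot names and years are made up for illustration.
events = [
    Revert("BotA", "BotB", 2005),
    Revert("BotB", "BotA", 2006),
    Revert("BotA", "BotB", 2009),
    Revert("BotC", "BotC", 2007),  # self-revert, ignored
]

print(pairwise_reverts(events))                 # Counter({('BotA', 'BotB'): 3})
print(pair_span_years(events, "BotA", "BotB"))  # 5
```

Treating each pair as unordered captures back-and-forth "fights"; keeping the years makes it possible to see how long a given pair keeps undoing each other, which is the long-running pattern the study reports.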

They suggest that even relatively “dumb” bots may give rise to complex interactions, which carries important implications for Artificial Intelligence research. Understanding these bot-bot interactions will be crucial for managing social media, providing adequate cyber-security, and designing autonomous vehicles (that don’t crash).

We caught up with Taha Yasseri and Luciano Floridi to discuss the implications of the findings:

Ed.: Is there any particular difference between the way individual bots interact (and maybe get bogged down in conflict), and lines of vast and complex code interacting badly, or having unforeseen results (e.g. flash-crashes in automated trading): i.e. is this just (another) example of us not always being able to anticipate how code interacts in the wild?

Taha: There are similarities and differences. The most notable difference is that here the bots are not competing. They all work according to the same rules and, more importantly, towards the same goal: improving the quality of the encyclopedia. Given these features, the rather antagonistic interactions between the bots come as a surprise.

Ed.: Wikipedia have said that they know about it, and that it’s a minor problem: but I suppose Wikipedia presents a nice, open, benevolent system to make a start on examining and understanding bot interactions. What other bot-systems are you aware of, or that you could have looked at?

Taha: In terms of content-generating bots, Twitter bots have turned out to be very important for online propaganda. Crawler bots that collect information from social media or the web (such as personal information or email addresses) are also being heavily deployed. In fact, we have come up with a first typology of Internet bots based on their type of action and their intentions (benevolent vs malevolent), which is presented in the article.

Ed.: You’ve also done work on human collaborations (e.g. in the citizen science projects of the Zooniverse) — is there any work comparing human collaborations with bot collaborations — or even examining human-bot collaborations and interactions?

Taha: In the present work we do compare bot-bot interactions with human-human interactions to observe similarities and differences. The most striking difference is in the dynamics of negative interactions. While human conflicts heat up very quickly and then disappear after a while, bots undo each other’s contributions in a steady flow that may persist for years. In the HUMANE project, we discuss the co-existence of humans and machines in the digital world from a theoretical point of view, and there we discuss such ecosystems in detail.

Ed.: Humans obviously interact badly, fairly often (despite being a social species) .. why should we be particularly worried about how bots interact with each other, given humans seem to expect and cope with social inefficiency, annoyances, conflict and break-down? Isn’t this just more of the same?

Luciano: The fact that bots can be as bad as humans is far from reassuring. The fact that this happens even when they are programmed to collaborate is more disconcerting than what happens among humans when they compete, or fight each other. Here we have very elementary mechanisms that, through simple interactions, generate messy and conflictual outcomes. One may hope this is not evidence of what may happen when more complex systems and interactions are in question. The lesson I learnt from all this is that without rules or some kind of normative framework that promotes collaboration, not even good mechanisms ensure a good outcome.

Read the full article: Tsvetkova M, Garcia-Gavilanes R, Floridi L, Yasseri T (2017) Even good bots fight: The case of Wikipedia. PLoS ONE 12(2): e0171774. doi:10.1371/journal.pone.0171774


Taha Yasseri and Luciano Floridi were talking to blog editor David Sutcliffe.

Exploring the world of self-tracking: who wants our data and why?

Benjamin Franklin used to keep charts of his time spent and virtues lived up to. Today, we use technology to self-track: our hours slept, steps taken, calories consumed, medications administered. But what happens when we turn our everyday experience — in particular, health and wellness-related experience — into data?

“Self-Tracking” (MIT Press) by Gina Neff and Dawn Nafus examines how people record, analyze, and reflect on this data, looking at the tools they use and the communities they become part of, and offering an introduction to the essential ideas and key challenges of using these technologies. In considering self-tracking as a social and cultural phenomenon, they describe not only the use of data as a kind of mirror of the self but also how this enables people to connect to, and learn from, others.

They also consider what’s at stake: who wants our data and why, the practices of serious self-tracking enthusiasts, the design of commercial self-tracking technology, and how people are turning to self-tracking to fill gaps in the healthcare system. None of us can lead an entirely untracked life today, but in their book, Gina and Dawn show us how to use our data in a way that empowers and educates us.

We caught up with Gina to explore the self-tracking movement:

Ed.: Over one hundred million wearable sensors were shipped last year to help us gather data about our lives. Is the trend and market for personal health-monitoring devices ever-increasing, or are we seeing saturation of the device market and the things people might conceivably want to (pay to) monitor about themselves?

Gina: With the focus on direct-to-consumer wearables and mobile apps for health and wellness in the US, we see a lot of tech developed with very little attention to impact or efficacy. I think to some extent we’ve hit the trough in the ‘hype’ cycle, where the initial excitement over digital self-tracking is giving way to the hard and serious work of figuring out how to make things that improve people’s lives. Recent clinical trial data show that activity trackers, for example, don’t help people to lose weight. What we try to do in the book is to help people figure out what self-tracking can do for them, and to advocate for people being able to access and control their own data to help them ask (and answer) the questions that they have.

Ed.: A question I was too shy to ask the first time I saw you speak at the OII — how do you put the narrative back into the data? That is, how do you make stories that might mean something to a person, out of the vast piles of strangely meaningful-meaningless numbers that their devices accumulate about them?

Gina: We really emphasise community. It might sound clichéd but it truly helps. When I read some scholars’ critiques of the Quantified Self meetups that happen around the world I wonder if we have actually been to the same meetings. Instead of some kind of technophilia there are people really working to make sense of information about their lives. There’s a lot of love for tech, but there are also people trying to figure out what their numbers mean, are they normal, and how to design their own ‘n of 1’ trials to figure out how to make themselves better, healthier, and happier. Putting narrative back into data really involves sharing results with others and making sense together.

Ed.: There’s already been a lot of fuss about monetisation of NHS health records: I imagine the world of personal health / wellness data is a vast Wild West of opportunity for some (i.e. companies) and potential exploitation of others (i.e. the monitored), with little law or enforcement? For a start .. is this health data or social data? And are these equivalent forms of data, or are they afforded different protections?

Gina: In an opinion piece in Wired UK last summer I asked what happens to data ownership when your smartphone is your doctor. Right now we afford different privacy protection to health-related data than to other forms of personal data. But very soon trace data may be useful for clinical diagnoses. Programmes are already in place that use trace data for the early detection of mood disorders, and research is underway on using mobile data to diagnose movement disorders. Who will have control of and access to these potential early alert systems for our health information? Will it be legally protected to the same extent as the information in our medical records? These are questions that society needs to settle.

Ed.: I like the central irony of “mindfulness” (a meditation technique involving a deep awareness of your own body), i.e. that these devices reveal more about certain aspects of the state of your body than you would know yourself: but you have to focus on something outside of yourself (i.e. a device) to gain that knowledge. Do these monitoring devices support or defeat “mindfulness”?

Gina: I’m of two minds, no pun intended. Many of the Quantified Self experiments we discuss in the book involved people playing with their data in intentional ways and that level of reflection in turn influences how people connect the data about themselves to the changes they want to make in their behaviour. In other words, the act of self-tracking itself may help people to make changes. Some scholars have written about the ‘outsourcing’ of the self, while others have argued that we can develop ‘exosenses’ outside our bodies to extend our experience of the world, bringing us more haptic awareness. Personally, I do see the irony in smartphone apps intended to help us reconnect with ourselves.

Ed.: We are apparently willing to give up a huge amount of privacy (and monetizable data) for convenience, novelty, and to interact with seductive technologies. Is the main driving force of the wearable health-tech industry the actual devices themselves, or the data they collect? i.e. are these self-tracking companies primarily device/hardware companies or software/data companies?

Gina: Sadly, I think it is neither. The drop-off in engagement with wearables and apps is steep, with the majority falling into disuse after six months. Right now one of the primary concerns I have as an Internet scholar is the apparent lack of empathy companies seem to have for their customers in this space. People operate under the assumption that the data generated by the devices they purchase is ‘theirs’, yet companies too often operate as if they are the sole owners of that data.

Anthropologist Bill Maurer has proposed replacing data ownership with a notion of data ‘kinship’: that both technology companies and their customers have rights and responsibilities to the data that they produce together. Until we have better social contracts and legal frameworks that give people control of and access to their own data in ways that allow them to extract it, query it, and combine it with other kinds of data, that problem of engagement will continue, and activity trackers will sit unused on bedside tables or uncharged in the back of drawers. The ability to help people ask the next question or design the next self-tracking experiment is where most wearables fail today.

Ed.: And is this data at all clinically useful / interoperable with healthcare and insurance systems? i.e. do the companies producing self-monitoring devices work to particular data and medical standards? And is there any auditing and certification of these devices, and the data they collect?

Gina: This idea that the data is just one interoperable system away from usefulness is seductive but so, so wrong. I was recently at a panel of health innovators titled ‘No more Apps’. The argument was that we’re not going to get to meaningful change in healthcare simply by adding a new data stream. Doctors in our study said things like ‘I don’t need more data; I need more resources.’ Right now individuals have few protections ensuring that this data won’t be used to harm their rights to insurance or to discriminate against them, and yet there are few results showing that commercially available wearable devices deliver clinical value. There’s still a lot of work needed before this can happen.

Ed.: Lastly — just as we share our music on iTunes; could you see a scenario where we start to share our self-status with other device wearers? Maybe to increase our sociability and empathy by being able to send auto-congratulations to people who’ve walked a lot that day, or to show concern to people with elevated heart rates / skin conductivity (etc.)? Given the logical next step to accumulating things is to share them..

Gina: We can see that future scenario now in groups like Patients Like Me, Cure Together, and Quantified Self meetups. What these ‘edge’ use cases teach us for more everyday self-tracking uses is that real support and community can form around people sharing their data with others. These are projects that start from individuals with information about themselves and work to build toward collective, social knowledge. Other types of ‘citizen science’ projects are underway, like the Personal Genome Project, where people can donate their health data for science. The Stanford-led MyHeart Counts study on iPhone and Apple Watch recruited 6,000 people in its first two weeks and now has over 40,000 US participants. Those are numbers for clinical studies that we’ve just never seen before.

My co-author led the development of an interesting tool, Data Sense, that lets people without stats training visualize the relationships among variables in their own data or easily combine their data with data from other people. When people can do that they can begin asking the questions that matter for them and for their communities. What we know won’t work in the future of self-tracking data, though, are the lightweight online communities that technology brands just throw together. I’m just not going to be motivated by a random message from LovesToWalk1949, but under the right conditions I might be motivated by my mom, my best friend or my social network. There is still a lot of hard work that has to be done to get the design of self-tracking tools, practices, and communities for social support right.


Gina Neff was talking to blog editor David Sutcliffe about her book (with Dawn Nafus) “Self-Tracking” (MIT Press).

Estimating the Local Geographies of Digital Inequality in Britain: London and the South East Show Highest Internet Use — But Why?

Despite the huge importance of the Internet in everyday life, we know surprisingly little about the geography of Internet use and participation at sub-national scales. A new article on Local Geographies of Digital Inequality by Grant Blank, Mark Graham, and Claudio Calvino published in Social Science Computer Review proposes a novel method to calculate the local geographies of Internet usage, employing Britain as an initial case study.

In the first attempt to estimate Internet use at any small-scale level, they combine data from a sample survey, the 2013 Oxford Internet Survey (OxIS), with the 2011 UK census, employing small area estimation to estimate Internet use in small geographies in Britain. (Read the paper for more on this method, and discussion of why there has been little work on the geography of digital inequality.)
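As a rough intuition for how a survey and a census can be combined, here is a deliberately simplified sketch of the most basic "synthetic" form of small area estimation: Internet-use rates for each demographic group are estimated from the survey, then reweighted by each area’s census population counts. This is an assumption-laden illustration, not the authors’ model (which the paper describes and which may well differ); the group labels, area names, and numbers are hypothetical.

```python
from collections import defaultdict


def group_rates(survey):
    """Share of survey respondents in each demographic group who use the Internet.

    `survey` is a list of (group, is_user) pairs -- a hypothetical format.
    """
    users = defaultdict(int)
    totals = defaultdict(int)
    for group, is_user in survey:
        totals[group] += 1
        users[group] += int(is_user)
    return {g: users[g] / totals[g] for g in totals}


def area_estimates(census, rates):
    """Estimated Internet-use rate for each small area.

    `census` maps area -> {group: population count}. Each estimate is the
    population-weighted average of the survey-based group rates, so areas
    differ only through their demographic mix. Every census group must
    appear in `rates`, otherwise a KeyError is raised.
    """
    estimates = {}
    for area, counts in census.items():
        population = sum(counts.values())
        expected_users = sum(n * rates[g] for g, n in counts.items())
        estimates[area] = expected_users / population
    return estimates


# Hypothetical toy inputs (not OxIS or census figures).
survey = [("under65", True), ("under65", True), ("under65", True),
          ("under65", False), ("65plus", True), ("65plus", False)]
census = {"Area A": {"under65": 800, "65plus": 200},
          "Area B": {"under65": 500, "65plus": 500}}

rates = group_rates(survey)            # {'under65': 0.75, '65plus': 0.5}
print(area_estimates(census, rates))   # {'Area A': 0.7, 'Area B': 0.625}
```

In this toy version, Area B comes out lower purely because it has an older population, which echoes the finding below that demographics matter more than location. Practical small area estimation models usually go further, for example by adding area-level effects and more demographic variables.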

There are two major reasons to suspect that geographic differences in Internet use may be important: apparent regional differences and the urban-rural divide. The authors do indeed find a regional difference: the area with least Internet use is in the North East, followed by central Wales; the highest is in London and the South East. But interestingly, geographic differences become non-significant after controlling for demographic variables (age, education, income etc.). That is, demographics matter more than simply where you live, in terms of the likelihood that you’re an Internet user.

Britain has one of the largest Internet economies in the developed world, and the Internet contributes an estimated 8.3 percent to Britain’s GDP. By reducing a range of geographic frictions and allowing access to new customers, markets and ideas it strongly supports domestic job and income growth. There are also personal benefits to Internet use. However, these advantages are denied to people who are not online, leading to a stream of research on the so-called digital divide.

We caught up with Grant Blank to discuss the policy implications of this marked disparity in (estimated) Internet use across Britain.

Ed.: The small-area estimation method you use combines the extreme breadth but shallowness of the national census, with the relative lack of breadth (2000 respondents) but extreme richness (550 variables) of the OxIS survey. Doing this allows you to estimate things like Internet use in fine-grained detail across all of Britain. Is this technique in standard use in government, to understand things like local demand for health services etc.? It seems pretty clever..

Grant: It is used by the government, but not extensively. It is complex and time-consuming to use well, and it requires considerable statistical skills. These have hampered its spread. It probably could be used more than it is; your example of local demand for health services is a good idea.

Ed.: You say this method works for Britain because OxIS collects information based on geographic area (rather than e.g. randomly by phone number) — so we can estimate things geographically for Britain that can’t be done for other countries in the World Internet Project (including the US, Canada, Sweden, Australia). What else will you be doing with the data, based on this happy fact?

Grant: We have used a straightforward measure of Internet use versus non-use as our dependent variable. Similar techniques could predict and map a variety of other variables. For example, we could take a more nuanced view of how people use the Internet. The patterns of mobile use versus fixed-line use may differ geographically and could be mapped. We could separate work-only users, teenagers using social media, or other subsets. Major Internet activities could be mapped, including such things as entertainment use, information gathering, commerce, and content production. In addition, the amount of use and the variety of uses could be mapped. All these are major issues and their geographic distribution has never been tracked.

Ed.: And what might you be able to do by integrating into this model another layer of geocoded (but perhaps not demographically rich or transparent) data, e.g. geolocated social media / Wikipedia activity (etc.)?

Grant: The strength of the data we have is that it is representative of the UK population. The other examples you mention, like Wikipedia activity or geolocated social media, are all done by smaller, self-selected groups of people, who are not at all representative. One possibility would be to show how and in what ways they are unrepresentative.

Ed.: If you say that Internet use actually correlates to the “usual” demographics, i.e. education, age, income — is there anything policy makers can realistically do with this information? i.e. other than hope that people go to school, never age, and get good jobs? What can policy-makers do with these findings?

Grant: The demographic characteristics are things that don’t change quickly. These results point to the limits of the government’s ability to move people online. They suggest that we will never see 100% of the UK population online. This raises the question: what are realistic expectations for online activity? I don’t know the answer to that, but it is an important question that is not easily addressed.

Ed.: You say that “The first law of the Internet is that everything is related to age”. When are we likely to have enough longitudinal data to understand whether this is simply because older people never had the chance to embed the Internet in their lives when they were younger, or whether older people inherently drop out? Will this age-effect eventually diminish or disappear?

Grant: You ask an important but unresolved question. In the language of the social sciences: is the decline in Internet use with age an age-effect or a cohort-effect? An age-effect means that the Internet becomes less valuable as people age, and so the decline in use with age is just a reflection of the declining value of the Internet. If this explanation is true, then the age-effect will persist into the indefinite future. A cohort-effect implies that the reason older people tend to use the Internet less is that fewer of them learned to use the Internet in school or work. They will eventually be replaced by active Internet-using people, and Internet use will no longer be associated with age. The decline with age will eventually disappear. We can address this question using data from the Oxford Internet Survey, but it is not a small area estimation problem.

Read the full article: Blank, G., Graham, M., and Calvino, C. 2017. Local Geographies of Digital Inequality. Social Science Computer Review. DOI: 10.1177/0894439317693332.

This work was supported by the Economic and Social Research Council [grant ES/K00283X/1]. The data have been deposited in the UK Data Archive under the name “Geography of Digital Inequality”.


Grant Blank was speaking to blog editor David Sutcliffe.

Sexism Typology: Literature Review

The Everyday Sexism Project catalogues instances of sexism experienced by women on a day-to-day basis. We will be using computational techniques to extract the most commonly occurring sexism-related topics.

As Laura Bates, founder of the Everyday Sexism Project, has recently highlighted, “it seems to be increasingly difficult to talk about sexism, equality, and women’s rights” (Everyday Sexism Project, 2015). With many theorists suggesting that we have entered a so-called “post-feminist” era in which gender equality has been achieved (cf. McRobbie, 2004; Modleski, 1991), to complain about sexism not only risks being labelled as “uptight”, “prudish”, or a “militant feminist”, but also exposes those who speak out to sustained, and at times vicious, personal attacks (Everyday Sexism Project, 2015). Despite this, thousands of women are speaking out, through Bates’ project, about their experiences of everyday sexism. Our research seeks to draw on the rich history of gender studies in the social sciences, coupling it with emerging computational methods for topic modelling, to better understand the content of reports to the Everyday Sexism Project and the lived experiences of those who post them. Here, we outline the literature which contextualizes our study.

Studies on sexism are far from new. Indeed, particularly amongst feminist theorists and sociologists, the analysis (and deconstruction) of “inequality based on sex or gender categorization” (Harper, 2008) has formed a central tenet of both academic inquiry and a radical politics of female emancipation for several decades (De Beauvoir, 1949; Friedan, 1963; Rubin, 1975; Millett, 1971). Reflecting its feminist origins, historical research on sexism has broadly focused on defining sexist interactions (cf. Glick and Fiske, 1997) and on highlighting the problematic, biologically rooted ‘gender roles’ that form the foundation of inequality between men and women (Millett, 1971; Renzetti and Curran, 1992; Chodorow, 1995).

More recent studies, particularly in the field of psychology, have shifted the focus away from whether and how sexism exists, towards an examination of the psychological, personal, and social implications that sexist incidents have for the women who experience them. In this vein, theorists such as Matteson and Moradi (2005), Swim et al. (2001) and Jost and Kay (2005) have highlighted the damaging intellectual and mental health outcomes for women who are subject to continual experiences of sexism. Atwood, for example, argues in her study of gender bias in families that sexism combines with other life stressors to create significant psychological distress in women, resulting in them needing to “seek therapy, most commonly for depression and anxiety” (2001, 169).

Given technology’s increasing ubiquity in everyday life, it is hardly surprising that the relationship between technology and sexism has also sparked interest from contemporary researchers. Indeed, several studies have explored the intersection between gender and power online, with Susan Herring’s work on gender differences in computer-mediated communication being of particular note (cf. Herring, 2008). Theorists such as Mindi D. Foster have focused on the impact that using digital technology, and particularly Web 2.0 technologies, to talk about sexism can have on women’s well-being. Foster’s study found that when women tweeted about sexism, and in particular when they used tweets to (a) name the problem, (b) criticise it, or (c) suggest change, they viewed their actions as effective, had enhanced life satisfaction, and therefore felt empowered (Foster, 2015: 21).

Despite this diversity of research on sexism, however, there remain some notable gaps in understanding. In particular, as this study hopes to highlight, little previous research on sexism has considered the different ‘types’ of sexism experienced by women (beyond identifying the workplace and the education system as contexts in which sexism often manifests; cf. Barnett, 2005; Watkins et al., 2006; Klein, 1992). Furthermore, research on sexism has thus far been largely qualitative in nature. Although a small number of studies have employed quantitative methods (cf. Brandt, 2011; Becker and Wright, 2011), none have used computational approaches to analyse the wealth of online data on sexism that is now available.

This project, which will apply a natural language processing approach to analyse data collected from the Everyday Sexism Project website, seeks to fill such a gap. By providing much-needed analysis of a large-scale crowdsourced dataset on sexism, it is our hope that knowledge gained from this study will advance both the sociological understanding of women’s lived experiences of sexism, and methodological understandings of the suitability of computational topic modelling for conducting this kind of research.

Find out more about the OII’s research on the Everyday Sexism project by visiting the webpage or by looking at the other Policy & Internet blog posts on the project – post 1 and post 2.


Taha Yasseri is a Research Fellow at the OII who has interests in analysis of Big Data to understand human dynamics, government-society interactions, mass collaboration, and opinion dynamics.

Kathryn Eccles is a Research Fellow at the OII who has research interests in the impact of new technologies on scholarly behaviour and research, particularly in the Humanities.

Sophie Melville is a Research Assistant working at the OII. She previously completed the MSc in the Social Science of the Internet at the OII.

References

Atwood, N. C. (2001). Gender bias in families and its clinical implications for women. Social Work, 46 pp. 23–36.

Barnett, R. C. (2005). Ageism and Sexism in the workplace. Generations. 29(3) pp. 25–30.

Bates, Laura. (2015). Everyday Sexism [online] Available at: http://everydaysexism.com [Accessed 1 May 2016].

Becker, Julia C. & Wright, Stephen C. (2011). Yet another dark side of chivalry: Benevolent sexism undermines and hostile sexism motivates collective action for social change. Journal of Personality and Social Psychology, 101(1) pp. 62–77.

Brandt, Mark. (2011). Sexism and Gender Inequality across 57 societies. Psychological Science. 22(11).

Chodorow, Nancy. (1995). “Becoming a feminist foremother”. In Phyllis Chesler, Esther D. Rothblum and Ellen Cole (eds), Feminist foremothers in women’s studies, psychology, and mental health. New York: Haworth Press. pp. 141–154.

De Beauvoir, Simone. (1949). The second sex, woman as other. London: Vintage.

Foster, M. D. (2015). Tweeting about sexism: The well-being benefits of a social media collective action. British Journal of Social Psychology.

Friedan, Betty. (1963). The Feminine Mystique. W. W. Norton and Co.

Glick, Peter. & Fiske, Susan T. (1997). Hostile and Benevolent Sexism. Psychology of Women Quarterly, 21(1) pp. 119–135.

Harper, Amney J. (2008). The relationship between sexism, ambivalent sexism, and relationship quality in heterosexual women. PhD Auburn University.

Herring, Susan C. (2008). Gender and Power in Online Communication. In Janet Holmes and Miriam Meyerhoff (eds) The Handbook of Language and Gender. Oxford: Blackwell.

Jost, J. T., & Kay, A. C. (2005). Exposure to benevolent sexism and complementary gender stereotypes: Consequences for specific and diffuse forms of system justification. Journal of Personality and Social Psychology, 88 pp. 498–509.

Klein, Susan Shurberg. (1992). Sex equity and sexuality in education: breaking the barriers. State University of New York Press.

Matteson, A. V., & Moradi, B. (2005). Examining the structure of the schedule of sexist events: Replication and extension. Psychology of Women Quarterly, 29 pp. 47–57.

McRobbie, Angela. (2004). Post-feminism and popular culture. Feminist Media Studies, 4(3) pp. 255–264.

Millett, Kate. (1971). Sexual politics. UK: Virago.

Modleski, Tania. (1991). Feminism without women: culture and criticism in a “postfeminist” age. New York: Routledge.

Renzetti, C. & Curran, D. (1992). Sex-Role Socialization. In J. Kourany, J. Sterba and R. Tong (eds.), Feminist Philosophies. New Jersey: Prentice Hall.

Rubin, Gayle. (1975). The traffic in women: notes on the “political economy” of sex. In Rayna R. Reiter (ed.), Toward an anthropology of women. Monthly Review Press.

Swim, J. K., Hyers, L. L., Cohen, L. L., & Ferguson, M. J. (2001). Everyday sexism: Evidence for its incidence, nature, and psychological impact from three daily diary studies. Journal of Social Issues, 57 pp. 31–53.

Watkins et al. (2006). Does it pay to be sexist? The relationship between modern sexism and career outcomes. Journal of Vocational Behavior, 69(3) pp. 524–537.

Alan Turing Institute and OII: Summit on Data Science for Government and Policy Making

The benefits of big data and data science for the private sector are well recognised. So far, considerably less attention has been paid to the power and potential of the growing field of data science for policy-making and public services. On Monday 14th March 2016 the Oxford Internet Institute (OII) and the Alan Turing Institute (ATI) hosted a Summit on Data Science for Government and Policy Making, funded by the EPSRC. Leading policy makers, data scientists and academics came together to discuss how the ATI and government could work together to develop data science for the public good. The convenors of the Summit, Professors Helen Margetts (OII) and Tom Melham (Computer Science), report on the day’s proceedings.

The Alan Turing Institute will build on the UK’s existing academic strengths in the analysis and application of big data and algorithm research to place the UK at the forefront of worldwide research in data science. The University of Oxford is one of five university partners, and the OII is the only partnering department in the social sciences. The aim of the Summit on Data Science for Government and Policy-Making was to understand how government can make better use of big data and the ATI, with the academic partners in listening mode.

We hoped that the participants would bring forward their own stories, hopes and fears regarding data science for the public good. Crucially, we wanted to work out a roadmap for how different stakeholders can work together on the distinct challenges facing government, as opposed to commercial organisations. At the same time, data science research and development has much to gain from the policy-making community. Some of the things that government does (collecting tax from the whole population, giving money away at scale, possessing the legitimate use of force) it does by virtue of being government. So the sources of data and some of the data science challenges that public agencies face are unique, and tackling them could put government, working with researchers, at the forefront of data science innovation.

During the Summit, a range of stakeholders provided insight from their distinctive perspectives: the Government Chief Scientific Advisor, Sir Mark Walport; the Deputy Director of the ATI, Patrick Wolfe; the National Statistician and Director of ONS, John Pullinger; and the Director of Data at the Government Digital Service, Paul Maltby. Representatives of frontline departments recounted how algorithmic decision-making is already bringing predictive capacity into operational business, improving efficiency and effectiveness.

Discussion revolved around the challenge of building core capability in data science across government, rather than outsourcing it (as happened in an earlier era with information technology) or confining it to a data science profession. Some delegates talked of being in the ‘foothills’ of data science. The scale, heterogeneity and complexity of some government departments currently work against data science innovation, particularly when larger departments can operate thousands of databases, creating legacy barriers to interoperability. Outdated policies can also work against data science methodologies. Attendees repeatedly voiced concerns about sharing data across government departments: in some cases because of limitations of legal protections, in others because people were unsure what they could and could not do.

The potential power of data science creates an urgent need for discussion of ethics. Delegates and speakers repeatedly affirmed the importance of an ethical framework and of thought leadership in this area, so that ethics is ‘part of the science’. The clear emergent option was a national Council for Data Ethics (along the lines of the Nuffield Council on Bioethics) convened by the ATI, as recommended in the recent Science and Technology parliamentary committee report ‘The big data dilemma’ and in the government response to it. Luciano Floridi (OII’s professor of the philosophy and ethics of information) warned that we cannot reduce ethics to mere compliance. Ethical problems do not normally have a single straightforward ‘right’ answer; they require dialogue and thought, and they extend far beyond individual privacy. There was consensus that the UK has the potential to provide global thought leadership and to set the standard for the rest of Europe. It was announced during the Summit that an ATI Working Group on the Ethics of Data Science has been confirmed to take these issues forward.

So what happens now?

Throughout the Summit there were calls from policy makers for more data science leadership. We hope that the ATI will be instrumental in providing this, and will act as an interface both between government, business and academia, and between separate government departments. This Summit showed just how much real demand (and enthusiasm) there is from policy makers to develop data science methods and harness the power of big data. No-one wants to repeat with data science the history of government information technology, where government led the way as an innovator in the 1950s and 60s but has struggled to maintain this position ever since. We hope that the ATI can act to prevent the same fate for data science and provide both thought leadership and the ‘time and space’ (as one delegate put it) for policy-makers to work with the Institute to develop data science for the public good.

Since the Summit, in response to the clear need that emerged from the discussion and from other conversations with stakeholders, the ATI has been designing a Policy Innovation Unit, with the aim of working with government departments on ‘data science for public good’ issues. Activities could include:

  • Secondments at the ATI for data scientists from government
  • Short-term projects in government departments for ATI doctoral students and postdoctoral researchers
  • Developing ATI as an accredited data facility for public data, as suggested in the current Cabinet Office consultation on better use of data in government
  • ATI pilot policy projects, using government data
  • Policy symposia focused on specific issues and challenges
  • ATI representation in regular meetings at the senior level (for example, between Chief Scientific Advisors, the Cabinet Office, the Office for National Statistics and GO-Science)
  • ATI acting as an interface between public and private sectors, for example through knowledge exchange and the exploitation of non-government sources as well as government data
  • ATI offering a trusted space, time and a forum for formulating questions and developing solutions that tackle public policy problems and push forward the frontiers of data science
  • ATI as a source of cross-fertilization of expertise between departments
  • Reviewing the data science landscape in a department or agency, identifying feedback loops (or lack thereof) between policy-makers, analysts and front-line staff, and identifying possibilities for an ‘intelligent centre’ model through strategic development of expertise

The Summit, and a series of Whitehall Roundtables convened by GO-Science which led up to it, have initiated a nascent network of stakeholders across government, which we aim to build on and develop over the coming months. If you are interested in being part of this, please do get in touch with us.

Helen Margetts, Oxford Internet Institute, University of Oxford (director@oii.ox.ac.uk)

Tom Melham, Department of Computer Science, University of Oxford