Ethics

Leading policy makers, data scientists and academics came together to discuss how the ATI and government could work together to develop data science for the public good.

The benefits of big data and data science for the private sector are well recognised. So far, considerably less attention has been paid to the power and potential of the growing field of data science for policy-making and public services. On Monday 14th March 2016 the Oxford Internet Institute (OII) and the Alan Turing Institute (ATI) hosted a Summit on Data Science for Government and Policy-Making, funded by the EPSRC. Leading policy makers, data scientists and academics came together to discuss how the ATI and government could work together to develop data science for the public good. The convenors of the Summit, Professors Helen Margetts (OII) and Tom Melham (Computer Science), report on the day’s proceedings.

The Alan Turing Institute will build on the UK’s existing academic strengths in the analysis and application of big data and algorithm research to place the UK at the forefront of world-wide research in data science. The University of Oxford is one of five university partners, and the OII is the only partnering department in the social sciences.

The aim of the summit on Data Science for Government and Policy-Making was to understand how government can make better use of big data and the ATI—with the academic partners in listening mode. We hoped that the participants would bring forward their own stories, hopes and fears regarding data science for the public good. Crucially, we wanted to work out a roadmap for how different stakeholders can work together on the distinct challenges facing government, as opposed to commercial organisations. At the same time, data science research and development has much to gain from the policy-making community. Some of the things that government does—collect tax from the whole population, or give money away at scale, or possess the legitimate use of force—it does by virtue of being government. So the sources of data and some of the data science challenges that public agencies face are…

Exploring the complexities of policing the web for extremist material, and its implications for security, privacy and human rights.

In terms of counter-speech there are different roles for government, civil society, and industry. Image by Miguel Discart (Flickr).

The Internet serves not only as a breeding ground for extremism, but also offers myriad data streams which potentially hold great value to law enforcement. The report by the OII’s Ian Brown and Josh Cowls for the VOX-Pol project, Check the Web: Assessing the Ethics and Politics of Policing the Internet for Extremist Material, explores the complexities of policing the web for extremist material, and its implications for security, privacy and human rights. Josh Cowls discusses the report with blog editor Bertie Vidgen.*

*Please note that the views given here do not necessarily reflect the content of the report, or those of the lead author, Ian Brown.

Ed: Josh, could you let us know the purpose of the report, outline some of the key findings, and tell us how you went about researching the topic?

Josh: Sure. In the report we take a step back from the ground-level question of ‘what are the police doing?’ and instead ask, ‘what are the ethical and political boundaries, rationale and justifications for policing the web for these kinds of activity?’ We used an international human rights framework as an ethical and legal basis to understand what is being done. We also tried to further the debate by clarifying a few things: what has already been done by law enforcement, and, really crucially, what the perspectives are of all those involved, including lawmakers, law enforcers, technology companies, academia and many others. We derived the insights in the report from a series of workshops, one of which was held as part of the EU-funded VOX-Pol network. The workshops involved participants who were quite high up in law enforcement, the intelligence agencies, the tech industry, civil society, and academia. We followed these up with interviews with other individuals in similar positions and conducted background policy research.

Ed: You highlight that many extremist groups (such as Isis) are making really significant use of online platforms to organise,…

Experimentation and research on the Internet require ethical scrutiny in order to give useful feedback to engineers and researchers about the social impact of their work.

The image shows the paths taken through the Internet to reach a large number of DNS servers in China used in experiments on DNS censorship by Joss Wright and Ning Wang, where they queried blocked domain names across China to discover patterns in where the network filtered DNS requests, and how it responded.

To maintain an open and working Internet, we need to make sense of how the complex and decentralised technical system operates. Research groups, governments, and companies have dedicated teams working on highly technical research and experimentation to make sense of information flows and how these can be affected by new developments, be they intentional or due to unforeseen consequences of decisions made in another domain. These teams, composed of network engineers and computer scientists, therefore analyse Internet data transfers, typically by collecting data from devices of large groups of individuals as well as organisations.

The Internet, however, has become a complex and global socio-technical information system that mediates a significant amount of our social and professional activities, relationships, as well as mental processes. Experimentation and research on the Internet therefore require ethical scrutiny in order to give useful feedback to engineers and researchers about the social impact of their work.

The organising committee of the Association for Computing Machinery (ACM) SIGCOMM (Special Interest Group on Data Communication) conference has regularly encountered paper submissions that can be considered dubious from an ethical point of view. A strong debate on the research ethics of the ACM was sparked by the paper entitled “Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests,” among others submitted for the 2015 conference. In the study, researchers directed unsuspecting Internet users to test potential censorship systems in their country by directing their browser to specified URLs that could be blocked in their jurisdiction. Concerns were raised about whether this could be considered ‘human subject research’ and whether the unsuspecting users could be harmed as a result of this experiment.

Consider, for example, a Chinese citizen continuously requesting the Falun Gong website from their Beijing-based laptop with no knowledge of this occurring whatsoever. As a result of these discussions, the ACM realised that there was no formal procedure or methodology in place to make informed decisions about the ethical dimensions of such…
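The mechanics of a censorship-measurement probe of the kind described above can be illustrated with a minimal sketch. This is not the code used in the Wright and Wang experiments; it simply builds a standard DNS A-record query (packet layout per RFC 1035) and sends it to a chosen resolver, the basic step such studies repeat across many servers and domains:

```python
import socket
import struct

def build_dns_query(domain: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS A-record query packet (RFC 1035)."""
    # Header: transaction id, flags (recursion desired), 1 question, 0 others.
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Question: the domain as length-prefixed labels, then type A, class IN.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)

def probe(resolver_ip: str, domain: str, timeout: float = 2.0):
    """Send the query to a specific resolver over UDP and return the raw
    reply, or None on timeout. A censoring middlebox may inject a forged
    reply or drop the query entirely; comparing replies for the same
    domain across many resolvers reveals where filtering occurs."""
    query = build_dns_query(domain)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(query, (resolver_ip, 53))
        try:
            return s.recvfrom(512)[0]
        except socket.timeout:
            return None
```

The ethical questions discussed at SIGCOMM arise precisely because such a query, if triggered from an unwitting user's machine, is indistinguishable from that user deliberately requesting a blocked site.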

Information has now acquired a pivotal role in contemporary warfare, for it has become both an effective target and a viable means.

Critical infrastructures such as electric power grids are susceptible to cyberwarfare, leading to economic disruption in the event of massive power outages. Image courtesy of Pacific Northwest National Laboratory.

Before the pervasive dissemination of Information and Communication Technologies (ICTs), the use of information in war waging referred to intelligence gathering and propaganda. In the age of the information revolution things have radically changed. Information has now acquired a pivotal role in contemporary warfare, for it has become both an effective target and a viable means. These days, we use ‘cyber warfare’ to refer to the use of ICTs by state actors to disruptive (or even destructive) ends. As contemporary societies grow increasingly dependent on ICTs, any form of attack that involves their informational infrastructures poses serious risks and raises the need for adequate defence and regulatory measures.

However, such a need contrasts with the novelty of this phenomenon, with cyber warfare posing a radical shift in the paradigm within which warfare has been conceived so far. In the new paradigm, impairment of functionality, disruption, and reversible damage substitute for bloodshed, destruction, and casualties. At the same time, the intangible environment (the cyber sphere), targets, and agents substitute for flesh-and-blood beings, firearms, and physical targets (at least in the non-kinetic instances of cyber warfare).

The paradigm shift raises questions about the adequacy and efficacy of existing laws and ethical theories for the regulation of cyber warfare. Military experts, strategy planners, law- and policy-makers, philosophers, and ethicists all participate in discussions around this problem. The debate is polarised around two main approaches: (1) the analogy approach, and (2) the discontinuous approach. The former stresses that the regulatory gap concerning cyber warfare is only apparent, insofar as cyber conflicts are not radically different from other forms of conflict. As Schmitt put it, “a thick web of international law norms suffuses cyber-space. These norms both outlaw many malevolent cyber-operations and allow states to mount robust responses”. The UN Charter, NATO Treaty, Geneva Conventions, the first two Additional Protocols, and the Convention restricting or prohibiting the use of certain conventional weapons are…

How will we keep healthy? How will we live, learn, work and interact in the future? How will we produce and consume and how will we manage resources?

On October 6 and 7, the European Commission, with the participation of Portuguese authorities and the support of the Champalimaud Foundation, organised in Lisbon a high-level conference on “The Future of Europe is Science”. Mr. Barroso, President of the European Commission, opened the meeting. I had the honour of giving one of the keynote addresses.

The explicit goal of the conference was twofold. On the one hand, we tried to take stock of European achievements in science, engineering, technology and innovation (SETI) during the last 10 years. On the other hand, we looked into potential future opportunities that SETI may bring to Europe, both in economic terms (growth, jobs, new business opportunities) and in terms of wellbeing (individual welfare and higher social standards).

One of the most interesting aspects of the meeting was the presentation of the latest report on “The Future of Europe is Science” by the President’s Science and Technology Advisory Council (STAC). The report addresses some very big questions: How will we keep healthy? How will we live, learn, work and interact in the future? How will we produce and consume and how will we manage resources? It also seeks to outline some key challenges that will be faced by Europe over the next 15 years. It is well written, clear, evidence-based and convincing. I recommend reading it. In what follows, I wish to highlight three of its features that I find particularly significant.

First, it is enormously refreshing and reassuring to see that the report treats science and technology as equally important and intertwined. The report takes this for granted, but anyone stuck in some Greek dichotomy between knowledge (episteme, science) and mere technique (techne, technology) will be astonished. While this divorcing of the two has always been a bad idea, it is still popular in contexts where applied science, e.g. applied physics or engineering, is considered a Cinderella.
During my talk, I referred to Galileo as a paradigmatic scientist who had to be innovative in terms of…

People are very often unaware of how much data is gathered about them—let alone the purposes for which it can be used.

MEPs failed to support a Green call to protect Edward Snowden as a whistleblower, in order to allow him to give his testimony to the European Parliament in March. Image by greensefa.

Computers have developed enormously since the Second World War: alongside a rough doubling of computer power every two years, communications bandwidth and storage capacity have grown just as quickly. Computers can now store much more personal data, process it much faster, and rapidly share it across networks. Data is collected about us as we interact with digital technology, directly and via organisations. Many people volunteer data to social networking sites, and sensors—in smartphones, CCTV cameras, and “Internet of Things” objects—are making the physical world as trackable as the virtual.

People are very often unaware of how much data is gathered about them—let alone the purposes for which it can be used. Also, most privacy risks are highly probabilistic, cumulative, and difficult to calculate. A student sharing a photo today might not be thinking about a future interview panel; or that the heart rate data shared from a fitness gadget might affect future decisions by insurance and financial services (Brown 2014).

Rather than organisations waiting for something to go wrong, then spending large amounts of time and money trying (and often failing) to fix privacy problems, computer scientists have been developing methods for designing privacy directly into new technologies and systems (Spiekermann and Cranor 2009). One of the most important principles is data minimisation; that is, limiting the collection of personal data to that needed to provide a service—rather than storing everything that can be conveniently retrieved. This limits the impact of data losses and breaches, for example by corrupt staff with authorised access to data—a practice that the UK Information Commissioner’s Office (2006) has shown to be widespread.

Privacy by design also protects against function creep (Gürses et al. 2011). When an organisation invests significant resources to collect personal data for one reason, it can be very tempting to use it for other purposes.
While this is limited in the EU by data protection law, government agencies are in a…
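The data-minimisation principle described above is straightforward to express in code. A hypothetical sketch (the field names and whitelist are invented for illustration, not drawn from any cited system):

```python
# Illustrative whitelist: only the fields the service needs to operate.
REQUIRED_FIELDS = frozenset({"user_id", "heart_rate"})

def minimise(record: dict, allowed: frozenset = REQUIRED_FIELDS) -> dict:
    """Keep only whitelisted fields before storage, so that a later
    breach, corrupt insider, or repurposing of the database exposes
    less personal data."""
    return {field: value for field, value in record.items() if field in allowed}

# A fitness-tracker record arrives carrying more than the service needs...
raw = {"user_id": 42, "heart_rate": 61,
       "gps": (51.75, -1.25), "contacts": ["alice", "bob"]}
stored = minimise(raw)  # only user_id and heart_rate survive
```

The design choice is to discard at the point of collection rather than rely on access controls later: data that was never stored cannot leak or creep into new uses.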

Measuring the mobile Internet can expose information about an individual’s location, contact details, and communications metadata.

Four of the 6.8 billion mobile phones worldwide. Measuring the mobile Internet can expose information about an individual's location, contact details, and communications metadata. Image by Cocoarmani.

Ed: GCHQ / the NSA aside, who collects mobile data and for what purpose? How can you tell if your data are being collected and passed on?

Ben: Data collected from mobile phones is used for a wide range of (divergent) purposes. First and foremost, mobile operators need real-time information about mobile phones to be able to communicate with individual handsets. Apps can also collect all sorts of information, which may be necessary to provide entertainment or location-specific services, to conduct network research, or for many other reasons. Mobile phone users usually consent to the collection of their data by clicking “I agree” or other legally relevant buttons, but this is not always the case. Sometimes data is collected lawfully without consent, for example for the provision of a mobile connectivity service. At other times it is harder to substantiate a relevant legal basis. Many applications keep track of the information generated by a mobile phone, and it is often not possible to find out how the receiver processes this data.

Ed: How are data subjects typically recruited for a mobile research project? And how many subjects might a typical research data set contain?

Ben: This depends on the research design; some research projects provide data subjects with a specific app, which they can use to conduct measurements (so-called ‘active measurements’). Other apps collect data in the background and, in effect, conduct local surveillance of mobile phone use (so-called ‘passive measurements’). Other research uses existing datasets, for example provided by telecom operators, which will generally be de-identified in some way. We purposely do not use the term anonymisation in the report, because much research and several case studies have shown that real anonymisation is very difficult to achieve if the original raw data is collected about individuals.
Datasets can be re-identified by techniques such as fingerprinting or by linking them with existing, auxiliary datasets. The size…
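The linking technique mentioned above can be sketched in a few lines. This toy example (all names, fields, and values are invented) joins a "de-identified" browsing dataset to a public auxiliary dataset on shared quasi-identifiers; any unique match re-identifies the supposedly anonymous record:

```python
def link(deidentified, auxiliary, quasi_ids=("postcode", "birth_year", "sex")):
    """Match records across two datasets on shared quasi-identifiers.
    Returns {name: sensitive_value} for every unique match."""
    # Index the de-identified records by their quasi-identifier tuple.
    index = {}
    for row in deidentified:
        index.setdefault(tuple(row[q] for q in quasi_ids), []).append(row)
    # Look up each named person from the auxiliary (public) dataset.
    matches = {}
    for person in auxiliary:
        candidates = index.get(tuple(person[q] for q in quasi_ids), [])
        if len(candidates) == 1:  # unique combination: re-identified
            matches[person["name"]] = candidates[0]["visited_site"]
    return matches

# "De-identified" mobile browsing records versus a public directory.
records = [
    {"postcode": "OX1", "birth_year": 1980, "sex": "F", "visited_site": "example.org"},
    {"postcode": "OX2", "birth_year": 1975, "sex": "M", "visited_site": "example.net"},
]
directory = [{"name": "Alice", "postcode": "OX1", "birth_year": 1980, "sex": "F"}]
```

The attack needs no names in the research dataset at all; it exploits the fact that combinations of a few ordinary attributes are often unique to one person.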

Widespread use of digital technologies, the Internet and social media means both citizens and governments leave digital traces that can be harvested to generate big data.

The environment in which public policy is made has entered a period of dramatic change. Widespread use of digital technologies, the Internet and social media means both citizens and governments leave digital traces that can be harvested to generate big data. Policy-making takes place in an increasingly rich data environment, which offers both promise and threat to policy-makers.

On the promise side, such data offers a chance for policy-making and implementation to be more citizen-focused, taking account of citizens’ needs, preferences and actual experience of public services, as recorded on social media platforms. As citizens express policy opinions on social networking sites such as Twitter and Facebook; rate or rank services or agencies on government applications such as NHS Choices; or enter discussions on the burgeoning range of social enterprise and NGO sites, such as Mumsnet, 38 degrees and patientopinion.org, they generate a whole range of data that government agencies might harvest to good use.

Policy-makers also have access to a huge range of data on citizens’ actual behaviour, as recorded digitally whenever citizens interact with government administration or undertake some act of civic engagement, such as signing a petition.

Data mined from social media or administrative operations in this way also provides a range of new data which can enable government agencies to monitor—and improve—their own performance, for example through log usage data of their own electronic presence or transactions recorded on internal information systems, which are increasingly interlinked. And they can use data from social media for self-improvement, by understanding what people are saying about government, and which policies, services or providers are attracting negative opinions and complaints, enabling identification of a failing school, hospital or contractor, for example. They can solicit such data via their own sites, or those of social enterprises.
And they can find out what people are concerned about or looking for, from the Google Search API or Google trends, which record the search…
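The monitoring idea sketched above, spotting a failing provider from harvested complaints, could in its simplest form be a keyword tally. A hedged illustration (the keyword list, data shape, and threshold are all invented; real systems would use far richer text analysis):

```python
# Crude marker terms for a negative post (illustrative only).
NEGATIVE_TERMS = {"complaint", "failing", "unsafe", "waiting"}

def flag_providers(posts, threshold=2):
    """Count harvested posts containing a negative term for each provider,
    and return those meeting the threshold as candidates for review."""
    counts = {}
    for post in posts:
        if set(post["text"].lower().split()) & NEGATIVE_TERMS:
            counts[post["provider"]] = counts.get(post["provider"], 0) + 1
    return {provider for provider, n in counts.items() if n >= threshold}

posts = [
    {"provider": "hospital_a", "text": "Another complaint about long queues"},
    {"provider": "hospital_a", "text": "this ward is failing patients"},
    {"provider": "school_b", "text": "great results this year"},
]
flagged = flag_providers(posts)
```

Even this crude version shows why such signals are only a prompt for investigation, not a verdict: word counts capture volume of grievance, not its substance.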