The Internet has drastically reshaped communication practices across the globe, touching many aspects of modern life. This increased reliance on Internet technology also has consequences for human rights. The United Nations Human Rights Council has reaffirmed many times (most recently in a 2016 resolution) that “the same rights that people have offline must also be protected online”.
However, international human rights monitoring bodies and courts have given only limited guidance on how to apply human rights law to the design and use of Internet technology, especially when that technology is developed by non-state actors. And while the Internet can certainly facilitate the exercise and fulfilment of human rights, it is also conducive to human rights violations, with many Internet organizations and companies currently grappling with their responsibilities in this area.
To help understand how digital technology can support the exercise of human rights, we—Corinne Cath, Ben Zevenbergen, and Christiaan van Veen—organized a workshop on ‘Coding Human Rights Law’ at the 2017 Citizen Lab Summer Institute in Toronto. By bringing together academics, technologists, human rights experts, lawyers, government officials, and NGO employees, we hoped to gather experience and scope the field to:
1. Explore the relationship between connected technology and human rights;
2. Understand how this technology can support the exercise of human rights;
3. Identify current bottlenecks for integrating human rights considerations into Internet technology; and
4. List recommendations to provide guidance to the various stakeholders working on human-rights strengthening technology.
In the workshop report “Coding Human Rights Law: Citizen Lab Summer Institute 2017 Workshop Report”, we give an overview of the discussion, addressing multiple legal and technical concerns. We consider the legal issues arising from human rights law being state-centric, while most connected technologies are developed by the private sector. We also discuss the applicability of current international human rights frameworks to debates about new technologies. We cover the technical issues that arise when trying to code for human rights, in particular when human rights considerations are integrated into the design and operationalization of Internet technology. We conclude by identifying areas for further debate and reflection, six of which we list below:
Integrating Human Rights into Internet Technology Design: Six Considerations
1. Further study is needed of how instruments of the existing human rights framework (like the UN Guiding Principles on Business and Human Rights) apply to Internet actors, including whether new legal instruments at the national and international level are needed to specify the human rights responsibilities of non-state actors.
2. More research is needed to analyse and rebuild the theories underpinning human rights, given that the premises and assumptions grounding them may have been affected by the transition to a digitally mediated society. Much has been done on the rights to privacy and free speech, but more analysis of the relevance of other human rights in this area is needed.
3. Human rights frameworks are best approached as a legal minimum baseline, while other frameworks, like data protection legislation or technology-specific regulation, give content to what should be aimed for above and beyond this minimum threshold.
4. Taking into account a wider range of international human rights would benefit the development of human-rights-oriented Internet technology. This means thinking beyond the right to privacy and freedom of expression to include, for example, the right to equality and non-discrimination, and the right to work.
5. Internet technologies, in general, must be developed with an eye towards their potential negative impact, with human rights impact assessments undertaken to understand that impact. This includes knowledge of the inherent tensions that exist between different human rights, and ensuring that technology developers are precise and considerate about where in the Internet stack they want to have an impact.
6. Technology designers, funders, and implementers need to be aware of the context and culture within which a technology will be used, by involving the target end-users in the design process. For instance, it is important to ensure that human-rights-enabling technology does not price out certain populations from using it.
Internet technology can enable the exercise of human rights—if it is context-aware, recognises the inherent tensions between certain rights (privacy and knowledge, or free speech and protection from abuse, for example), is flexible yet specific, legally sound and ethically just, modest in its claims, and actively understands and mitigates potential risks.
With these considerations, we are entering uncharted waters. Unless states have incorporated human rights obligations directly into their national laws, there are few binding obligations on the private-sector actors pushing forward the technology. Likewise, there are also few methodologies for developing human-rights-enabling technology—meaning that we should be careful and considerate about how these technologies are developed.
Digital technologies are increasingly proposed as innovative solutions to the problems and threats faced by vulnerable groups such as children, women, and LGBTQ people. However, there exists a structural lack of consideration for gender and power relations in the design of Internet technologies, as previously discussed by scholars in media and communication studies (Barocas & Nissenbaum, 2009; boyd, 2001; Thakor, 2015) and technology studies (Balsamo, 2011; MacKenzie and Wajcman, 1999). The intersection between gender-based violence and technology deserves greater attention. To this end, scholars from the Center for Information Technology Policy at Princeton and the Oxford Internet Institute organized a workshop to explore the design ethics of gender-based violence and safety technologies at Princeton in the Spring of 2017.
The workshop welcomed a wide range of participants: advocates working on intimate partner violence and sex work, alongside engineers, designers, developers, and academics working on IT ethics. The objectives of the day were threefold:
(1) to better understand the lack of gender considerations in technology design;
(2) to formulate critical questions for functional requirement discussions between advocates and developers of gender-based violence applications; and
(3) to establish a set of criteria by which new applications can be assessed from a gender perspective.
Following three conceptual takeaways from the workshop, we share seven instructive primers for developers interested in creating technologies for those affected by gender-based violence.
Survivors, sex workers, and young people are intentional technology users
Increasing public awareness of the prevalence of gender-based violence, both online and offline, often frames survivors of gender-based violence, activists, and young people as vulnerable and helpless. Contrary to this representation, those affected by gender-based violence are intentional technology users, choosing to adopt or abandon tools as they see fit. For example, sexual assault victims strategically disclose their stories on specific social media platforms to mobilize collective action. Sex workers adopt locative technologies to make safety plans. Young people use secure search tools to find information about sexual health resources near them. To fully understand how and why some technologies appear to do more for these communities, developers need to pay greater attention to the depth of their lived experience with technology.
Technologies designed with good intentions do not inherently achieve their stated objectives. Functions we assume to be neutral, such as a ‘Find my iPhone’ feature, can have unintended consequences. In contexts of gender-based violence, abusers and survivors alike appropriate these technological tools. For example, survivors and sex workers can use such a feature to share their whereabouts with friends in times of need. Abusers, on the other hand, can use the same locative functions to stalk their victims. It is crucial to consider the context within which a technology is used, and the user’s relationship to their environment, their needs, and their interests, so that technologies can begin to support those affected by gender-based violence.
Drawing from ecological psychology, technology scholars have described this tension between design and use as affordance: a user’s perception of what can and cannot be done with a device informs their use of it. Designers may create a technology with a specific use in mind, but users will appropriate, resist, and improvise their use of its features as they see fit. For example, hashtags like #SurvivorPrivilege show how rape victims create in-groups on Twitter to engage in supportive discussions, without any intention of going viral.
1. Predict unintended outcomes
Relatedly, the idea of devices as having affordances helps us detect how technologies lead to unintended outcomes. Facebook’s ‘authentic name’ policy may have been instituted to promote safety for victims of relationship violence. The social and political contexts in which this policy operates, however, mean that it disproportionately affects the safety of human rights activists, drag queens, sex workers, and others — including survivors of partner violence.
2. Question the default
Technology developers are in a position to design the default settings of their technology. Since such settings are typically left unchanged by users, developers must take into account the effect on their target end users. For example, the default notification setting for text messages displays the full message content on the home screen. A smartphone user may experience texting as a private activity, but this default enables other people who are physically co-present to see the messages. Opting out of the default setting requires some technical knowledge from the user. In abusive relationships, the abuser can therefore easily access the victim’s text messages through this default setting. So, in designing smartphone applications for survivors, developers should question the default privacy settings.
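The point about defaults can be sketched in a few lines of Python. This is a hypothetical illustration — all setting names here are invented — of a settings module in which the privacy-protective value is what users get unless they explicitly opt in:

```python
# Hypothetical sketch of privacy-protective defaults for a messaging app.
# The setting names are invented for illustration; the design point is that
# the safest option is what a user gets without changing anything.

DEFAULTS = {
    "show_message_preview_on_lock_screen": False,  # hide content by default
    "share_location_with_contacts": False,         # opt in, not opt out
    "notify_sender_on_read": False,
}

def effective_settings(user_overrides):
    """Merge explicit user choices over the privacy-protective defaults."""
    settings = dict(DEFAULTS)
    settings.update(user_overrides)
    return settings

# A user who never opens the settings screen keeps the safe defaults:
print(effective_settings({}))
```

Since most users never change defaults, the designer's choice here — not the user's — determines whether a co-present abuser can read message previews.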
3. Inclusivity is not generalizability
There is a tendency to equate generalizability with inclusivity. An alarm button that claims to serve general safety purposes may take a one-size-fits-all approach by automatically connecting the user to law enforcement. In cases of sexual assault, survivors who are people of color, sex workers, or LGBTQ are likely to avoid such features precisely because of their connection to law enforcement. This means that those who are most vulnerable are inadvertently excluded from the feature. Alternatively, an alarm feature that centers these communities may direct the user to local resources instead. Thus, a feature that is generalizable may overlook the target groups it aims to support; a more targeted feature may have less reach, but meet its objective. Just as communities’ needs are context-based, inclusivity, too, is contextual. Developers should realize that the broader mission of inclusivity can in fact be met by addressing a specific need, though this may reduce the scope of end-users.
4. Consider co-designing
How, then, can we develop targeted technologies? Workshop participants suggested co-design (similarly, user-participatory design) as a process through which marginalized communities can take a leading role in developing new technologies. Instead of thinking about communities as passive recipients of technological tools, co-design positions both target communities and technologists as active agents who share skills and knowledge to develop innovative, technological interventions.
5. Involve funders and donors
Breakout group discussions pointed out how developers’ organizational and funding structures play a key role in shaping the kind of technologies they create. Suggested strategies included (1) educating donors about the specific social issue being addressed, (2) carefully considering whether funding sources meet developers’ objectives, and (3) ensuring diversity in the development team.
6. Do no harm with your research
In conducting user research, academics and technologists aim to better understand marginalized groups’ technology use, because these groups are typically at the forefront of adopting and appropriating digital tools. While it is important to expand our understanding of vulnerable communities’ everyday experience with technology, research on this topic can also be used by authorities to further marginalize and target those same communities. Consider, for example, how some tech startups align with law enforcement in ways that negatively affect sex workers. To ensure that research about communities actually contributes to supporting them, academics and developers must be vigilant and cautious about conducting ethical research that protects its subjects.
7. Should this app exist?
The most important question to address at the beginning of a technology design process should be: should there even be an app for this? The idea that technologies can solve social problems as long as the technologists just “nerd harder” continues to guide the development and funding of new technologies. Many social problems are not data problems that can be solved by efficient design and padded with enhanced privacy features. One necessary early strategy of intervention is simply to raise the question of whether technology truly has a place in the particular context and, if so, whether it addresses a specific need.
Our workshop began with big questions about the intersections of gender-based violence and technology, and concluded with a simple but piercing question: Who designs what for whom? Implicated here are the complex workings of gender, sexuality, and power embedded in the lifetime of newly emerging devices from design to use. Apps and platforms can certainly have their place when confronting social problems, but the flow of data and the revealed information must be carefully tailored to the target context.
The workshop was funded by Princeton’s Center for Information Technology Policy (CITP), Princeton’s University Center for Human Values, the Ford Foundation, the Mozilla Foundation, and Princeton’s Council on Science and Technology.
To maintain an open and working Internet, we need to make sense of how the complex and decentralised technical system operates. Research groups, governments, and companies have dedicated teams working on highly technical research and experimentation to make sense of information flows and how these can be affected by new developments, be they intentional or due to unforeseen consequences of decisions made in another domain.
These teams, composed of network engineers and computer scientists, therefore analyse Internet data transfers, typically by collecting data from devices of large groups of individuals as well as organisations. The Internet, however, has become a complex and global socio-technical information system that mediates a significant amount of our social or professional activities, relationships, as well as mental processes. Experimentation and research on the Internet therefore require ethical scrutiny in order to give useful feedback to engineers and researchers about the social impact of their work.
The organising committee of the Association for Computing Machinery (ACM) SIGCOMM (Special Interest Group on Data Communication) conference has regularly encountered paper submissions that can be considered ethically dubious. A strong debate on ACM research ethics was sparked by the paper entitled “Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests,” among others submitted for the 2015 conference. In the study, researchers directed unsuspecting Internet users’ browsers to URLs that could be blocked in their jurisdiction, in order to test potential censorship systems in their country. Concerns were raised about whether this constituted ‘human subject research’ and whether the unsuspecting users could be harmed as a result of the experiment. Consider, for example, a Chinese citizen whose Beijing-based laptop continuously requests the Falun Gong website without their knowledge.
As a result of these discussions, the ACM realised that there was no formal procedure or methodology in place to make informed decisions about the ethical dimensions of such research. The conference therefore hosted a one-day workshop led by the OII’s Ethics in Networked Systems Research (ENSR) project. The day brought together 55 participants from different academic disciplines, ranging from computer science to philosophy, law, and the social sciences. As part of a broader mission to establish ethical guidelines for Internet research, the aim of the workshop was to inform participants about the pressing ethical issues of the network measurement discipline, and to exchange ideas, reasoning, and proposed solutions.
The workshop began with two interactive sessions in which participants split into small, multidisciplinary groups to debate the submitted papers. Participants recorded their thoughts on key issues that emerged in the discussions. The remaining sessions of the day concentrated on the main themes surfacing from these notes as well as the active feedback of attendees. In this manner, participants from both sides of the debate — that is, the technical researchers and the non-technical researchers — were able to continually quiz each other about the strengths and weaknesses of their approach. The workshop’s emphasis on collaboration across academic disciplines, thereby creating an interdisciplinary community of researchers interested in Internet ethics, aimed to create a more solid foundation for building functional ethical standards in this area.
The interactive discussions yielded some particularly interesting recommendations regarding both the general ethical governance of computer science research and particular pressing issues. The main suggestion of the workshop was to create a procedure for an iterative approach to ethical review, whereby the relevant authority (e.g. conference programme committee, institutional ethics board, journal editor, funding agency) and the researchers could engage in a dialogue about the impact of research, rather than having the process restricted to a top-down, one-time decision by the authority.
This approach could be supported by the guidelines that the OII’s ENSR project is currently drafting. Further, participants explored the extent to which computer ethics could be taught as part of every module of computer science degrees, rather than in the generic ethics courses currently taught to engineering students. This adjustment would allow aspiring technical researchers to develop a hands-on sense of the social and ethical implications of new technologies and methodologies. Participants agreed that this idea would take an intensive department-wide effort, but would be very worthwhile in the end.
In more practical discussions, participants exchanged views on a wide range of potential solutions or approaches to ethical issues resulting from Internet research. For example, technical researchers struggling with obtaining informed consent were advised to focus their efforts on user-risk mitigation (with many nuances that exceed the scope of this blog post). For those studying the Internet in foreign countries, participants recommended running a few probes with the proposed methodology. This exploratory study would then serve to underpin an informed discussion on the possible social implications of the project with organizations and researchers who are more knowledgeable about the local context (e.g. anthropologists, sociologists, or NGOs, among others).
Other concrete measures proposed to improve academic research included: fictionalizing rejected case studies to help researchers understand reasons for rejection without creating a ‘hall of shame’; generating a list of basic ethical questions that all papers should answer in the proposal phase; and starting a dialogue with other research communities in analogous situations concerning ethics.
The workshop comprised high-level discussions to get participants on the same page, and deep dives into specific topics to generate concrete solutions. Participants wrote down their thoughts on post-it notes; the next steps will be to categorise these notes, develop initial draft guidelines, and discuss them with all participants on the dedicated mailing list.
If you would like to join this mailing list, please e-mail bendert.zevenbergen [at] oii.ox.ac.uk! More detailed write-ups of the workshop outcomes will be published in due course.
Ben Zevenbergen is a student at the Oxford Internet Institute pursuing a DPhil on the intersection of privacy law, technology, social science, and the Internet. He runs a side project that aims to establish ethics guidelines for Internet research, as well as working in multidisciplinary teams such as the EU funded Network of Excellence in Internet Science. He has worked on legal, political and policy aspects of the information society for several years. Most recently he was a policy advisor to an MEP in the European Parliament, working on Europe’s Digital Agenda. Previously Ben worked as an ICT/IP lawyer and policy consultant in the Netherlands. Ben holds a degree in law, specialising in Information Law.
Pamina Smith currently serves as an Assistant Editor at the Oxford Internet Institute and recently completed an MPhil in Comparative Social Policy at the University of Oxford. She previously worked as Assistant Policy Officer at the European Commission, handling broadband policy and telecommunications regulation, and has a degree in the History and Literature of Modern Europe from Harvard College.
根据相关法律法规和政策，部分搜索结果未予显示 (“according to the relevant laws, regulations, and policies, a portion of search results have not been displayed”) is a Chinese warning message that we may see displayed more often on the Internet, along with translations of it into other languages. The control of information flows on the Internet is becoming more commonplace, in authoritarian regimes as well as in liberal democracies, whether via technical or regulatory means. Such information controls can be defined as “[…] actions conducted in or through information and communications technologies (ICTs), which seek to deny (such as web filtering), disrupt (such as denial-of-service attacks), shape (such as throttling), secure (such as through encryption or circumvention) or monitor (such as passive or targeted surveillance) information for political ends. Information controls can also be non-technical and can be implemented through legal and regulatory frameworks, including informal pressures placed on private companies. […]” Information controls are not intrinsically good or bad, but much remains to be explored and analysed about their use for political or commercial purposes.
The University of Toronto’s Citizen Lab organised a one-week summer institute titled “Monitoring Internet Openness and Rights” to inform the global discussions on information control research and practice in the fields of censorship, circumvention, surveillance and adherence to human rights. A week full of presentations and workshops on the intersection of technical tools, social science research, ethical and legal reflections, and policy implications was attended by a distinguished group of about 60 community members, amongst whom were two OII DPhil students: Jon Penney and Ben Zevenbergen. Conducting Internet measurements may be considered terra incognita in terms of methodology and data collection, but its relevance and impact for Internet policy-making, geopolitics, and network management are obvious and undisputed.
The Citizen Lab prides itself on being a “hacker hothouse”, or an “intelligence agency for civil society”, where security expertise, politics, and ethics intersect. Its research adds a much-needed geopolitical angle to the deeply technical and quantitative Internet measurements it conducts on information networks worldwide. While the Internet is fast becoming the backbone of our modern societies in many positive and welcome ways, abundant (intentional) security vulnerabilities, the ease with which human rights such as privacy and freedom of speech can be violated, threats to the neutrality of the network, and the extent of mass surveillance all threaten to compromise the potential of our global information sphere. Threats to a free and open Internet need to be uncovered and explained to policymakers in order to encourage informed, evidence-based policy decisions, especially at a time when the underlying technology is not well understood by decision makers.
Participants at the summer institute came with the intent to make sense of Internet measurements and information controls, as well as their social, political and ethical impacts. Through discussions in larger and smaller groups throughout the Munk School of Global Affairs – as well as restaurants and bars around Toronto – the current state of information controls, their regulation and deployment became clear, and multi-disciplinary projects to measure breaches of human rights on the Internet, or of its fundamental principles, were devised and coordinated.
The outcomes of the week in Toronto are impressive. The OII DPhil students presented their recent work on transparency reporting and ethical data collection in Internet measurement.
Jon Penney gave a talk on “the United States experience” with Internet-related corporate transparency reporting, that is, the evolution of existing American corporate practices in publishing “transparency reports” about the nature and quantity of government and law enforcement requests for Internet user data or content removal. Jon first began working on transparency issues as a Google Policy Fellow with the Citizen Lab in 2011, and his work has continued during his time at Harvard’s Berkman Center for Internet and Society. In this talk, Jon argued that in the U.S., corporate transparency reporting largely began with the leadership of Google and a few other Silicon Valley tech companies like Twitter; in the post-Snowden era, however, it has been adopted by a wider cross-section of not only technology companies but also established telecommunications companies like Verizon and AT&T that were previously resistant to greater transparency in this space (perhaps due to closer, longer-term relationships with federal agencies than Silicon Valley companies). Jon also canvassed evolving legal and regulatory challenges facing U.S. transparency reporting, and means by which companies may provide some measure of transparency — via tools like warrant canaries — in the face of increasingly complex national security laws.
Ben Zevenbergen has recently launched ethical guidelines for the protection of privacy in Internet measurements conducted via mobile phones. The first panel of the week, on “Network Measurement and Information Controls”, called explicitly for more concrete ethical and legal guidelines for Internet measurement projects, because the extent of data collection necessarily entails that much personal data is collected and analyzed. In the second panel, on “Mobile Security and Privacy”, Ben explained how his guidelines form a privacy impact assessment for a privacy-by-design approach to mobile network measurements. The iterative process of designing a research project in close cooperation with colleagues, possibly from different disciplines, ensures that privacy is taken into account at all stages of the project’s development. His talk led to two connected and well-attended sessions during the week to discuss the ethics of information controls research and Internet measurements. A mailing list has been set up for engineers, programmers, activists, lawyers and ethicists to discuss the ethical and legal aspects of Internet measurements, and data collection has begun to create a taxonomy of ethical issues in the discipline to inform forthcoming peer-reviewed papers.
The Citizen Lab will host its final summer institute of the series in 2015.
Photo credits: Ben Zevenbergen, Jon Penney. Writing Credits: Ben Zevenbergen, with small contribution from Jon Penney.
Ben Zevenbergen is an OII DPhil student and Research Assistant working on the EU Internet Science project. He has worked on legal, political and policy aspects of the information society for several years. Most recently he was a policy advisor to an MEP in the European Parliament, working on Europe’s Digital Agenda.
Jon Penney is a legal academic, doctoral student at the Oxford Internet Institute, and a Research Fellow / Affiliate of both The Citizen Lab, an interdisciplinary research lab specializing in digital media, cyber-security, and human rights at the University of Toronto’s Munk School of Global Affairs, and the Berkman Center for Internet & Society at Harvard University.
Ed: GCHQ / the NSA aside … Who collects mobile data and for what purpose? How can you tell if your data are being collected and passed on?
Ben: Data collected from mobile phones is used for a wide range of (divergent) purposes. First and foremost, mobile operators need information about mobile phones in real time to be able to communicate with individual handsets. Apps can also collect all sorts of information, which may be needed to provide entertainment or location-specific services, to conduct network research, or for many other reasons.
Mobile phone users usually consent to the collection of their data by clicking “I agree” or other legally relevant buttons, but this is not always the case. Sometimes data is collected lawfully without consent, for example for the provision of a mobile connectivity service. Other times it is harder to substantiate a relevant legal basis. Many applications keep track of the information that is generated by a mobile phone and it is often not possible to find out how the receiver processes this data.
Ed: How are data subjects typically recruited for a mobile research project? And how many subjects might a typical research data set contain?
Ben: This depends on the research design; some research projects provide data subjects with a specific app, which they can use to conduct measurements (so-called ‘active measurements’). Other apps collect data in the background and, in effect, conduct local surveillance of mobile phone use (so-called ‘passive measurements’). Other research uses existing datasets, for example those provided by telecom operators, which will generally be de-identified in some way. We purposely do not use the term anonymisation in the report, because much research and several case studies have shown that real anonymisation is very difficult to achieve if the original raw data is collected about individuals. Datasets can be re-identified by techniques such as fingerprinting, or by linking them with existing, auxiliary datasets.
The size of datasets differs per release. Telecom operators can provide data about millions of users, while it will be more challenging to reach such a number with a research specific app. However, depending on the information collected and provided, a specific app may provide richer information about a user’s behaviour.
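The linkage risk Ben describes can be illustrated with a toy Python sketch. All records here are invented: a “de-identified” dataset that keeps quasi-identifiers such as postcode and birth year can be joined against a public auxiliary dataset (a voter roll, say) to recover names.

```python
# Hypothetical illustration of a linkage attack on a "de-identified" dataset.
# All records are invented; the quasi-identifiers (postcode, birth year) are
# enough to re-identify a user when joined against an auxiliary dataset.

deidentified = [  # names removed, but quasi-identifiers retained
    {"user": "u1", "postcode": "02138", "birth_year": 1985, "sites_visited": 512},
    {"user": "u2", "postcode": "02139", "birth_year": 1990, "sites_visited": 88},
]

auxiliary = [  # e.g. a public voter roll or profile dump
    {"name": "Alice Example", "postcode": "02138", "birth_year": 1985},
    {"name": "Bob Example", "postcode": "02140", "birth_year": 1990},
]

def reidentify(deid, aux):
    """Join the two datasets on the quasi-identifiers (postcode, birth_year)."""
    matches = {}
    for record in deid:
        key = (record["postcode"], record["birth_year"])
        candidates = [a["name"] for a in aux
                      if (a["postcode"], a["birth_year"]) == key]
        if len(candidates) == 1:  # a unique match means re-identification
            matches[record["user"]] = candidates[0]
    return matches

print(reidentify(deidentified, auxiliary))
# u1 is uniquely re-identified as "Alice Example"
```

This is why the report avoids the word “anonymisation”: removing names does nothing against an attacker who holds an auxiliary dataset sharing even a couple of attributes with the released one.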
Ed: What sort of research can be done with this sort of data?
Ben: Data collected from mobile phones can reveal much interesting and useful information. For example, such data can show exact geographic locations and thus the movements of the owner, which can be relevant for the social sciences. On a larger scale, mass movements of persons can be monitored via mobile phones. This information is useful for public policy objectives such as crowd control, traffic management, identifying migration patterns, emergency aid, etc. Such data can also be very useful for commercial purposes, such as location specific advertising, studying the movement of consumers, or generally studying the use of mobile phones.
Mobile phone data is also necessary to understand the complex dynamics of the underlying Internet architecture. The mobile Internet has different requirements than the fixed-line Internet, so targeted investments in future Internet architecture will need to be assessed through detailed network research. Network research can also study issues such as censorship or other forms of blocking of information and transactions, which are increasingly carried out through mobile phones. Such research can serve as an early warning system for policy makers, activists and humanitarian aid workers, to name only a few stakeholders.
Ed: Some of these research datasets are later published as ‘open data’. What sorts of uses might researchers (or companies) put these data to? Does it tend to be mostly technical research, or are there also social science applications?
Ben: The intriguing characteristic of the open data concept is that secondary uses can be unpredictable. A re-use is not necessarily technical, even if the raw data was collected for purely technical network research. New social science research could be based on existing technical data, or existing research analyses may be falsified or validated by other researchers. Artists, developers, entrepreneurs or public authorities can also use existing data to create new applications or to enrich existing information systems. There have been many instances where open data has been re-used for beneficial or profitable ends.
However, there is also a flipside to open data, especially when the dataset contains personal information, or information that can be linked to individuals. A working definition of open data is that one makes entire databases available, in standardized, machine readable and electronic format, to any secondary user, free of charge and free of restrictions or obligations, for any purpose. If a dataset contains information about your Internet browsing habits, your movements throughout the day or the phone numbers you have called over a specific period of time, it could be quite troubling if you have no control over who re-uses this information.
The risks and harms of such re-use are very context-dependent, of course. In the Western world, such data could be used as a means for blackmail, stalking, identity theft, unsolicited commercial communications, etc. Further, if there is a chance that our telecom operators simply share data on how we use our mobile phones, we may refrain from activities such as taking part in demonstrations, attending political gatherings, or accessing certain socially unacceptable information. Such self-censorship would damage the free society we expect. In the developing world, or in authoritarian regimes, the risks and harms can be a matter of life and death for data subjects, or at least involve the risk of physical harm. This is true for all citizens, but especially for diplomats, aid workers, journalists and social media users.
Finally, we cannot envisage how political contexts will change in the future. Future malevolent governments, even in Europe or the US, could easily use datasets containing sensitive information to harm or control specific groups of society. One need only look at the changing political landscape in Hungary to see how specific groups can suddenly be targeted in what we thought was becoming a country that adheres to Western values.
Ed: The ethical privacy guidelines note the basic relation between the level of detail in information collected and the resulting usefulness of the dataset (datasets becoming less powerful as subjects are increasingly de-identified). This seems a fairly intuitive and fundamentally unavoidable problem; is there anything in particular to say about it?
Ben: Research often requires rich datasets for worthwhile analyses to be conducted. These will inevitably sometimes contain personal information, as it can be important to relate specific data to data subjects, whether anonymised, pseudonymised or otherwise. Far-reaching deletion, aggregation or randomisation of data can make a dataset useless for the research purposes.
Sophisticated methods of re-identifying datasets, and unforeseen methods which will be developed in future, mean that much information must be deleted or aggregated in order for a dataset containing personal information to be truly anonymous. It has become very difficult to determine when a dataset is sufficiently anonymised to the extent that it can enjoy the legal exception offered by data protection laws around the world and therefore be distributed as open data, without legal restrictions.
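One common way to reason about this trade-off is k-anonymity: generalising quasi-identifiers until every record is indistinguishable from at least k−1 others, and suppressing outliers that remain unique. The following is a minimal sketch with invented data, not a method prescribed by the guidelines themselves:

```python
# Sketch of the utility/anonymity trade-off via k-anonymity (invented data).
from collections import Counter

# (age, postcode) act as quasi-identifiers; every raw record is unique.
records = [(34, "1011AB"), (35, "1011CD"), (34, "1011EF"), (52, "2511XY")]

def generalise(record):
    """Coarsen quasi-identifiers: age to a decade band, postcode to two digits."""
    age, postcode = record
    decade = age // 10 * 10
    return (f"{decade}-{decade + 9}", postcode[:2])

def k_anonymity(rows):
    """Smallest equivalence-class size: each record matches at least k-1 others."""
    return min(Counter(rows).values())

def suppress_small_classes(rows, k):
    """Drop records whose equivalence class is smaller than k (losing utility)."""
    counts = Counter(rows)
    return [r for r in rows if counts[r] >= k]

print(k_anonymity(records))        # raw data: every record unique, so k = 1
generalised = [generalise(r) for r in records]
print(k_anonymity(generalised))    # still k = 1: one outlier record remains unique
print(suppress_small_classes(generalised, 2))  # outlier dropped to reach k >= 2
```

The point of the sketch is the last two lines: generalisation alone may not be enough, and each round of suppression or aggregation that raises k also removes detail the researcher may have needed.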
As a result, many research datasets cannot simply be released. The guidelines do not force the researcher into a zero-risk situation, where only useless or meaningless datasets can be released. Rather, they force the researcher to think very carefully about the type of data that will be collected, about data processing techniques, and about different disclosure methods. Although open data is an attractive method of disseminating research data, sometimes managed access systems may be more appropriate. The guidelines prompt the researcher to consider the risks to data subjects in their specific context during each stage of the research design. They serve as a guide, but also as a normative framework for research that is potentially privacy-invasive.
Ed: Presumably mobile companies have a duty to delete their data after a certain period; does this conflict with open datasets, whose aim is to be available indefinitely?
Ben: It is not a requirement for open data to be available indefinitely. However, once information is published freely on the Internet, it is very hard – if not impossible – to delete it. The researcher loses all control over a dataset once it is published online. So, even if a dataset is sufficiently de-identified against the re-identification techniques that are known today, this does not mean that future techniques cannot re-identify it. We can’t expect researchers to take into account all science-fiction-type future developments, but the guidelines do force the researcher to consider what successful re-identification would reveal about data subjects.
European mobile phone companies do have a duty to keep logs of communications for six months to two years, depending on the national implementation of the misguided Data Retention Directive. We have recently learned that intelligence services worldwide have more or less unrestricted access to such information. We have no idea how long this information is stored in practice. It has frequently been stated recently that deleting data has become more expensive than simply keeping it. This means that mobile phone operators and intelligence agencies may keep data on our mobile phone use forever. This must be taken into account when assessing which auxiliary datasets could be used to re-identify a research dataset. An IP address alone could be sufficient to link much information to an individual.
Ed: Presumably it’s impossible for a subject to later decide they want to be taken out of an open dataset; firstly due to cost, but also because (by definition) it ought to be impossible to find them in an anonymised dataset. Does this present any practical or legal problems?
Ben: In some countries, especially in Europe, data subjects have a legal right to object to their data being processed, by withdrawing consent or engaging in a legal procedure with the data processor. Although this is an important right, exercising it may lead to undesirable consequences for research. For example, the underlying dataset will be incomplete for secondary researchers who want to validate findings.
Our guidelines encourage researchers to be transparent about their research design, data processing and foreseeable secondary uses of the data. On the one hand, this builds trust in the network research discipline. On the other, it gives data subjects the necessary information to feel confident to share their data. Still, data subjects should be able to retract their consent via electronic means, instead of sending letters, if they can substantiate an appreciable harm to them.
Ed: How aware are funding bodies and ethics boards of the particular problems presented by mobile research? And are these problems categorically different from those of other human-subject research data (e.g. interviews, social network data, genetic studies)?
Ben: University ethics boards and funding bodies are staffed by experts in a wide range of disciplines. However, this does not mean they understand the intricate details of complex Internet measurements, de-identification techniques, or the state of the art in re-identification techniques, nor the harms a research programme can inflict in a specific context. For example, not everyone’s intuitive moral privacy compass will be activated when they read in a research proposal that the research systems will “monitor routing dynamics, by analysing packet traces collected from cell towers and internet exchanges”, or similar sentences.
Our guidelines encourage the researcher to write up the choices made with regards to personal information in a manner that is clear and understandable for the layperson. Such a level of transparency is useful for data subjects — as well as ethical boards and funding bodies — to understand exactly what the research entails and how risks have been accommodated.
Ed: What is the current state of privacy legislation in this area? Is there much international agreement?
Ben: Privacy legislation itself is about as fragmented and disputed as it gets. The US generally treats personal information as a commodity that can be traded, which enables Internet companies in Silicon Valley to use data as the new raw material of the information age. Europe considers privacy and data protection a fundamental right, which is currently regulated in detail, albeit based on a law from 1995. The review of European data protection regulation has been postponed to 2015, possibly as a result of the intense lobbying effort in Brussels to either weaken or strengthen the proposed law. Some countries have not regulated privacy or data protection at all. Other countries have a fundamental right to privacy which is not further developed in a specific data protection law, and thus hardly enforced. Another group of countries have transplanted the European approach, but do not have the legal expertise to apply the 1995 law to the digital environment. The future of data protection is very much up in the air and requires much careful study.
The guidelines we have published take the international human rights framework as a base, while drawing inspiration from several existing legal concepts such as data minimisation, purpose limitation, privacy by design and informed consent. They give a solid base for privacy-aware research design. We do encourage researchers to discuss their projects with colleagues and legal experts as much as possible, though, because best practices and legal subtleties can vary per country, state or region.
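As a closing sketch of what data minimisation could look like at the point of collection (the field names and coarsening choices below are invented for illustration, not taken from the guidelines), a measurement collector might drop or coarsen fields before anything is stored:

```python
# Sketch: minimise a network measurement at collection time (invented fields).
import hashlib

# Hypothetical raw record produced by a mobile network probe.
raw = {
    "ip": "203.0.113.42",
    "timestamp": "2014-03-07T14:23:51",
    "latency_ms": 87,
    "cell_tower": "NL-AMS-0172",
}

SALT = b"per-project-secret"  # kept out of any published dataset

def minimise(measurement):
    """Keep only what the research question needs, coarsened before storage."""
    return {
        # Truncate the IP to its /24 prefix so no single host is identified.
        "ip_prefix": ".".join(measurement["ip"].split(".")[:3]) + ".0/24",
        # Coarsen the timestamp to the hour.
        "hour": measurement["timestamp"][:13],
        "latency_ms": measurement["latency_ms"],
        # Salted hash allows grouping by tower without publishing its identity.
        "tower_id": hashlib.sha256(SALT + measurement["cell_tower"].encode()).hexdigest()[:12],
    }

print(minimise(raw))
```

Note that the salted hash is pseudonymisation rather than anonymisation: as discussed earlier in this interview, linkage with auxiliary datasets may still re-identify records, which is why minimisation is one safeguard among several rather than a complete answer.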