Ethical privacy guidelines for mobile connectivity measurements

Four of the 6.8 billion mobile phones worldwide. Measuring the mobile Internet can expose information about an individual’s location, contact details, and communications metadata. Image by Cocoarmani.

Ed: GCHQ / the NSA aside … Who collects mobile data and for what purpose? How can you tell if your data are being collected and passed on?

Ben: Data collected from mobile phones is used for a wide range of (divergent) purposes. First and foremost, mobile operators need information about mobile phones in real-time to be able to communicate with individual mobile handsets. Apps can also collect all sorts of information, which may be necessary to provide entertainment, location specific services, to conduct network research and many other reasons.

Mobile phone users usually consent to the collection of their data by clicking “I agree” or other legally relevant buttons, but this is not always the case. Sometimes data is collected lawfully without consent, for example for the provision of a mobile connectivity service. Other times it is harder to substantiate a relevant legal basis. Many applications keep track of the information that is generated by a mobile phone and it is often not possible to find out how the receiver processes this data.

Ed: How are data subjects typically recruited for a mobile research project? And how many subjects might a typical research data set contain?

Ben: This depends on the research design; some research projects provide data subjects with a specific app, which they can use to conduct measurements (so called ‘active measurements’). Other apps collect data in the background and, in effect, conduct local surveillance of the mobile phone use (so called passive measurements). Other research uses existing datasets, for example provided by telecom operators, which will generally be de-identified in some way. We purposely do not use the term anonymisation in the report, because much research and several case studies have shown that real anonymisation is very difficult to achieve if the original raw data is collected about individuals. Datasets can be re-identified by techniques such as fingerprinting or by linking them with existing, auxiliary datasets.

The size of datasets differs per release. Telecom operators can provide data about millions of users, while it will be more challenging to reach such a number with a research specific app. However, depending on the information collected and provided, a specific app may provide richer information about a user’s behaviour.

Ed: What sort of research can be done with this sort of data?

Ben: Data collected from mobile phones can reveal much interesting and useful information. For example, such data can show exact geographic locations and thus the movements of the owner, which can be relevant for the social sciences. On a larger scale, mass movements of persons can be monitored via mobile phones. This information is useful for public policy objectives such as crowd control, traffic management, identifying migration patterns, emergency aid, etc. Such data can also be very useful for commercial purposes, such as location specific advertising, studying the movement of consumers, or generally studying the use of mobile phones.

Mobile phone data is also necessary to understand the complex dynamics of the underlying Internet architecture. The mobile Internet is has different requirements than the fixed line Internet, so targeted investments in future Internet architecture will need to be assessed by detailed network research. Also, network research can study issues such as censorship or other forms of blocking information and transactions, which are increasingly carried out through mobile phones. This can serve as early warning systems for policy makers, activists and humanitarian aid workers, to name only a few stakeholders.

Ed: Some of these research datasets are later published as ‘open data’. What sorts of uses might researchers (or companies) put these data to? Does it tend to be mostly technical research, or there also social science applications?

Ben: The intriguing characteristic of the open data concept is that secondary uses can be unpredictable. A re-use is not necessarily technical, even if the raw data has been collected for a purely technical network research. New social science research could be based on existing technical data, or existing research analyses may be falsified or validated by other researchers. Artists, developers, entrepreneurs or public authorities can also use existing data to create new applications or to enrich existing information systems. There have been many instances when open data has been re-used for beneficial or profitable means.

However, there is also a flipside to open data, especially when the dataset contains personal information, or information that can be linked to individuals. A working definition of open data is that one makes entire databases available, in standardized, machine readable and electronic format, to any secondary user, free of charge and free of restrictions or obligations, for any purpose. If a dataset contains information about your Internet browsing habits, your movements throughout the day or the phone numbers you have called over a specific period of time, it could be quite troubling if you have no control over who re-uses this information.

The risks and harms of such re-use are very context dependent, of course. In the Western world, such data could be used as means for blackmail, stalking, identity theft, unsolicited commercial communications, etc. Further, if there is a chance our telecom operators just share data on how we use our mobile phones, we may refrain from activities, such as taking part in demonstrations, attending political gatherings, or accessing certain socially unacceptable information. Such self-censorship will damage the free society we expect. In the developing world, or in authoritarian regimes, risks and harms can be a matter of life and death for data subjects, or at least involve the risk of physical harm. This is true for all citizens, but also diplomats, aid workers and journalists or social media users.

Finally, we cannot envisage how political contexts will change in the future. Future malevolent governments, even in Europe or the US, could easily use datasets containing sensitive information to harm or control specific groups of society. One only need look at the changing political landscape in Hungary to see how specific groups are suddenly targeted in what we thought was becoming a country that adheres to Western values.

Ed: The ethical privacy guidelines note the basic relation between the level of detail in information collected and the resulting usefulness of the dataset (datasets becoming less powerful as subjects are increasingly de-identified). This seems a fairly intuitive and fundamentally unavoidable problem; is there anything in particular to say about it?

Ben: Research often requires rich datasets for worthwhile analyses to be conducted. These will inevitably sometimes contain personal information, as it can be important to relate specific data to data subjects, whether anonymised, pseudonymised or otherwise. Far reaching deletion, aggregation or randomisation of data can make the dataset useless for the research purposes.

Sophisticated methods of re-identifying datasets, and unforeseen methods which will be developed in future, mean that much information must be deleted or aggregated in order for a dataset containing personal information to be truly anonymous. It has become very difficult to determine when a dataset is sufficiently anonymised to the extent that it can enjoy the legal exception offered by data protection laws around the world and therefore be distributed as open data, without legal restrictions.

As a result, many research datasets cannot simply be released. The guidelines do not force the researcher to a zero-risk situation, where only useless or meaningless datasets can be released. The guidelines force the researcher to think very carefully about the type of data that will be collected, about data processing techniques and different disclosure methods. Although open data is an attractive method of disseminating research data, sometimes managed access systems may be more appropriate. The guidelines constantly trigger the researcher to consider the risks to data subjects in their specific context during each stage of the research design. They serve as a guide, but also a normative framework for research that is potentially privacy invasive.

Ed: Presumably mobile companies have a duty to delete their data after a certain period; does this conflict with open datasets, whose aim is to be available indefinitely?

Ben: It is not a requirement for open data to be available indefinitely. However, once information is published freely on the Internet, it is very hard – if not impossible – to delete it. The researcher loses all control over a dataset once it is published online. So, if a dataset is sufficiently de-identified for the re-identification techniques that are known today, this does not mean that future techniques cannot re-identify the dataset. We can’t expect researchers to take into account all science-fiction type future developments, but the guidelines to force the researcher to consider what successful re-identification would reveal about data subjects.

European mobile phone companies do have a duty to keep logs of communications for 6 months to 2 years, depending on the implication of the misguided data retention directive. We have recently learned that intelligence services worldwide have more or less unrestricted access to such information. We have no idea how long this information is stored in practice. Recently it has been frequently been stated that deleting data has become more expensive than just keeping it. This means that mobile phone operators and intelligence agencies may keep data on our mobile phone use forever. This must be taken into account when assessing which auxiliary datasets could be used to re-identify a research dataset. An IP-address could be sufficient to link much information to an individual.

Ed: Presumably it’s impossible for a subject to later decide they want to be taken out of an open dataset; firstly due to cost, but also because (by definition) it ought to be impossible to find them in an anonymised dataset. Does this present any practical or legal problems?

Ben: In some countries, especially in Europe, data subjects have a legal right to object to their data being processed, by withdrawing consent or engaging in a legal procedure with the data processor. Although this is an important right, exercising it may lead to undesirable consequences for research. For example, the underlying dataset will be incomplete for secondary researchers who want to validate findings.

Our guidelines encourage researchers to be transparent about their research design, data processing and foreseeable secondary uses of the data. On the one hand, this builds trust in the network research discipline. On the other, it gives data subjects the necessary information to feel confident to share their data. Still, data subjects should be able to retract their consent via electronic means, instead of sending letters, if they can substantiate an appreciable harm to them.

Ed: How aware are funding bodies and ethics boards of the particular problems presented by mobile research; and are they categorically different from other human-subject research data? (eg interviews / social network data / genetic studies etc.)

Ben: University ethical boards or funding bodies are be staffed by experts in a wide range of disciplines. However, this does not mean they understand the intricate details of complex Internet measurements, de-identification techniques or the state of affairs with regards to re-identification techniques, nor the harms a research programme can inflict given a specific context. For example, not everyone’s intuitive moral privacy compass will be activated when they read in a research proposal that the research systems will “monitor routing dynamics, by analysing packet traces collected from cell towers and internet exchanges”, or similar sentences.

Our guidelines encourage the researcher to write up the choices made with regards to personal information in a manner that is clear and understandable for the layperson. Such a level of transparency is useful for data subjects —  as well as ethical boards and funding bodies — to understand exactly what the research entails and how risks have been accommodated.

Ed: Linnet Taylor has already discussed mobile data mining from regions of the world with weak privacy laws: what is the general status of mobile privacy legislation worldwide?

Ben: Privacy legislation itself is about as fragmented and disputed as it gets. The US generally treats personal information as a commodity that can be traded, which enables Internet companies in Silicon Valley to use data as the new raw material in the information age. Europe considers privacy and data protection as a fundamental right, which is currently regulated in detail, albeit based on a law from 1995. The review of European data protection regulation has been postponed to 2015, possibly as a result of the intense lobbying effort in Brussels to either weaken or strengthen the proposed law. Some countries have not regulated privacy or data protection at all. Other countries have a fundamental right to privacy, which is not further developed in a specific data protection law and thus hardly enforced. Another group of countries have transplanted the European approach, but do not have the legal expertise to apply the 1995 law to the digital environment. The future of data protection is very much up in the air and requires much careful study.

The guidelines we have publishing take the international human rights framework as a base, while drawing inspiration from several existing legal concepts such as data minimisation, purpose limitation, privacy by design and informed consent. The guidelines give a solid base for privacy aware research design. We do encourage researchers to discuss their projects with colleagues and legal experts as much as possible, though, because best practices and legal subtleties can vary per country, state or region.

Read the guidelines: Zevenbergen, B., Brown,I., Wright, J., and Erdos, D. (2013) Ethical Privacy Guidelines for Mobile Connectivity Measurements. Oxford Internet Institute, University of Oxford.

Ben Zevenbergen was talking to blog editor David Sutcliffe.

The scramble for Africa’s data

Mobile phone advert in Zomba, Malawi
Africa is in the midst of a technological revolution, and the current wave of digitisation has the potential to make the continent’s citizens a rich mine of data. Intersection in Zomba, Malawi. Image by john.duffell.


After the last decade’s exponential rise in ICT use, Africa is fast becoming a source of big data. Africans are increasingly emitting digital information with their mobile phone calls, internet use and various forms of digitised transactions, while on a state level e-government starts to become a reality. As Africa goes digital, the challenge for policymakers becomes what the WRR, a Dutch policy organisation, has identified as ‘i-government’: moving from digitisation to managing and curating digital data in ways that keep people’s identities and activities secure.

On one level, this is an important development for African policymakers, given that accurate information on their populations has been notoriously hard to come by and, where it exists, has not been shared. On another, however, it represents a tremendous challenge. The WRR has pointed out the unpreparedness of European governments, who have been digitising for decades, for the age of i-government. How are African policymakers, as relative newcomers to digital data, supposed to respond?

There are two possible scenarios. One is that systems will develop for the release and curation of Africans’ data by corporations and governments, and that it will become possible, in the words of the UN’s Global Pulse initiative, to use it as a ‘public good’ – an invaluable tool for development policies and crisis response. The other is that there will be a new scramble for Africa: a digital resource grab that may have implications as great as the original scramble amongst the colonial powers in the late 19th century.

We know that African data is not only valuable to Africans. The current wave of digitisation has the potential to make the continent’s citizens a rich mine of data about health interventions, human mobility, conflict and violence, technology adoption, communication dynamics and financial behaviour, with the default mode being for this to happen without their consent or involvement, and without ethical and normative frameworks to ensure data protection or to weigh the risks against the benefits. Orange’s recent release of call data from Cote d’Ivoire both represents an example of the emerging potential of African digital data, but also the challenge of understanding the kind of anonymisation and ethical challenge that it represents.

I have heard various arguments as to why data protection is not a problem for Africans. One is that people in African countries don’t care about their privacy because they live in a ‘collective society’. (Whatever that means.) Another is that they don’t yet have any privacy to protect because they are still disconnected from the kinds of system that make data privacy important. Another more convincing and evidence-based argument is that the ends may justify the means (as made here by the ICRC in a thoughtful post by Patrick Meier about data privacy in crisis situations), and that if significant benefits can be delivered using African big data these outweigh potential or future threats to privacy. The same argument is being made by Global Pulse, a UN initiative which aims to convince corporations to release data on developing countries as a public good for use in devising development interventions.

There are three main questions: what can incentivise African countries’ citizens and policymakers to address privacy in parallel with the collection of massive amounts of personal data, rather than after abuses occur? What are the models that might be useful in devising privacy frameworks for groups with restricted technological access and sophistication? And finally, how can such a system be participatory enough to be relevant to the needs of particular countries or populations?

Regarding the first question, this may be a lost cause. The WRR’s i-government work suggests that only public pressure due to highly publicised breaches of data security may spur policymakers to act. The answer to the second question is being pursued, among others, by John Clippinger and Alex Pentland at MIT (with their work on the social stack); by the World Economic Forum, which is thinking about the kinds of rules that should govern personal data worldwide; by the aforementioned Global Pulse, which has a strong interest in building frameworks which make it safe for corporations to share people’s data; by Microsoft, which is doing some serious thinking about differential privacy for large datasets; by independent researchers such as Patrick Meier, who is looking at how crowdsourced data about crises and human rights abuses should be handled; and by the Oxford Internet Institute’s new M-Data project which is devising privacy guidelines for collecting and using mobile connectivity data.

Regarding the last question, participatory systems will require African country activists, scientists and policymakers to build them. To be relevant, they will also need to be made enforceable, which may be an even greater challenge. Privacy frameworks are only useful if they are made a living part of both governance and citizenship: there must be the institutional power to hold offenders accountable (in this case extremely large and powerful corporations, governments and international institutions), and awareness amongst ordinary people about the existence and use of their data. This, of course, has not really been achieved in developed countries, so doing it in Africa may not exactly be a piece of cake.

Notwithstanding these challenges, the region offers an opportunity to push researchers and policymakers – local and worldwide – to think clearly about the risks and benefits of big data, and to make solutions workable, enforceable and accessible. In terms of data privacy, if it works in Burkina Faso, it will probably work in New York, but the reverse is unlikely to be true. This makes a strong argument for figuring it out in Burkina Faso.

Some may contend that this discussion only points out the massive holes in the governance of technology that prevail in Africa – and in fact a whole other level of problems regarding accountability and power asymmetries. My response: Yes. Absolutely.

Linnet Taylor’s research focuses on social and economic aspects of the diffusion of the internet in Africa, and human mobility as a factor in technology adoption (.. read her blog). Her doctoral research was on Ghana, where she looked at mobility’s influence on the formation and viability of internet cafes in poor and remote areas, networking amongst Ghanaian technology professionals and ICT4D policy. At the OII she works on a Sloan Foundation funded project on Accessing and Using Big Data to Advance Social Science Knowledge.