Should there be a better accounting of the algorithms that choose our news for us?

A central ideal of democracy is that political discourse should allow a fair and critical exchange of ideas and values. But political discourse is unavoidably mediated by the mechanisms and technologies we use to communicate and receive information — and content personalization systems (think search engines, social media feeds and targeted advertising), and the algorithms they rely upon, create a new type of curated media that can undermine the fairness and quality of political discourse.

A new article by Brent Mittlestadt explores the challenges of enforcing a political right to transparency in content personalization systems. Firstly, he explains the value of transparency to political discourse and suggests how content personalization systems undermine open exchange of ideas and evidence among participants: at a minimum, personalization systems can undermine political discourse by curbing the diversity of ideas that participants encounter. Second, he explores work on the detection of discrimination in algorithmic decision making, including techniques of algorithmic auditing that service providers can employ to detect political bias. Third, he identifies several factors that inhibit auditing and thus indicate reasonable limitations on the ethical duties incurred by service providers — content personalization systems can function opaquely and be resistant to auditing because of poor accessibility and interpretability of decision-making frameworks. Finally, Brent concludes with reflections on the need for regulation of content personalization systems.

He notes that no matter how auditing is pursued, standards to detect evidence of political bias in personalized content are urgently required. Methods are needed to routinely and consistently assign political value labels to content delivered by personalization systems. This is perhaps the most pressing area for future work—to develop practical methods for algorithmic auditing.

The right to transparency in political discourse may seem unusual and farfetched. However, standards already set by the U.S. Federal Communication Commission’s fairness doctrine — no longer in force — and the British Broadcasting Corporation’s fairness principle both demonstrate the importance of the idealized version of political discourse described here. Both precedents promote balance in public political discourse by setting standards for delivery of politically relevant content. Whether it is appropriate to hold service providers that use content personalization systems to a similar standard remains a crucial question.

Read the full article: Mittelstadt, B. (2016) Auditing for Transparency in Content Personalization Systems. International Journal of Communication 10(2016), 4991–5002.

We caught up with Brent to explore the broader implications of the study:

Ed: We basically accept that the tabloids will be filled with gross bias, populism and lies (in order to sell copy) — and editorial decisions are not generally transparent to us. In terms of their impact on the democratic process, what is the difference between the editorial boardroom and a personalising social media algorithm?

Brent: There are a number of differences. First, although not necessarily transparent to the public, one hopes that editorial boardrooms are at least transparent to those within the news organisations. Editors can discuss and debate the tone and factual accuracy of their stories, explain their reasoning to one another, reflect upon the impact of their decisions on their readers, and generally have a fair debate about the merits and weaknesses of particular content.

This is not the case for a personalising social media algorithm; those working with the algorithm inside a social media company are often unable to explain why the algorithm is functioning in a particular way, or determined a particular story or topic to be ‘trending’ or displayed to particular users, while others are not. It is also far more difficult to ‘fact check’ algorithmically curated news; a news item can be widely disseminated merely by many users posting or interacting with it, without any purposeful dissemination or fact checking by the platform provider.

Another big difference is the degree to which users can be aware of the bias of the stories they are reading. Whereas a reader of The Daily Mail or The Guardian will have some idea of the values of the paper, the same cannot be said of platforms offering algorithmically curated news and information. The platform can be neutral insofar as it disseminates news items and information reflecting a range of values and political viewpoints. A user will encounter items reflecting her particular values (or, more accurately, her history of interactions with the platform and the values inferred from them), but these values, and their impact on her exposure to alternative viewpoints, may not be apparent to the user.

Ed: And how is content “personalisation” different to content filtering (e.g. as we see with the Great Firewall of China) that people get very worked up about? Should we be more worried about personalisation?

Brent: Personalisation and filtering are essentially the same mechanism; information is tailored to a user or users according to some prevailing criteria. One difference is whether content is merely infeasible to access, or technically inaccessible. Content of all types will typically still be accessible in principle when personalisation is used, but the user will have to make an effort to access content that is not recommended or otherwise given special attention. Filtering systems, in contrast, will impose technical measures to make particular content inaccessible from a particular device or geographical area.

Another difference is the source of the criteria used to set the visibility of different types of content. In the case of personalisation, these criteria are typically based on the users (inferred) interests, values, past behaviours and explicit requests. Critically, these values are not necessarily apparent to the user. For filtering, criteria are typically externally determined by a third party, often a government. Some types of information are set off limits, according to the prevailing values of the third party. It is the imposition of external values, which limit the capacity of users to access content of their choosing, which often causes an outcry against filtering and censorship.

Importantly, the two mechanisms do not necessarily differ in terms of the transparency of the limiting factors or rules to users. In some cases, such as the recently proposed ban in the UK of adult websites that do not provide meaningful age verification mechanisms, the criteria that determine whether sites are off limits will be publicly known at a general level. In other cases, and especially with personalisation, the user inside the ‘filter bubble’ will be unaware of the rules that determine whether content is (in)accessible. And it is not always the case that the platform provider intentionally keeps these rules secret. Rather, the personalisation algorithms and background analytics that determine the rules can be too complex, inaccessible or poorly understood even by the provider to give the user any meaningful insight.

Ed: Where are these algorithms developed: are they basically all proprietary? i.e. how would you gain oversight of massively valuable and commercially sensitive intellectual property?

Brent: Personalisation algorithms tend to be proprietary, and thus are not normally open to public scrutiny in any meaningful sense. In one sense this is understandable; personalisation algorithms are valuable intellectual property. At the same time the lack of transparency is a problem, as personalisation fundamentally affects how users encounter and digest information on any number of topics. As recently argued, it may be the case that personalisation of news impacts on political and democratic processes. Existing regulatory mechanisms have not been successful in opening up the ‘black box’ so to speak.

It can be argued, however, that legal requirements should be adopted to require these algorithms to be open to public scrutiny due to the fundamental way they shape our consumption of news and information. Oversight can take a number of forms. As I argue in the article, algorithmic auditing is one promising route, performed both internally by the companies themselves, and externally by a government agency or researchers. A good starting point would be for the companies developing and deploying these algorithms to extend their cooperation with researchers, thereby allowing a third party to examine the effects these systems are having on political discourse, and society more broadly.

Ed: By “algorithm audit” — do you mean examining the code and inferring what the outcome might be in terms of bias, or checking the outcome (presumably statistically) and inferring that the algorithm must be introducing bias somewhere? And is it even possible to meaningfully audit personalisation algorithms, when they might rely on vast amounts of unpredictable user feedback to train the system?

Brent: Algorithm auditing can mean both of these things, and more. Audit studies are a tool already in use, whereby human participants introduce different inputs into a system, and examine the effect on the system’s outputs. Similar methods have long been used to detect discriminatory hiring practices, for instance. Code audits are another possibility, but are generally prohibitive due to problems of access and complexity. Also, even if you can access and understand the code of an algorithm, that tells you little about how the algorithm performs in practice when given certain input data. Both the algorithm and input data would need to be audited.

Alternatively, auditing can assess just the outputs of the algorithm; recent work to design mechanisms to detect disparate impact and discrimination, particularly in the Fairness, Accountability and Transparency in Machine Learning (FAT-ML) community, is a great example of this type of auditing. Algorithms can also be designed to attempt to prevent or detect discrimination and other harms as they occur. These methods are as much about the operation of the algorithm, as they are about the nature of the training and input data, which may itself be biased. In short, auditing is very difficult, but there are promising avenues of research and development. Once we have reliable auditing methods, the next major challenge will be to tailor them to specific sectors; a one-size-meets-all approach to auditing is not on the cards.

Ed: Do you think this is a real problem for our democracy? And what is the solution if so?

Brent: It’s difficult to say, in part because access and data to study the effects of personalisation systems are hard to come by. It is one thing to prove that personalisation is occurring on a particular platform, or to show that users are systematically displayed content reflecting a narrow range of values or interests. It is quite another to prove that these effects are having an overall harmful effect on democracy. Digesting information is one of the most basic elements of social and political life, so any mechanism that fundamentally changes how information is encountered should be subject to serious and sustained scrutiny.

Assuming personalisation actually harms democracy or political discourse, mitigating its effects is quite a different issue. Transparency is often treated as the solution, but merely opening up algorithms to public and individual scrutiny will not in itself solve the problem. Information about the functionality and effects of personalisation must be meaningful to users if anything is going to be accomplished.

At a minimum, users of personalisation systems should be given more information about their blind spots, about the types of information they are not seeing, or where they lie on the map of values or criteria used by the system to tailor content to users. A promising step would be proactively giving the user some idea of what the system thinks it knows about them, or how they are being classified or profiled, without the user first needing to ask.

Brent Mittelstadt was talking to blog editor David Sutcliffe.

Monitoring Internet openness and rights: report from the Citizen Lab Summer Institute 2014

Jon Penny presenting on the US experience of Internet-related corporate transparency reporting.

根据相关法律法规和政策,部分搜索结果未予显示 could be a warning message we will see displayed more often on the Internet; but likely translations thereof. In Chinese, this means “according to the relevant laws, regulations, and policies, a portion of search results have not been displayed.” The control of information flows on the Internet is becoming more commonplace, in authoritarian regimes as well as in liberal democracies, either via technical or regulatory means. Such information controls can be defined as “[…] actions conducted in or through information and communications technologies (ICTs), which seek to deny (such as web filtering), disrupt (such as denial-of-service attacks), shape (such as throttling), secure (such as through encryption or circumvention) or monitor (such as passive or targeted surveillance) information for political ends. Information controls can also be non-technical and can be implemented through legal and regulatory frameworks, including informal pressures placed on private companies. […]” Information controls are not intrinsically good or bad, but much is to be explored and analysed about their use, for political or commercial purposes.

The University of Toronto’s Citizen Lab organised a one-week summer institute titled “Monitoring Internet Openness and Rights” to inform the global discussions on information control research and practice in the fields of censorship, circumvention, surveillance and adherence to human rights. A week full of presentations and workshops on the intersection of technical tools, social science research, ethical and legal reflections and policy implications was attended by a distinguished group of about 60 community members, amongst whom were two OII DPhil students; Jon Penney and Ben Zevenbergen. Conducting Internet measurements may be considered to be a terra incognita in terms of methodology and data collection, but the relevance and impacts for Internet policy-making, geopolitics or network management are obvious and undisputed.

The Citizen Lab prides itself in being a “hacker hothouse”, or an “intelligence agency for civil society” where security expertise, politics, and ethics intersect. Their research adds the much-needed geopolitical angle to the deeply technical and quantitative Internet measurements they conduct on information networks worldwide. While the Internet is fast becoming the backbone of our modern societies in many positive and welcome ways, abundant (intentional) security vulnerabilities, the ease with which human rights such as privacy and freedom of speech can be violated, threats to the neutrality of the network and the extent of mass surveillance threaten to compromise the potential of our global information sphere. Threats to a free and open internet need to be uncovered and explained to policymakers, in order encourage informed, evidence-based policy decisions, especially in a time when the underlying technology is not well-understood by decision makers.

Participants at the summer institute came with the intent to make sense of Internet measurements and information controls, as well as their social, political and ethical impacts. Through discussions in larger and smaller groups throughout the Munk School of Global Affairs – as well as restaurants and bars around Toronto – the current state of the information controls, their regulation and deployment became clear, and multi-disciplinary projects to measure breaches of human rights on the Internet or its fundamental principles were devised and coordinated.

The outcomes of the week in Toronto are impressive. The OII DPhil students presented their recent work on transparency reporting and ethical data collection in Internet measurement.

Jon Penney gave a talk on “the United States experience” with Internet-related corporate transparency reporting, that is, the evolution of existing American corporate practices in publishing “transparency reports” about the nature and quantity of government and law enforcement requests for Internet user data or content removal. Jon first began working on transparency issues as a Google Policy Fellow with the Citizen Lab in 2011, and his work has continued during his time at Harvard’s Berkman Center for Internet and Society. In this talk, Jon argued that in the U.S., corporate transparency reporting largely began with the leadership of Google and a few other Silicon Valley tech companies like Twitter, but in the Post-Snowden era, has been adopted by a wider cross section of not only technology companies, but also established telecommunications companies like Verizon and AT&T previously resistant to greater transparency in this space (perhaps due to closer, longer term relationships with federal agencies than Silicon Valley companies). Jon also canvassed evolving legal and regulatory challenges facing U.S. transparency reporting and means by which companies may provide some measure of transparency— via tools like warrant canaries— in the face of increasingly complex national security laws.

Ben Zevenbergen has recently launched ethical guidelines for the protection of privacy with regards to Internet measurements conducted via mobile phones. The first panel of the week on “Network Measurement and Information Controls” called explicitly for more concrete ethical and legal guidelines for Internet measurement projects, because the extent of data collection necessarily entails that much personal data is collected and analyzed. In the second panel on “Mobile Security and Privacy”, Ben explained how his guidelines form a privacy impact assessment for a privacy-by-design approach to mobile network measurements. The iterative process of designing a research in close cooperation with colleagues, possibly from different disciplines, ensures that privacy is taken into account at all stages of the project development. His talk led to two connected and well-attended sessions during the week to discuss the ethics of information controls research and Internet measurements. A mailing list has been set up for engineers, programmers, activists, lawyers and ethicists to discuss the ethical and legal aspects of Internet measurements. A data collection has begun to create a taxonomy of ethical issues in the discipline to inform forthcoming peer-reviewed papers.

The Citizen Lab will host its final summer institute of the series in 2015.

Ben Zevenbergen discusses ethical guidelines for Internet measurements conducted via mobile phones.

Photo credits: Ben Zevenbergen, Jon Penney. Writing Credits: Ben Zevenbergen, with small contribution from Jon Penney.

Ben Zevenbergen is an OII DPhil student and Research Assistant working on the EU Internet Science project. He has worked on legal, political and policy aspects of the information society for several years. Most recently he was a policy advisor to an MEP in the European Parliament, working on Europe’s Digital Agenda.

Jon Penney is a legal academic, doctoral student at the Oxford Internet Institute, and a Research Fellow / Affiliate of both The Citizen Lab an interdisciplinary research lab specializing in digital media, cyber-security, and human rights, at the University of Toronto’s Munk School for Global Affairs, and at the Berkman Center for Internet & Society, Harvard University.