What are the barriers to big data analytics in local government?

by David Sutcliffe 28/06/2017

Examining the extent to which local governments in the UK are using intelligence from big data, in light of the structural barriers they face when trying to exploit it.

by David Sutcliffe 28/06/2017

Many local governments have reams of data (both hard data and soft data) on local inhabitants and local businesses. Image: Chris Dawkins (Flickr CC BY-NC-ND 2.0).

The concept of Big Data has become very popular over the last decade, with many large technology companies successfully building their business models around its exploitation. The UK’s public sector has tried to follow suit, with local governments in particular trying to introduce new models of service delivery based on the routine extraction of information from their own big data. These attempts have been hailed as the beginning of a new era for the public sector, with some commentators suggesting that it could help local governments transition toward a model of service delivery where the quantity and quality of commissioned services is underpinned by data intelligence on users and their current and future needs. In their Policy & Internet article “Data Intelligence for Local Government? Assessing the Benefits and Barriers to Use of Big Data in the Public Sector”, Fola Malomo and Vania Sena examine the extent to which local governments in the UK are indeed using intelligence from big data, in light of the structural barriers they face when trying to exploit it. Their analysis suggests that the ambitions around the development of big data capabilities in local government are not reflected in actual use. Indeed, these methods have mostly been employed to develop new digital channels for service delivery, and even if the financial benefits of these initiatives are documented, very little is known about the benefits generated by them for the local communities. While this is slowly changing as councils start to develop their big data capability, the overall impression gained from even a cursory overview is that the full potential of big data is yet to be exploited. We caught up with the authors to discuss their findings: Ed.: So what actually is “the full potential” that local government is supposed to be aiming for? What exactly is the promise of “big data” in this context? Fola / Vania: Local governments seek to improve service delivery…

Ethics, Politics & Government, Social Data Science

Alan Turing Institute and OII: Summit on Data Science for Government and Policy Making

by Helen Margetts 31/05/2016

Leading policy makers, data scientists and academics came together to discuss how the ATI and government could work together to develop data science for the public good.

by Helen Margetts 31/05/2016

The benefits of big data and data science for the private sector are well recognised. So far, considerably less attention has been paid to the power and potential of the growing field of data science for policy-making and public services. On Monday 14th March 2016 the Oxford Internet Institute (OII) and the Alan Turing Institute (ATI) hosted a Summit on Data Science for Government and Policy Making, funded by the EPSRC. Leading policy makers, data scientists and academics came together to discuss how the ATI and government could work together to develop data science for the public good. The convenors of the Summit, Professors Helen Margetts (OII) and Tom Melham (Computer Science), report on the day’s proceedings. The Alan Turing Institute will build on the UK’s existing academic strengths in the analysis and application of big data and algorithm research to place the UK at the forefront of world-wide research in data science. The University of Oxford is one of five university partners, and the OII is the only partnering department in the social sciences. The aim of the summit on Data Science for Government and Policy-Making was to understand how government can make better use of big data and the ATI—with the academic partners in listening mode. We hoped that the participants would bring forward their own stories, hopes and fears regarding data science for the public good. Crucially, we wanted to work out a roadmap for how different stakeholders can work together on the distinct challenges facing government, as opposed to commercial organisations. At the same time, data science research and development has much to gain from the policy-making community. Some of the things that government does—collect tax from the whole population, or give money away at scale, or possess the legitimate use of force—it does by virtue of being government. So the sources of data and some of the data science challenges that public agencies face are…

Collective Action, Democracy, Ethics, Governance & Security, Politics & Government

Exploring the Ethics of Monitoring Online Extremism

by Bertram Vidgen 23/03/2016

Exploring the complexities of policing the web for extremist material, and its implications for security, privacy and human rights.

by Bertram Vidgen 23/03/2016

(Part 2 of 2) The Internet serves not only as a breeding ground for extremism, but also offers myriad data streams which potentially hold great value to law enforcement. The report by the OII’s Ian Brown and Josh Cowls for the VOX-Pol project: Check the Web: Assessing the Ethics and Politics of Policing the Internet for Extremist Material explores the complexities of policing the web for extremist material, and its implications for security, privacy and human rights. In the second of a two-part post, Josh Cowls and Ian Brown discuss the report with blog editor Bertie Vidgen. Read the first post. Ed: Josh, political science has long posed a distinction between public spaces and private ones. Yet it seems like many platforms on the Internet, such as Facebook, cannot really be categorised in such terms. If this correct, what does it mean for how we should police and govern the Internet? Josh: I think that is right—many online spaces are neither public nor private. This is also an issue for some for privacy legal frameworks (especially in the US). A lot of the covenants and agreements were written forty or fifty years ago, long before anyone had really thought about the Internet. That has now forced governments, societies and parliaments to adapt these existing rights and protocols for the online sphere. I think that we have some fairly clear laws about the use of human intelligence sources, and police law in the offline sphere. The interesting question is how we can take that online. How can the pre-existing standards, like the requirement that procedures are necessary and proportionate, or the ‘right to appeal’, be incorporated into online spaces? In some cases there are direct analogies. In other cases there needs to be some re-writing of the rule book to try figure out what we mean. And, of course, it is difficult because the internet itself is always changing! Ed: So…

Methods, Social Data Science, Tools

P-values are widely used in the social sciences, but often misunderstood: and that’s a problem.

by taha yasseri 07/03/2016

We need to make standards for interpreting p-values more stringent, and also improve transparency in the academic reporting process.

by taha yasseri 07/03/2016

P-values are widely used in the social sciences, especially ‘big data’ studies, to calculate statistical significance. Yet they are widely criticised for being easily hacked, and for not telling us what we want to know. Many have argued that, as a result, research is wrong far more often than we realize. In their recent article P-values: Misunderstood and Misused OII Research Fellow Taha Yasseri and doctoral student Bertie Vidgen argue that we need to make standards for interpreting p-values more stringent, and also improve transparency in the academic reporting process, if we are to maximise the value of statistical analysis. “Significant”: an illustration of selective reporting andstatistical significance from XKCD. Available online athttp://xkcd.com/882/ In an unprecedented move, the American Statistical Association recently released a statement (March 7 2016) warning against how p-values are currently used. This reflects a growing concern in academic circles that whilst a lot of attention is paid to the huge impact of big data and algorithmic decision-making, there is considerably less focus on the crucial role played by statistics in enabling effective analysis of big data sets, and making sense of the complex relationships contained within them. Because much as datafication has created huge social opportunities, it has also brought to the fore many problems and limitations with current statistical practices. In particular, the deluge of data has made it crucial that we can work out whether studies are ‘significant’. In our paper, published three days before the ASA’s statement, we argued that the most commonly used tool in the social sciences for calculating significance—the p-value—is misused, misunderstood and, most importantly, doesn’t tell us what we want to know. The basic problem of ‘significance’ is simple: it is simply unpractical to repeat an experiment an infinite number of times to make sure that what we observe is “universal”. The same applies to our sample size: we are often unable to analyse a “whole population” sample and so have to…

Governance & Security

New Voluntary Code: Guidance for Sharing Data Between Organisations

by Alison Holt 08/01/2016

For data sharing between organisations to be straight forward, there needs to a common understanding of basic policy and practice.

by Alison Holt 08/01/2016

Many organisations are coming up with their own internal policy and guidelines for data sharing. However, for data sharing between organisations to be straight forward, there needs to a common understanding of basic policy and practice. During her time as an OII Visiting Associate, Alison Holt developed a pragmatic solution in the form of a Voluntary Code, anchored in the developing ISO standards for the Governance of Data. She discusses the voluntary code, and the need to provide urgent advice to organisations struggling with policy for sharing data. Collecting, storing and distributing digital data is significantly easier and cheaper now than ever before, in line with predictions from Moore, Kryder and Gilder. Organisations are incentivised to collect large volumes of data with the hope of unleashing new business opportunities or maybe even new businesses. Consider the likes of Uber, Netflix, and Airbnb and the other data mongers who have built services based solely on digital assets. The use of this new abundant data will continue to disrupt traditional business models for years to come, and there is no doubt that these large data volumes can provide value. However, they also bring associated risks (such as unplanned disclosure and hacks) and they come with constraints (for example in the form of privacy or data protection legislation). Hardly a week goes by without a data breach hitting the headlines. Even if your telecommunications provider didn’t inadvertently share your bank account and sort code with hackers, and your child wasn’t one of the hundreds of thousands of children whose birthdays, names, and photos were exposed by a smart toy company, you might still be wondering exactly how your data is being looked after by the banks, schools, clinics, utility companies, local authorities and government departments that are so quick to collect your digital details. Then there are the companies who have invited you to sign away the rights to your data and possibly your…

Ethics, Governance & Security

Government “only” retaining online metadata still presents a privacy risk

by Brent Mittelstadt 30/11/2015

When considered in relation to individuals’ privacy, metadata should not be viewed as fundamentally different to data about the content of a communication.

by Brent Mittelstadt 30/11/2015

Since 13 October 2015 telecommunications providers in Australia have been required to retain metadata on communications for two years. Image by h2hox (Flickr)

Issues around data capture, retention and control are gaining significant attention in many Western countries—including in the UK. In this piece originally posted on the Ethics Centre Blog, the OII’s Brent Mittelstadt considers the implications of metadata retention for privacy. He argues that when considered in relation to individuals’ privacy, metadata should not be viewed as fundamentally different to data about the content of a communication. Australia’s new data retention law for telecommunications providers, comparable to extant UK and US legislation, came into effect 13 October 2015. Telecoms and ISPs are now required to retain metadata about communications for two years to assist law enforcement agencies in crime and terrorism investigation. Despite now being in effect, the extent and types of data to be collected remain unclear. The law has been widely criticised for violating Australians’ right to privacy by introducing overly broad surveillance of civilians. The Government has argued against this portrayal. They argue the content of communications will not be retained but rather the “data about the data”—location, time, date and duration of a call. Metadata retention raises complex ethical issues often framed in terms of privacy which are relevant globally. A popular argument is that metadata offers a lower risk of violating privacy compared to primary data—the content of communication. The distinction between the “content” and “nature” of a communication implies that if the content of a message is protected, so is the privacy of the sender and receiver. The assumption that metadata retention is more acceptable because of its lower privacy risks is unfortunately misguided. Sufficient volumes of metadata offer comparable opportunities to generate invasive information about civilians. Consider a hypothetical. I am given access to a mobile carrier’s dataset that specifies time, date, caller and receiver identity in addition to a continuous record of location constructed with telecommunication tower triangulation records. I see from this that when John’s wife Jane leaves the house, John often calls Jill and visits…

Politics & Government, Social Data Science

How big data is breathing new life into the smart cities concept

by Jonathan Bright 23/07/2015

If all the cars have GPS devices, all the people have mobile phones, and all opinions are expressed on social media, then do we really need the city to be smart at all?

by Jonathan Bright 23/07/2015

“Big data” is a growing area of interest for public policy makers: for example, it was highlighted in UK Chancellor George Osborne’s recent budget speech as a major means of improving efficiency in public service delivery. While big data can apply to government at every level, the majority of innovation is currently being driven by local government, especially cities, who perhaps have greater flexibility and room to experiment and who are constantly on a drive to improve service delivery without increasing budgets. Work on big data for cities is increasingly incorporated under the rubric of “smart cities”. The smart city is an old(ish) idea: give urban policymakers real time information on a whole variety of indicators about their city (from traffic and pollution to park usage and waste bin collection) and they will be able to improve decision making and optimise service delivery. But the initial vision, which mostly centred around adding sensors and RFID tags to objects around the city so that they would be able to communicate, has thus far remained unrealised (big up front investment needs and the requirements of IPv6 are perhaps the most obvious reasons for this). The rise of big data—large, heterogeneous datasets generated by the increasing digitisation of social life—has however breathed new life into the smart cities concept. If all the cars have GPS devices, all the people have mobile phones, and all opinions are expressed on social media, then do we really need the city to be smart at all? Instead, policymakers can simply extract what they need from a sea of data which is already around them. And indeed, data from mobile phone operators has already been used for traffic optimisation, Oyster card data has been used to plan London Underground service interruptions, sewage data has been used to estimate population levels, the examples go on. However, at the moment these examples remain largely anecdotal, driven forward by a few…

Politics & Government, Social Data Science

Digital Disconnect: Parties, Pollsters and Political Analysis in #GE2015

by Helen Margetts 11/05/2015

The Oxford Internet Institute undertook some live analysis of social media data over the night of the 2015 UK General Election.

by Helen Margetts 11/05/2015

‘Congratulations to my friend @Messina2012 on his role in the resounding Conservative victory in Britain’ tweeted David Axelrod, campaign advisor to Miliband, to his former colleague Jim Messina, Cameron’s strategy adviser, on May 8th. The former was Obama’s communications director and the latter campaign manager of Obama’s 2012 campaign. Along with other consultants and advisors and large-scale data management platforms from Obama’s hugely successful digital campaigns, Conservative and Labour used an arsenal of social media and digital tools to interact with voters throughout, as did all the parties competing for seats in the 2015 election. The parties ran very different kinds of digital campaigns. The Conservatives used advanced data science techniques borrowed from the US campaigns to understand how their policy announcements were being received and to target groups of individuals. They spent ten times as much as Labour on Facebook, using ads targeted at Facebook users according to their activities on the platform, geo-location and demographics. This was a top down strategy that involved working out was happening on social media and responding with targeted advertising, particularly for marginal seats. It was supplemented by the mainstream media, such as the Telegraph for example, which contacted its database of readers and subscribers to services such as Telegraph Money, urging them to vote Conservative. As Andrew Cooper tweeted after the election, ‘Big data, micro-targeting and social media campaigns just thrashed “5 million conversations” and “community organising”’. He has a point. Labour took a different approach to social media. Widely acknowledged to have the most boots on the real ground, knocking on doors, they took a similar ‘ground war’ approach to social media in local campaigns. Our own analysis at the Oxford Internet Institute shows that of the 450K tweets sent by candidates of the six largest parties in the month leading up to the general election, Labour party candidates sent over 120,000 while the Conservatives sent only 80,000, no more than…

Social Data Science

Tracing our every move: Big data and multi-method research

by Ericka Menchen-Trevino 30/04/2015

There is a lot of excitement about ‘big data’, but the potential for innovative work on social and cultural topics far outstrips current data collection and analysis techniques.

by Ericka Menchen-Trevino 30/04/2015

Using anything digital always creates a trace. The more digital ‘things’ we interact with, from our smart phones to our programmable coffee pots, the more traces we create. When collected together these traces become big data. These collections of traces can become so large that they are difficult to store, access and analyse with today’s hardware and software. But as a social scientist I’m interested in how this kind of information might be able to illuminate something new about societies, communities, and how we interact with one another, rather than engineering challenges. Social scientists are just beginning to grapple with the technical, ethical, and methodological challenges that stand in the way of this promised enlightenment. Most of us are not trained to write database queries or regular expressions, or even to communicate effectively with those who are trained. Ethical questions arise with informed consent when new analytics are created. Even a data scientist could not know the full implications of consenting to data collection that may be analysed with currently unknown techniques. Furthermore, social scientists tend to specialise in a particular type of data and analysis, surveys or experiments and inferential statistics, interviews and discourse analysis, participant observation and ethnomethodology, and so on. Collaborating across these lines is often difficult, particularly between quantitative and qualitative approaches. Researchers in these areas tend to ask different questions and accept different kinds of answers as valid. Yet trace data does not fit into the quantitative/qualitative binary. The trace of a tweet includes textual information, often with links or images and metadata about who sent it, when and sometimes where they were. The traces of web browsing are also largely textual with some audio/visual elements. The quantity of these textual traces often necessitates some kind of initial quantitative filtering, but it doesn’t determine the questions or approach. The challenges are important to understand and address because the promise of new insight into social life…

Health, Social Data Science, Wellbeing

How can big data be used to advance dementia research?

by admin 16/03/2015

As dementia is believed to be influenced by a wide range of social, environmental and lifestyle-related factors, this behavioural data has the potential to improve early diagnosis

by admin 16/03/2015

Image by K. Kendall of "Sights and Scents at the Cloisters: for people with dementia and their care partners"; a program developed in consultation with the Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Alzheimer's Disease Research Center at Columbia University, and the Alzheimer's Association.

Dementia affects about 44 million individuals, a number that is expected to nearly double by 2030 and triple by 2050. With an estimated annual cost of USD 604 billion, dementia represents a major economic burden for both industrial and developing countries, as well as a significant physical and emotional burden on individuals, family members and caregivers. There is currently no cure for dementia or a reliable way to slow its progress, and the G8 health ministers have set the goal of finding a cure or disease-modifying therapy by 2025. However, the underlying mechanisms are complex, and influenced by a range of genetic and environmental influences that may have no immediately apparent connection to brain health. Of course medical research relies on access to large amounts of data, including clinical, genetic and imaging datasets. Making these widely available across research groups helps reduce data collection efforts, increases the statistical power of studies and makes data accessible to more researchers. This is particularly important from a global perspective: Swedish researchers say, for example, that they are sitting on a goldmine of excellent longitudinal and linked data on a variety of medical conditions including dementia, but that they have too few researchers to exploit its potential. Other countries will have many researchers, and less data. ‘Big data’ adds new sources of data and ways of analysing them to the repertoire of traditional medical research data. This can include (non-medical) data from online patient platforms, shop loyalty cards, and mobile phones — made available, for example, through Apple’s ResearchKit, just announced last week. As dementia is believed to be influenced by a wide range of social, environmental and lifestyle-related factors (such as diet, smoking, fitness training, and people’s social networks), and this behavioural data has the potential to improve early diagnosis, as well as allow retrospective insights into events in the years leading up to a diagnosis. For example, data on changes in shopping…