Assessing crowdsourcing technologies to collect public opinion around an urban renovation project

Ed: Given the “crisis in democratic accountability”, methods to increase citizen participation are in demand. To this end, your team developed some interactive crowdsourcing technologies to collect public opinion around an urban renovation project in Oulu, Finland. What form did the consultation take, and how did you assess its impact?

Simo: Over the years we’ve deployed various types of interactive interfaces on a network of public displays. In this case it was basically a network of interactive screens deployed in downtown Oulu, next to where a renovation project was happening that we wanted to collect feedback about. We deployed an app on the screens that allowed people to type feedback directly on the screens (using an on-screen soft keyboard) and to submit feedback to city authorities via SMS, Twitter and email. We also had a smiley-based “rating” system there, which people could use to leave quick feedback about certain aspects of the renovation project.

We ourselves could not, and did not even want to, assess the impact—that’s why we did this in partnership with the city authorities. Then, together with the city folks we could better evaluate if what we were doing had any real-world value whatsoever. And, as we discuss, in the end it did!

Ed: How did you go about encouraging citizens to engage with touch screen technologies in a public space—particularly the non-digitally literate, or maybe people who are just a bit shy about participating?

Simo: Actually, the whole point was that we did not deliberately encourage them by advertising the deployment or by “forcing” anyone to use it. Quite to the contrary: we wanted to see if people would voluntarily use it, along with the other technologies that are an integral part of the city itself. This is kind of the future vision of urban computing, anyway. The screens had been there for years already, and what we wanted to see was whether people find this type of service on their own when exploring the screens, and whether they then take the opportunity to give feedback using them. The screens hosted a variety of other applications as well: games, news, etc., so it was interesting to also gauge how appealing the idea of public civic feedback is in comparison to everything else that was being offered.

Ed: You mention that using SMS to provide citizen feedback was effective in filtering out noise since it required a minimal payment from citizens—but it also created an initial barrier to participation. How do you increase the quality of feedback without placing citizens on different-level playing fields from the outset—particularly where technology is concerned?

Simo: Yes, SMS really worked well in lowering the amount of irrelevant commentary and complete nonsense. And it is true that SMS already introduces a cost, and even if the cost is minuscule, it’s still a cost to the citizen, and just voicing one’s opinions should of course be free. So there’s no correct answer here: if the channel is public and publicly accessible to anyone, there will be a lot of noisy input. In such cases moderation is a heavy task, and to this end we have been exploring crowdsourcing as well. We can make the community moderate itself. First, we need to identify the users who are genuinely concerned about or interested in the issues being explored, and then funnel those users to moderate the discussion/output. It is a win-win situation: the people who want to get involved are empowered to moderate the commentary from others, for implicit rewards.

Ed: For this experiment on citizen feedback in an urban space, your team assembled the world’s largest public display network, which was available for research purposes 24/7. In deploying this valuable research tool, how did you guarantee the privacy of the participants involved, given that some might not want to be seen submitting very negative comments? (e.g. might a form of social pressure be the cause of relatively low participation in the study?)

Simo: The display network was not built only for this experiment, but we have run hundreds of experiments on it, and have written close to a hundred academic papers about them. So, the overarching research focus, really, is on how we can benefit citizens using the network. Over the years we have been able to systematically study issues such as social pressure, group use, effects of the public space, or, one might say “stage”, etc. And yes, social pressure has a big effect, and this is where allowing people to participate via e.g. SMS or email helps a lot. That way the users won’t be seen sending the input directly.

Group use is another thing: in groups people don’t feel pressure from the “outside world” so much and are willing to interact with our applications (such as the one documented in this work), but, again, it affects the feedback quality. Groups don’t necessarily tell the truth as they aim for consensus. So the individual, and very important, opinions may not be heard. Ultimately, this is all just part of the game we must deal with, and the real question becomes how to minimise those negative effects that the public space introduces. The positives are clear: everyone can participate, easily, in the heart of the city, and whenever they want.

Ed: Despite the low participation, you still believe that the experimental results are valuable. What did you learn?

Simo: The question in a way already reveals the first important point: people are just not as interested in these “civic” things as they might claim in interviews and pre-studies. When we deploy a civic feedback prototype as the “only option” on a public gizmo (a display, some kind of new tech piece, etc.), people out of curiosity use it. Now, in our case, we just deploy it “as is”, as part of the city infrastructure for people to use if, and only if, they want to use it. So, the prototype competes for attention against smartphones, other applications on the displays, the cluttered city itself… everything!

When one reads many academic papers on interactive civic engagement prototypes, the assumptions are set very high in the discussion: “we got this much participation in this short time”, etc., but that’s not the entire truth. Leave the thing there for months and see if it still interests people! We have done the same: deployed a prototype for three days, gotten tons of interaction, published it, and learned only afterwards that “oh, maybe we were a bit optimistic with the efficiency” when the use suddenly dropped to a minimum. It’s just not that easy, and the applications require frequent updates to sustain user interest over time.

Also, the radical differences in the feedback channels were surprising, but we already talked about that a bit earlier.

Ed: Your team collaborated with local officials, which is obviously valuable (and laudable), but it can potentially impose an extra burden on academics. For example, you mention that instead of employing novel feedback formats (e.g. video, audio, images, interactive maps), your team used only text. But do you think working with public officials benefitted the project as a whole, and how?

Simo: The extra burden is a necessity if one wants to really claim authentic success in civic engagement. In our opinion, it only happens between citizens and the city, not between citizens and researchers. We do not wish to build these deployments for the sake of an academic article or two: the display infrastructure is there for citizens and the city, and if we don’t educate the authorities on how to use it then nobody will. Advertisers would be glad to take over the entire real estate there, so in a way this project is just a part of the bigger picture, which is making the display infrastructure “useful” instead of just a gimmick to kill time with (games) or for advertising.

And yes, the burden is real, but also because of this we could document what we have learned about dealing with authorities: how it is first easy to sell these prototypes to them, but sometimes hard to get commitment, etc. And it is not just this prototype—we’ve done a number of other civic engagement projects where we have noticed the same issues mentioned in the paper as well.

Ed: You also mention that as academics and policymakers you had different notions of success: for example in terms of levels of engagement and feedback of citizens. What should academics aspiring to have a concrete impact on society keep in mind when working with policymakers?

Simo: It takes a lot of time to assess impact. Policymakers will not be able to say after only a few weeks (which is the typical length of studies in our field) if the prototype has actual value to it, or if it’s just a “nice experiment”. So, deploy your strategy/tech/anything you’re doing, write about it, and let it sit. Move on with life, and then revisit it after months to see if anything has come out of it! Patience is key here.

Ed: Did the citizen feedback result in any changes to the urban renovation project they were being consulted on?

Simo: Not the project directly: the project naturally was planned years ahead and the blueprints were final at that point. The most remarkable finding for us (and the authorities) was that after moderating the noise out from the feedback, the remaining insight was pretty much the only feedback that they ever directly got from citizens. Finns tend to be a bit on the shy side, so people won’t just pick up the phone and call the local engineering department and speak out. Not sure if anyone does, really? So they complain and chat on forums and at coffee tables. So it would require active work for the authorities to find and reach out to these people.

With the display infrastructure, which was already there, we were able to gauge the public opinion that did not affect the construction directly, but indirectly affected how the department could manage their press releases, which things to stress in public communications, what parts of PR to handle differently in the next stage of the renovation project etc.

Ed: Are you planning any more experiments?

Simo: We are constantly running quite a few experiments. On the civic engagement side, for example, we are investigating how to gamify environmental awareness (recycling, waste management, keeping the environment clean) for children, as well as running longer longitudinal studies to assess the engagement of specific groups of people (e.g., children and the elderly).

Read the full article: Hosio, S., Goncalves, J., Kostakos, V. and Riekki, J. (2015) Crowdsourcing Public Opinion Using Urban Pervasive Technologies: Lessons From Real-Life Experiments in Oulu. Policy and Internet 7 (2) 203–222.

Simo Hosio is a research scientist (Dr. Tech.) at the University of Oulu, in Finland. Core topics of his research are smart city tech, crowdsourcing, wisdom of the crowd, civic engagement, and all types of “mobile stuff” in general.

Simo Hosio was talking to blog editor Pamina Smith.

Can text mining help handle the data deluge in public policy analysis?

Policy makers today must contend with two inescapable phenomena. On the one hand, there has been a major shift in the policies of governments concerning participatory governance—that is, engaged, collaborative, and community-focused public policy. At the same time, a significant proportion of government activities have now moved online, bringing about “a change to the whole information environment within which government operates” (Margetts 2009, 6).

Indeed, the Internet has become the main medium of interaction between government and citizens, and numerous websites offer opportunities for online democratic participation. The Hansard Society, for instance, regularly runs e-consultations on behalf of UK parliamentary select committees. For example, e-consultations have been run on the Climate Change Bill (2007), the Human Tissue and Embryo Bill (2007), and on domestic violence and forced marriage (2008). Councils and boroughs also regularly invite citizens to take part in online consultations on issues affecting their area. The London Borough of Hammersmith and Fulham, for example, recently asked its residents for their views on Sex Entertainment Venues and Sex Establishment Licensing policy.

However, citizen participation poses certain challenges for the design and analysis of public policy. In particular, governments and organisations must demonstrate that all opinions expressed through participatory exercises have been duly considered and carefully weighted before decisions are reached. One method for partly automating the interpretation of large quantities of online content typically produced by public consultations is text mining. Software products currently available range from those primarily used in qualitative research (integrating functions like tagging, indexing, and classification), to those integrating more quantitative and statistical tools, such as word frequency and cluster analysis (more information on text mining tools can be found at the National Centre for Text Mining).

While these methods have certainly attracted criticism and skepticism in terms of the interpretability of the output, they offer four important advantages for the analyst: namely categorisation, data reduction, visualisation, and speed.

1. Categorisation. When analysing the results of consultation exercises, analysts and policymakers must make sense of the high volume of disparate responses they receive; text mining supports the structuring of large amounts of this qualitative, discursive data into predefined or naturally occurring categories by storage and retrieval of sentence segments, indexing, and cross-referencing. Analysis of sentence segments from respondents with similar demographics (eg age) or opinions can itself be valuable, for example in the construction of descriptive typologies of respondents.

2. Data Reduction. Data reduction techniques include stemming (reduction of a word to its root form), combining of synonyms, and removal of non-informative “tool” or stop words. Hierarchical classifications, cluster analysis, and correspondence analysis methods allow the further reduction of texts to their structural components, highlighting the distinctive points of view associated with particular groups of respondents.

3. Visualisation. Important points and interrelationships are easy to miss when responses are read by eye, and rapid generation of visual overviews of responses (eg dendrograms, 3D scatter plots, heat maps, etc.) makes large and complex datasets easier to comprehend in terms of identifying the main points of view and dimensions of a public debate.

4. Speed. Speed depends on whether a special dictionary or vocabulary needs to be compiled for the analysis, and on the amount of coding required. Coding is usually relatively fast and straightforward, and the succinct overview of responses provided by these methods can reduce the time needed to process consultation responses.
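The categorisation and data reduction steps in the list above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the stop-word list, the three-suffix toy stemmer, the age bands, and the sample responses are all hypothetical, and a real analysis would rely on an established text mining package rather than hand-written rules.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; real analyses use much larger ones.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "on"}

# Suffixes removed by this toy stemmer, tried in this order.
SUFFIXES = ("ing", "es", "s")


def stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def reduce_text(text):
    """Lower-case, tokenise, drop stop words, and stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def terms_by_group(responses):
    """Count reduced terms separately for each respondent group."""
    counts = defaultdict(Counter)
    for group, text in responses:
        counts[group].update(reduce_text(text))
    return counts


# Made-up consultation responses tagged with an age band.
responses = [
    ("18-34", "Parking is expensive and the buses are late"),
    ("18-34", "Late buses, expensive parking"),
    ("65+", "The pavements are uneven and crossing the road is hard"),
]
counts = terms_by_group(responses)
for group, counter in counts.items():
    print(group, counter.most_common(3))
```

Comparing the most frequent terms per group is only a first pass at the "distinctive points of view" the list mentions; cluster or correspondence analysis would go further than simple counts.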

Despite the above advantages of automated approaches to consultation analysis, text mining methods present several limitations. Automatic classification of responses runs the risk of missing or miscategorising distinctive or marginal points of view if sentence segments are too short, or if they rely on a rare vocabulary. Stemming can also generate problems if important semantic variations are overlooked (eg lumping together ‘ill+ness’, ‘ill+defined’, and ‘ill+ustration’). Other issues applicable to public e-consultation analysis include the danger that analysts distance themselves from the data, especially when converting words to numbers. This is quite apart from the issues of inter-coder reliability and data preparation, missing data, and insensitivity to figurative language, meaning and context, which can also result in misclassification when not human-verified.
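The stemming risk described above can be made concrete with a small Python sketch. The `crude_stem` function is a deliberately naive, hypothetical stemmer (it simply keeps the first three letters) chosen to reproduce the ‘ill’ example; it is not how any particular text mining tool works. The point is the check around it: listing every stem that has absorbed more than one distinct word gives a human reviewer something to verify.

```python
from collections import defaultdict


def crude_stem(word, prefix_len=3):
    """Deliberately naive stemmer: keep only the first few letters."""
    return word.lower().replace("-", "")[:prefix_len]


def ambiguous_stems(words, prefix_len=3):
    """Return stems that absorbed more than one distinct word."""
    groups = defaultdict(set)
    for word in words:
        groups[crude_stem(word, prefix_len)].add(word.lower())
    return {s: sorted(ws) for s, ws in groups.items() if len(ws) > 1}


# Hypothetical vocabulary drawn from consultation responses.
vocabulary = ["illness", "ill-defined", "illustration", "housing"]
flagged = ambiguous_stems(vocabulary)
print(flagged)
```

Here the three unrelated words share the stem "ill" and are flagged together, while "housing" is left alone. A reviewer could then decide which merged forms should be kept apart before any counts are produced.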

However, when responding to criticisms of specific tools, we need to remember that different text mining methods are complementary, not mutually exclusive. A single solution to the analysis of qualitative or quantitative data is very unlikely; and at the very least, exploratory techniques provide a useful first step that could be followed by a theory-testing model, or by triangulation exercises to confirm results obtained by other methods.

Apart from these technical issues, policy makers and analysts employing text mining methods for e-consultation analysis must also consider certain ethical issues in addition to those of informed consent, privacy, and confidentiality. First (of relevance to academics), respondents may not expect to end up as research subjects. They may simply be expecting to participate in a general consultation exercise, interacting exclusively with public officials and not indirectly with an analyst post hoc; much less ending up as a specific, traceable data point.

This has been a particularly delicate issue for healthcare professionals. Sharf (1999, 247) describes various negative experiences of following up online postings: one woman, on being contacted by a researcher seeking consent to gain insights from breast cancer patients about their personal experiences, accused the researcher of behaving voyeuristically and “taking advantage of people in distress.” Statistical interpretation of responses also presents its own issues, particularly if analyses are to be returned or made accessible to respondents.

Respondents might also be confused about or disagree with text mining as a method applied to their answers; indeed, it could be perceived as dehumanising—reducing personal opinions and arguments to statistical data points. In a public consultation, respondents might feel somewhat betrayed that their views and opinions eventually result in just a dot on a correspondence analysis with no immediate, apparent meaning or import, at least in lay terms. Obviously the consultation organiser needs to outline clearly and precisely how qualitative responses can be collated into a quantifiable account of a sample population’s views.

This is an important point; in order to reduce both technical and ethical risks, researchers should ensure that their methodology combines both qualitative and quantitative analyses. While many text mining techniques provide useful statistical output, the UK Government’s prescribed Code of Practice on public consultation is quite explicit on the topic: “The focus should be on the evidence given by consultees to back up their arguments. Analysing consultation responses is primarily a qualitative rather than a quantitative exercise” (2008, 12). This suggests that the perennial debate between quantitative and qualitative methodologists needs to be updated and better resolved.


Margetts, H. 2009. “The Internet and Public Policy.” Policy & Internet 1 (1).

Sharf, B. 1999. “Beyond Netiquette: The Ethics of Doing Naturalistic Discourse Research on the Internet.” In Doing Internet Research, ed. S. Jones, London: Sage.

Read the full paper: Bicquelet, A., and Weale, A. (2011) Coping with the Cornucopia: Can Text Mining Help Handle the Data Deluge in Public Policy Analysis? Policy & Internet 3 (4).

Dr Aude Bicquelet is a Fellow in LSE’s Department of Methodology. Her main research interests include computer-assisted analysis, Text Mining methods, comparative politics and public policy. She has published a number of journal articles in these areas and is the author of a forthcoming book, “Textual Analysis” (Sage Benchmarks in Social Research Methods, in press).