Our knowledge of how automated agents interact is rather poor (and that could be a problem)

Recent years have seen a huge increase in the number of bots online — including search engine Web crawlers, online customer service chat bots, social media spambots, and content-editing bots in online collaborative communities like Wikipedia. (Bots are important contributors to Wikipedia, completing about 15% of all Wikipedia edits in 2014 overall, and more than 50% in certain language editions.)

While the online world has turned into an ecosystem of bots (by which we mean computer scripts that automatically handle repetitive and mundane tasks), our knowledge of how these automated agents interact with each other is rather poor. Because bots are automata without the capacity for emotions, meaning-making, creativity, or sociality, we might expect their interactions to be relatively predictable and uneventful.

In their PLOS ONE article “Even good bots fight: The case of Wikipedia“, Milena Tsvetkova, Ruth García-Gavilanes, Luciano Floridi, and Taha Yasseri analyze the interactions between bots that edit articles on Wikipedia. They track the extent to which bots undid each other’s edits over the period 2001–2010, model how pairs of bots interact over time, and identify different types of interaction outcomes. Although Wikipedia bots are intended to support the encyclopaedia — identifying and undoing vandalism, enforcing bans, checking spelling, creating inter-language links, importing content automatically, mining data, identifying copyright violations, greeting newcomers, etc. — the authors find they often undid each other’s edits, with these sterile “fights” sometimes continuing for years.

They suggest that even relatively “dumb” bots may give rise to complex interactions, carrying important implications for Artificial Intelligence research. Understanding these bot-bot interactions will be crucial for managing social media, providing adequate cyber-security, and designing autonomous vehicles (that don’t crash).

We caught up with Taha Yasseri and Luciano Floridi to discuss the implications of the findings:

Ed.: Is there any particular difference between the way individual bots interact (and maybe get bogged down in conflict), and lines of vast and complex code interacting badly, or having unforeseen results (e.g. flash-crashes in automated trading): i.e. is this just (another) example of us not always being able to anticipate how code interacts in the wild?

Taha: There are similarities and differences. The most notable difference is that here the bots are not competing. They all work based on the same rules and, more importantly, towards the same goal: to increase the quality of the encyclopedia. Considering these features, the rather antagonistic interactions between the bots come as a surprise.

Ed.: Wikipedia have said that they know about it, and that it’s a minor problem: but I suppose Wikipedia presents a nice, open, benevolent system to make a start on examining and understanding bot interactions. What other bot-systems are you aware of, or that you could have looked at?

Taha: In terms of content-generating bots, Twitter bots have turned out to be very important for online propaganda. Crawler bots that collect information from social media or the web (such as personal information or email addresses) are also heavily deployed. In fact, we have come up with a first typology of Internet bots based on their type of action and their intentions (benevolent vs. malevolent), which is presented in the article.

Ed.: You’ve also done work on human collaborations (e.g. in the citizen science projects of the Zooniverse) — is there any work comparing human collaborations with bot collaborations — or even examining human-bot collaborations and interactions?

Taha: In the present work we do compare bot-bot interactions with human-human interactions to observe similarities and differences. The most striking difference is in the dynamics of negative interactions. While human conflicts heat up very quickly and then disappear after a while, bots undoing each other’s contributions comes as a steady flow that might persist over years. In the HUMANE project, we discuss the co-existence of humans and machines in the digital world from a theoretical point of view, and there we examine such ecosystems in detail.

Ed.: Humans obviously interact badly, fairly often (despite being a social species) .. why should we be particularly worried about how bots interact with each other, given humans seem to expect and cope with social inefficiency, annoyances, conflict and break-down? Isn’t this just more of the same?

Luciano: The fact that bots can be as bad as humans is far from reassuring. The fact that this happens even when they are programmed to collaborate is more disconcerting than what happens among humans when they compete or fight each other. These are very elementary mechanisms that, through simple interactions, generate messy and conflictual outcomes. One may hope this is not evidence of what may happen when more complex systems and interactions are in question. The lesson I learnt from all this is that, without rules or some kind of normative framework that promotes collaboration, not even good mechanisms ensure a good outcome.

Read the full article: Tsvetkova M, Garcia-Gavilanes R, Floridi, L, Yasseri T (2017) Even good bots fight: The case of Wikipedia. PLoS ONE 12(2): e0171774. doi:10.1371/journal.pone.0171774


Taha Yasseri and Luciano Floridi were talking to blog editor David Sutcliffe.

Can we predict electoral outcomes from Wikipedia traffic?

As digital technologies become increasingly integrated into the fabric of social life their ability to generate large amounts of information about the opinions and activities of the population increases. The opportunities in this area are enormous: predictions based on socially generated data are much cheaper than conventional opinion polling, offer the potential to avoid classic biases inherent in asking people to report their opinions and behaviour, and can deliver results much quicker and be updated more rapidly.

In their article published in EPJ Data Science, Taha Yasseri and Jonathan Bright develop a theoretically informed prediction of election results from socially generated data combined with an understanding of the social processes through which the data are generated. They can thereby explore the predictive power of socially generated data while enhancing theory about the relationship between socially generated data and real world outcomes. Their particular focus is on the readership statistics of politically relevant Wikipedia articles (such as those of individual political parties) in the time period just before an election.

By applying these methods to a variety of different European countries in the context of the 2009 and 2014 European Parliament elections they firstly show that the relative change in number of page views to the general Wikipedia page on the election can offer a reasonable estimate of the relative change in election turnout at the country level. This supports the idea that increases in online information seeking at election time are driven by voters who are considering voting.
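As a purely illustrative piece of arithmetic (not the paper’s model, and with made-up numbers), the kind of relationship described here can be sketched as follows:

```python
# Illustrative only: relative change in election-article page views used as
# a rough proxy for the relative change in turnout. Numbers are hypothetical.
views_2009, views_2014 = 120_000, 150_000  # page views before each election
turnout_2009 = 43.0                        # turnout (%) at the earlier election

relative_change = (views_2014 - views_2009) / views_2009
estimated_turnout_2014 = turnout_2009 * (1 + relative_change)
print(f"relative change in views: {relative_change:+.0%}")
print(f"estimated 2014 turnout:   {estimated_turnout_2014:.1f}%")
```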

Second, they show that a theoretically informed model based on previous national results, Wikipedia page views, news media mentions, and basic information about the political party in question can offer a good prediction of the overall vote share of the party in question. Third, they present a model for predicting change in vote share (i.e., voters swinging towards and away from a party), showing that Wikipedia page-view data provide an important increase in predictive power in this context.
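To give a concrete flavour of such a model, here is a minimal sketch of a simple linear regression over these kinds of features; it is our own illustration rather than the authors’ actual specification, and the feature columns and numbers are hypothetical:

```python
# Illustrative sketch only: relate party vote share to previous results,
# Wikipedia page views, and news mentions. Data and features are made up.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row is one (party, election) observation. Columns: previous vote
# share (%), log of election-week Wikipedia page views, news media
# mentions, and a new-party dummy (0/1).
X = np.array([
    [35.0, 10.2, 120, 0],
    [28.0,  9.8,  95, 0],
    [ 0.0,  8.5,  30, 1],
    [12.0,  9.1,  40, 0],
])
y = np.array([33.0, 27.0, 6.0, 13.5])  # observed vote share (%)

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)
print("predicted vote shares:", model.predict(X))
```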

This relationship is more pronounced in the case of newer parties — consistent with the idea that voters don’t seek information uniformly about all parties at election time. Rather, they behave like ‘cognitive misers’, being more likely to seek information on new political parties with which they do not have previous experience, and more likely to seek information only when they are actually changing the way they vote.

In contrast, there was no evidence of a ‘media effect’: there was little correlation between news media mentions and overall Wikipedia traffic patterns. Indeed, the news media and Wikipedia appeared to be biased towards different things: with the news favouring incumbent parties, and Wikipedia favouring new ones.

Read the full article: Yasseri, T. and Bright, J. (2016) Wikipedia traffic data and electoral prediction: towards theoretically informed models. EPJ Data Science. 5 (1).

We caught up with the authors to explore the implications of the work.

Ed: Wikipedia represents a vast amount of not just content, but also user behaviour data. How did you access the page view stats — but also: is anyone building dynamic visualisations of Wikipedia data in real time?

Taha and Jonathan: Wikipedia makes its page view data available for free (in the same way as it makes all of its information available!). You can find the data here, along with some visualisations.
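For readers who want to pull page view counts themselves, a minimal sketch along the following lines works against the Wikimedia Pageviews REST API (note that this public API only covers data from mid-2015 onwards, so the older figures used in the paper come from the dump files instead; the article title and date range below are just examples):

```python
# Minimal sketch: fetch daily page views for one article from the
# Wikimedia Pageviews REST API (covers data from mid-2015 onwards).
import requests

project = "en.wikipedia"
article = "European_Parliament"          # example article
start, end = "2019051500", "2019052700"  # YYYYMMDDHH, example range

url = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
    f"{project}/all-access/all-agents/{article}/daily/{start}/{end}"
)
# Wikimedia asks clients to send a descriptive User-Agent header.
resp = requests.get(url, headers={"User-Agent": "pageview-demo/0.1 (example@example.org)"})
resp.raise_for_status()

for item in resp.json()["items"]:
    print(item["timestamp"], item["views"])
```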

Ed: Why did you use Wikipedia data to examine election prediction rather than (the, I suppose, more fashionable) Twitter? How do they compare as data sources?

Taha and Jonathan: One of the big problems with using Twitter to predict things like elections is that contributing on social media is a very public thing and people are quite conscious of this. For example, some parties are seen as unfashionable so people might not make their voting choice explicit. Hence overall social media might seem to be saying one thing whereas actually people are thinking another.

By contrast, looking for information online on a website like Wikipedia is an essentially private activity, so there aren’t these social biases. In other words, on Wikipedia we have direct access to transactional data on what people do, rather than what they say or prefer to say.

Ed: How did these results and findings compare with the social media analysis done as part of our UK General Election 2015 Election Night Data Hack? (long title..)

Taha and Jonathan: The GE2015 data hack looked at individual politicians. We found that having a Wikipedia page is becoming increasingly important — over 40% of Labour and Conservative Party candidates had an individual Wikipedia page. We also found that this was highly correlated with Twitter presence — being more active on one network also made you more likely to be active on the other one. And we found some initial evidence that social media reaction was correlated with votes, though there is a lot more work to do here!

Ed: Can you see digital social data analysis replacing (or maybe just complementing) opinion polling in any meaningful way? And what problems would need to be addressed before that happened: e.g. around representative sampling, data cleaning, and weeding out bots?

Taha and Jonathan: Most political pundits are starting to look at a range of indicators of popularity — for example, not just voting intention, but also ratings of leadership competence, economic performance, etc. We can see good potential for social data to become part of this range of popularity indicators. However, we don’t think it will replace polling just yet; the use of social media is limited to certain demographics. Also, the data collected from social media are often very shallow, not allowing for validation. In the case of Wikipedia, for example, we only know how many times each page is viewed, but we don’t know by how many people and from where.

Ed: You do a lot of research with Wikipedia data — has that made you reflect on your own use of Wikipedia?

Taha and Jonathan: It’s interesting to think about this activity of getting direct information about politicians — it’s essentially a new activity, something you couldn’t do in the pre-digital age. I know that I personally [Jonathan] use it to find out things about politicians and political parties — it would be interesting to know more about why other people are using it as well. This could have a lot of impacts. One thing Wikipedia has is a really long memory, in a way that other means of getting information on politicians (such as newspapers) perhaps don’t. We could start to see this type of thing becoming more important in electoral politics.

[Taha] .. since my research has been mostly focused on Wikipedia edit wars between human and bot editors, I have naturally become more cautious about the information I find on Wikipedia. When it comes to sensitive topics, such as politics, Wikipedia is a good place to start, but not a great place to end the search!


Taha Yasseri and Jonathan Bright were talking to blog editor David Sutcliffe.

Brexit, voting, and political turbulence

Cross-posted from the Princeton University Press blog. The authors of Political Turbulence discuss how the explosive rise, non-normal distribution, and lack of organization that characterize contemporary politics as a chaotic system can explain why many political mobilizations of our times seem to come from nowhere.


On 23rd June 2016, a majority of the British public voted in a referendum on whether to leave the European Union. The Leave or so-called #Brexit option was victorious, with a margin of 52% to 48% across the country, although Scotland, Northern Ireland, London and some towns voted to remain. The result was a shock to both leave and remain supporters alike. US readers might note that when the polls closed, the odds on futures markets of Brexit (15%) were longer than those of Trump being elected President.

Political scientists are reeling with the sheer volume of politics that has been packed into the month after the result. From the Prime Minister’s morning-after resignation on 24th June, the country was mired in political chaos, with almost every political institution challenged and under question in the aftermath of the vote, including both the Conservative and Labour parties and the existence of the United Kingdom itself, given Scotland’s resistance to leaving the EU. The eventual formation of a government under a new prime minister, Theresa May, has brought some stability. But she was not elected, and her government has a tiny majority of only 12 Members of Parliament. A cartoon by Matt in the Telegraph on July 2nd (which would work for almost any day) showed two students, one of them saying ‘I’m studying politics. The course covers the period from 8am on Thursday to lunchtime on Friday.’

All these events – the campaigns to remain or leave, the post-referendum turmoil, resignations, sackings and appointments – were played out on social media; the speed of change and the unpredictability of events being far too great for conventional media to keep pace. So our book, Political Turbulence: How Social Media Shape Collective Action, can provide a way to think about the past weeks. The book focuses on how social media allow new, ‘tiny acts’ of political participation (liking, tweeting, viewing, following, signing petitions and so on), which turn social movement theory around. Rather than identifying with issues, forming collective identity and then acting to support the interests of that identity – or voting for a political party that supports it – in a social media world, people act first, and think about it, or identify with others later – if at all.

These tiny acts of participation can scale up to large-scale mobilizations, such as demonstrations, protests or petitions for policy change. These mobilizations normally fail – 99.9% of petitions to the UK or US governments fail to get the 100,000 signatures required for a parliamentary debate (UK) or an official response (US). The very few that succeed usually do so very quickly on a massive scale, but without the normal organizational or institutional trappings of a social or political movement, such as leaders or political parties. When Brazilian President Dilma Rousseff asked to speak to the leaders of the mass demonstrations against the government in 2014 organised entirely on social media with an explicit rejection of party politics, she was told ‘there are no leaders’.

This explosive rise, non-normal distribution, and lack of organization that characterize contemporary politics as a chaotic system can explain why many political mobilizations of our times seem to come from nowhere. In the US and the UK it can help us understand the shock waves of support behind Bernie Sanders, Donald Trump, Jeremy Corbyn (elected leader of the Labour Party in 2015) and Brexit itself, all of which have so strongly challenged traditional political institutions. In both countries, the two largest political parties are creaking to breaking point in their efforts to accommodate these phenomena.

The unpredicted support for Brexit by over half of voters in the UK referendum illustrates these characteristics of the movements we model in the book, including the resistance to traditional forms of organization. Voters were courted by political institutions from all sides – the government, all the political parties apart from UKIP, the Bank of England, international organizations, foreign governments, the US President himself and the ‘Remain’ or StrongerIn campaign convened by Conservative, Labour and the smaller parties. Virtually every authoritative source of information supported Remain. Yet people were resistant to aligning themselves with any of them. Experts, facts, leaders of any kind were all rejected by the rising swell of support for the Leave side. Famously, Michael Gove, one of the key Leave campaigners, said ‘we have had enough of experts’. According to YouGov polls, over two-thirds of Conservative voters in 2015 voted to Leave in 2016, as did over a third of Labour and Liberal Democrat voters.

Instead, people turned to a few key claims promulgated by the two Leave campaigns: Vote Leave (with key Conservative Brexiteers such as Boris Johnson, Michael Gove and Liam Fox) and Leave.EU, dominated by UKIP and its leader Nigel Farage and bankrolled by the aptly named billionaire Arron Banks. This side dominated social media in driving home their simple (if largely untrue) claims and anti-establishment, anti-elitist message (although all were part of the upper echelons of both establishment and elite). Key memes included the claim (painted on the side of a bus) that the UK gave £350m a week to the EU which could instead be spent on the NHS; the likelihood that Turkey would soon join the EU; and an image showing floods of migrants entering the UK via Europe. Banks brought in staff from his own insurance companies and political campaign firms (such as Goddard Gunster), and Leave.EU created a massive database of Leave supporters to employ targeted advertising on social media.

While Remain represented the status quo and a known entity, Leave was flexible enough to sell itself as anything to anyone. Leave campaigners would often criticize the Government but then not offer specific policy alternatives, stating, ‘we are a campaign not a government.’ This ability for people to coalesce around a movement for a variety of different (and sometimes conflicting) reasons is a hallmark of the social media-based campaigns that characterize Political Turbulence. Some voters and campaigners argued that voting Leave would allow the UK to be more global and accept more immigrants from non-EU countries. In contrast, racism and anti-immigration sentiment were key reasons for other voters. Desire for sovereignty and independence, responses to austerity and economic inequality, and hostility to the elites in London and the South East have all figured in the torrent of post-Brexit analysis. These alternative faces of Leave were exploited to gain votes for ‘change’, but the exact change sought by any two voters could be very different.

The movement’s organization illustrates what we have observed in recent political turbulence – as in Brazil, Hong Kong and Egypt: a complete rejection of mainstream political parties and institutions and an absence of leaders in any conventional sense. There is little evidence that the leading lights of the Leave campaigns were seen as prospective leaders. There was no outcry from the Leave side when they seemed to melt away after the vote, no mourning over Michael Gove’s complete fall from grace when the government was formed – nor even joy at Boris Johnson’s appointment as Foreign Secretary. Rather, the Leave campaigns acted like advertising campaigns, driving their points home to all corners of the online and offline worlds but without a clear public face. After the result, it transpired that there was no plan, no policy proposals, no exit strategy proposed by either campaign. The Vote Leave campaign was seemingly paralyzed by shock after the vote (they tried to delete their whole site, now reluctantly and partially restored with the lie on the side of the bus toned down to £50 million), pickled forever after 23rd June. Meanwhile, Theresa May, a reluctant Remain supporter and an absent figure during the referendum itself, emerged as the only viable leader after the event, in the same way as (in a very different context) the Muslim Brotherhood, as the only viable organization, was able to assume power after the first Egyptian revolution.

In contrast, the Leave.EU website remains highly active, possibly poised for the rebirth of UKIP as a radical populist far-right party on the European model, as Arron Banks has proposed. UKIP was formed around this single policy – of leaving the EU – and will struggle to find policy purpose post-Brexit. A new party with Banks’ huge resources and a massive database of Leave supporters and their social media affiliations – supporters possibly disenchanted by the slow progress of Brexit and disaffected by the traditional parties – might be a political winner on the new landscape.

The act of voting in the referendum will define people’s political identity for the foreseeable future, shaping the way they vote in any forthcoming election. The entire political system is being redrawn around this single issue, and whichever organizational grouping can ride the wave will win. The one thing we can predict for our political future is that it will be unpredictable.

 


Sexism Typology: Literature Review

The Everyday Sexism Project catalogues instances of sexism experienced by women on a day to day basis. We will be using computational techniques to extract the most commonly occurring sexism-related topics.

As Laura Bates, founder of the Everyday Sexism project, has recently highlighted, “it seems to be increasingly difficult to talk about sexism, equality, and women’s rights” (Everyday Sexism Project, 2015). With many theorists suggesting that we have entered a so-called “post-feminist” era in which gender equality has been achieved (cf. McRobbie, 2008; Modleski, 1991), to complain about sexism not only risks being labelled as “uptight”, “prudish”, or a “militant feminist”, but also exposes those who speak out to sustained, and at times vicious, personal attacks (Everyday Sexism Project, 2015). Despite this, thousands of women are speaking out, through Bates’ project, about their experiences of everyday sexism. Our research seeks to draw on the rich history of gender studies in the social sciences, coupling it with emerging computational methods for topic modelling, to better understand the content of reports to the Everyday Sexism Project and the lived experiences of those who post them. Here, we outline the literature which contextualizes our study.

Studies on sexism are far from new. Indeed, particularly amongst feminist theorists and sociologists, the analysis (and deconstruction) of “inequality based on sex or gender categorization” (Harper, 2008) has formed a central tenet of both academic inquiry and a radical politics of female emancipation for several decades (De Beauvoir, 1949; Friedan, 1963; Rubin, 1975; Millett, 1971). Reflecting its feminist origins, historical research on sexism has broadly focused on defining sexist interactions (cf. Glick and Fiske, 1997) and on highlighting the problematic, biologically rooted ‘gender roles’ that form the foundation of inequality between men and women (Millett, 1971; Renzetti and Curran, 1992; Chodorow, 1995).

More recent studies, particularly in the field of psychology, have shifted the focus away from whether and how sexism exists, towards an examination of the psychological, personal, and social implications that sexist incidents have for the women who experience them. As such, theorists such as Matteson and Moradi (2005), Swim et al. (2001) and Jost and Kay (2005) have highlighted the damaging intellectual and mental health outcomes for women who are subject to continual experiences of sexism. Atwood, for example, argues in her study of gender bias in families that sexism combines with other life stressors to create significant psychological distress in women, resulting in them needing to “seek therapy, most commonly for depression and anxiety” (2001, 169).

Given the increasing ubiquity of digital technology in everyday life, it is hardly surprising that the relationship between technology and sexism has also sparked interest from contemporary researchers in the field. Indeed, several studies have explored the intersection between gender and power online, with Susan Herring’s work on gender differences in computer-mediated communication being of particular note (cf. Herring, 2008). Theorists such as Mindi D. Foster have focused on the impact that using digital technology, and particularly Web 2.0 technologies, to talk about sexism can have on women’s well-being. Foster’s study found that when women tweeted about sexism, and in particular when they used tweets to a) name the problem, b) criticise it, or c) suggest change, they viewed their actions as effective and had enhanced life satisfaction, and therefore felt empowered (Foster, 2015: 21).

Despite this diversity of research on sexism, however, there remain some notable gaps in understanding. In particular, as this study hopes to highlight, little previous research on sexism has considered the different ‘types’ of sexism experienced by women (beyond an identification of the workplace and the education system as contexts in which sexism often manifests, as per Barnett, 2005; Watkins et al., 2006; Klein, 1992). Furthermore, research focusing on sexism has thus far been largely qualitative in nature. Although a small number of studies have employed quantitative methods (cf. Brandt, 2011; Becker and Wright, 2011), none have used computational approaches to analyse the wealth of available online data on sexism.

This project, which will apply a natural language processing approach to analyse data collected from the Everyday Sexism Project website, seeks to fill such a gap. By providing much-needed analysis of a large-scale crowdsourced data set on sexism, it is our hope that knowledge gained from this study will advance both the sociological understanding of women’s lived experiences of sexism, and methodological understandings of the suitability of computational topic modelling for conducting this kind of research.

Find out more about the OII’s research on the Everyday Sexism project by visiting the webpage or by looking at the other Policy & Internet blog posts on the project – post 1 and post 2.


Taha Yasseri is a Research Fellow at the OII who has interests in analysis of Big Data to understand human dynamics, government-society interactions, mass collaboration, and opinion dynamics.

Kathryn Eccles is a Research Fellow at the OII who has research interests in the impact of new technologies on scholarly behaviour and research, particularly in the Humanities.

Sophie Melville is a Research Assistant working at the OII. She previously completed the MSc in the Social Science of the Internet at the OII.

References

Atwood, N. C. (2001). Gender bias in families and its clinical implications for women. Social Work, 46 pp. 23–36.

Barnett, R. C. (2005). Ageism and sexism in the workplace. Generations, 29(3) pp. 25–30.

Bates, Laura. (2015). Everyday Sexism [online] Available at: http://everydaysexism.com [Accessed 1 May 2016].

Becker, Julia C. & Wright, Stephen C. (2011). Yet another dark side of chivalry: Benevolent sexism undermines and hostile sexism motivates collective action for social change. Journal of Personality and Social Psychology, 101(1) pp. 62–77.

Brandt, Mark. (2011). Sexism and gender inequality across 57 societies. Psychological Science, 22(11).

Chodorow, Nancy (1995). “Becoming a feminist foremother”. In Phyllis Chesler, Esther D. Rothblum, and Ellen Cole (eds.), Feminist foremothers in women’s studies, psychology, and mental health. New York: Haworth Press. pp. 141–154.

De Beauvoir, Simone. (1949). The second sex, woman as other. London: Vintage.

Foster, M. D. (2015). Tweeting about sexism: The well-being benefits of a social media collective action. British Journal of Social Psychology.

Friedan, Betty. (1963). The Feminine Mystique. W. W. Norton and Co.

Glick, Peter & Fiske, Susan T. (1997). Hostile and benevolent sexism. Psychology of Women Quarterly, 21(1) pp. 119–135.

Harper, Amney J. (2008). The relationship between sexism, ambivalent sexism, and relationship quality in heterosexual women. PhD Auburn University.

Herring, Susan C. (2008). Gender and Power in Online Communication. In Janet Holmes and Miriam Meyerhoff (eds) The Handbook of Language and Gender. Oxford: Blackwell.

Jost, J. T., & Kay, A. C. (2005). Exposure to benevolent sexism and complementary gender stereotypes: Consequences for specific and diffuse forms of system justification. Journal of Personality and Social Psychology, 88 pp. 498–509.

Klein, Susan Shurberg. (1992). Sex equity and sexuality in education: breaking the barriers. State University of New York Press.

Matteson, A. V., & Moradi, B. (2005). Examining the structure of the schedule of sexist events: Replication and extension. Psychology of Women Quarterly, 29 pp. 47–57.

McRobbie, Angela (2004). Post-feminism and popular culture. Feminist Media Studies, 4(3) pp. 255 – 264.

Millett, Kate. (1971). Sexual politics. UK: Virago.

Modleski, Tania. (1991). Feminism without women: culture and criticism in a “postfeminist” age. New York: Routledge.

Renzetti, C. and Curran, D. (1992). “Sex-Role Socialization”. In J. Kourany, J. Sterba, and R. Tong (eds.), Feminist Philosophies. New Jersey: Prentice Hall.

Rubin, Gayle. (1975). The traffic in women: notes on the “political economy” of sex. In Rayna R. Reiter (ed.), Toward an Anthropology of Women. Monthly Review Press.

Swim, J. K., Hyers, L. L., Cohen, L. L., & Ferguson, M. J. (2001). Everyday sexism: Evidence for its incidence, nature, and psychological impact from three daily diary studies. Journal of Social Issues, 57 pp. 31–53.

Watkins et al. (2006). Does it pay to be sexist? The relationship between modern sexism and career outcomes. Journal of Vocational Behaviour. 69(3) pp. 524 – 537.

Topic modelling content from the “Everyday Sexism” project: what’s it all about?

We recently announced the start of an exciting new research project that will involve the use of topic modelling in understanding the patterns in submitted stories to the Everyday Sexism website. Here, we briefly explain our text analysis approach, “topic modelling”.

At its very core, topic modelling is a technique that seeks to automatically discover the topics contained within a group of documents. ‘Documents’ in this context could refer to text items as lengthy as individual books, or as short as sentences within a paragraph. Let’s take the idea of sentences-as-documents as an example:

  • Document 1: I like to eat kippers for breakfast.
  • Document 2: I love all animals, but kittens are the cutest.
  • Document 3: My kitten eats kippers too.

Assuming that each sentence contains a mixture of different topics (and that a ‘topic’ can be understood as a collection of words (of any part of speech) that have different probabilities of appearance in passages discussing the topic), how does the topic modelling algorithm ‘discover’ the topics within these sentences?

The algorithm is initiated by setting the number of topics that it needs to extract. Of course, it is hard to guess this number without having insight into the topics, but one can think of it as a resolution tuning parameter: the smaller the number of topics, the more general the bag of words in each topic, and the looser the connections between them.

The algorithm loops through all of the words in each document, assigning every word to one of our topics in a temporary and semi-random manner. This initial assignment is arbitrary, and it is easy to show that different initializations lead to the same results in the long run. Once each word has been assigned a temporary topic, the algorithm then re-iterates through each word in each document to update the topic assignment using two criteria: 1) How prevalent is the word in question across topics? And 2) How prevalent are the topics in the document?

To quantify these two criteria, the algorithm calculates the likelihood of the words appearing in each document, given the current assignment of words to topics and of topics to documents.

Of course words can appear in different topics and more than one topic can appear in a document. But the iterative algorithm seeks to maximize the self-consistency of the assignment by maximizing the likelihood of the observed word-document statistics. 

We can illustrate this process and its outcome by going back to our example. A topic modelling approach might use the process above to discover the following topics across our documents:

  • Document 1: I like to eat kippers for breakfast. [100% Topic A]
  • Document 2: I love all animals, but kittens are the cutest. [100% Topic B]
  • Document 3: My kitten eats kippers too. [67% Topic A, 33% Topic B]

Topic modelling defines each topic as a so-called ‘bag of words’, but it is the researcher’s responsibility to decide upon an appropriate label for each topic based on their understanding of language and context. Going back to our example, the algorithm might classify the underlined words under Topic A, which we could then label as ‘food’ based on our understanding of what the words mean. Similarly the italicised words might be classified under a separate topic, Topic B, which we could label ‘animals’. In this simple example the word “eat” has appeared in a sentence dominated by Topic A, but also in a sentence with some association to Topic B. Therefore it can also be seen as a connector of the two topics. Of course animals eat too and they like food!
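As a rough illustration of what this looks like in practice, here is a minimal sketch using scikit-learn’s LDA implementation on the three example sentences above. (The library uses a variational approximation rather than the word-by-word re-assignment described earlier, the removal of common stop words and the choice of two topics are our own assumptions, and with only three tiny documents the output is indicative at best.)

```python
# Minimal illustrative sketch: fit a two-topic LDA model to the three
# example sentences and inspect the document-topic mixtures and the
# top words in each topic's 'bag of words'.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "I like to eat kippers for breakfast.",
    "I love all animals, but kittens are the cutest.",
    "My kitten eats kippers too.",
]

# Count word occurrences, dropping common English stop words.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # rows: documents, columns: topic shares

words = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top_words = [words[i] for i in weights.argsort()[::-1][:4]]
    print(f"Topic {k}: {top_words}")

for d, mixture in enumerate(doc_topics):
    print(f"Document {d + 1}: {mixture.round(2)}")
```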

We are going to use a similar approach: first to extract the main topics reflected in the reports submitted to the Everyday Sexism Project website, and then to map the relations between the sexism-related topics and concepts based on the overlap between the bags of words of each topic. Finally, we can also look into the co-appearance of topics in the same document. In this way we will try to draw a linguistic picture of the more than 100,000 submitted reports.
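One simple way to quantify the overlap between the bags of words of two topics is the Jaccard index of their top-word sets. The sketch below uses entirely hypothetical word lists, just to show the idea:

```python
# Illustrative sketch: relate topics by the Jaccard overlap of their
# top-word sets. The topic names and word lists are hypothetical.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

topic_words = {
    "workplace": ["boss", "office", "meeting", "colleague", "promotion"],
    "transport": ["bus", "train", "commute", "stranger", "seat"],
    "harassment": ["comment", "stranger", "shouted", "boss", "followed"],
}

for name_a in topic_words:
    for name_b in topic_words:
        if name_a < name_b:  # consider each unordered pair once
            overlap = jaccard(topic_words[name_a], topic_words[name_b])
            print(f"{name_a} - {name_b}: {overlap:.2f}")
```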

As ever, be sure to check back for further updates on our progress!

Creating a semantic map of sexism worldwide: topic modelling of content from the “Everyday Sexism” project

The Everyday Sexism Project catalogues instances of sexism experienced by women on a day to day basis. We will be using computational techniques to extract the most commonly occurring sexism-related topics.

When barrister Charlotte Proudman recently spoke out regarding a sexist comment that she had received on the professional networking website LinkedIn, hundreds of women praised her actions in highlighting the issue of workplace sexism – and many of them began to tell similar stories of their own. It soon became apparent that Proudman was not alone in experiencing this kind of sexism, a fact further corroborated by Laura Bates of the Everyday Sexism Project, who asserted that workplace harassment is “the most reported kind of incident” on the project’s UK website.

Proudman’s experience and Bates’ comments on the number of submissions to her site concerning harassment at work provoke a conversation about the nature of sexism, not only in the UK but also at a global level. We know that since its launch in 2012, the Everyday Sexism Project has received over 100,000 submissions in more than 13 different languages, concerning a variety of topics. But what are these topics? As Bates has stated, in the UK, workplace sexism is the most commonly discussed subject on the website – but is this also the case for the Everyday Sexism sites in France, Japan, or Brazil? What are the most common types of sexism globally, and (how) do they relate to each other? Do experiences of sexism change from one country to another?

The multilingual reports submitted to the Everyday Sexism Project are undoubtedly a gold mine of crowdsourced information with great potential for answering important questions about instances of sexism worldwide, as well as for drawing an overall picture of how sexism is experienced in different societies. So far, much of the research relating to the Everyday Sexism Project has focused on qualitative content analysis and has been limited to the submissions written in English. Along with Principal Investigators Taha Yasseri and Kathryn Eccles, I will be acting as Research Assistant on a new project funded by the John Fell Oxford University Press Research Fund, which hopes to expand the methods used to investigate Everyday Sexism submission data by undertaking a large-scale computational study that will enrich existing qualitative work in this area.

Entitled “Semantic Mapping of Sexism: Topic Modelling of Everyday Sexism Content”, our project will take a Natural Language Processing approach, analysing the content of Everyday Sexism reports in different languages and using topic-modelling techniques to extract the most commonly occurring sexism-related topics and concepts from the submissions. We will map the semantic relations between those topics within and across different languages, comparing and contrasting the ways in which sexism is experienced in everyday life in different cultures and geographies. Ultimately, we hope to create the first data-driven map of sexism on a global scale, forming a solid framework for future studies in growing fields such as online inequality, cyberbullying, and social well-being.

We’re very excited about the project and will be charting our progress via the Policy and Internet Blog, so make sure to check back for further updates!
