
Data show that the relative change in page views to the general Wikipedia page on the election can offer an estimate of the relative change in election turnout.

2016 presidential candidate Donald Trump in a residential backyard near Jordan Creek Parkway and Cody Drive in West Des Moines, Iowa, with lights and security cameras. Image by Tony Webster (Flickr).

As digital technologies become increasingly integrated into the fabric of social life, their ability to generate large amounts of information about the opinions and activities of the population increases. The opportunities in this area are enormous: predictions based on socially generated data are much cheaper than conventional opinion polling, offer the potential to avoid classic biases inherent in asking people to report their opinions and behaviour, and can deliver results much more quickly and be updated more rapidly.

In their article published in EPJ Data Science, Taha Yasseri and Jonathan Bright develop a theoretically informed prediction of election results from socially generated data, combined with an understanding of the social processes through which the data are generated. They can thereby explore the predictive power of socially generated data while enhancing theory about the relationship between socially generated data and real-world outcomes. Their particular focus is on the readership statistics of politically relevant Wikipedia articles (such as those of individual political parties) in the time period just before an election.

The authors apply these methods to a variety of European countries in the context of the 2009 and 2014 European Parliament elections. First, they show that the relative change in the number of page views to the general Wikipedia page on the election can offer a reasonable estimate of the relative change in election turnout at the country level. This supports the idea that increases in online information seeking at election time are driven by voters who are considering voting. Second, they show that a theoretically informed model based on previous national results, Wikipedia page views, news media mentions, and basic information about a political party can offer a good prediction of that party's overall vote share.
Third, they present a model for predicting change in vote share (i.e., voters swinging towards and away from a party), showing that Wikipedia page-view data provide an important increase…