Taha Yasseri

Conservative chairman Grant Shapps is accused of sockpuppetry on Wikipedia, but this former Wikipedia admin isn’t so sure the evidence stands up.

Conservative Party Chairman Grant Shapps gives a speech on free trade at the Institute of Directors in London.

Reposted from The Conversation.

Wikipedia has become one of the most highly linked-to websites on the internet, with countless others using it as a reference. But it can be edited by anyone, and this has led to occasions where errors have been widely repeated, or where facts have been distorted to fit an agenda. The chairman of the UK's Conservative Party, Grant Shapps, has been accused of editing Wikipedia pages related to him and his rivals within the party. The Guardian newspaper claims Wikipedia administrators blocked an account on suspicion that it was being used by Shapps, or by someone in his employ. Wikipedia accounts are anonymous, so what is the evidence supporting these claims? Is it a case of fair cop or, as Shapps says in his defence, a smear campaign in the run-up to the election?

Edits examined

This isn't the first time The Guardian has levelled accusations of this kind against Shapps over edits to Wikipedia; similar claims emerged in September 2012. The investigation examines a list of edits by three Wikipedia user accounts (Hackneymarsh, Historyset, and Contribsx) and several other edits from users without accounts, recorded only by their IP addresses, which the article claimed to be "linked" to Shapps. The Hackneymarsh account made 12 edits in a short period in May 2010. The Historyset account made five edits in a similar period. All the edits recorded by IP addresses date to between 2008 and 2010. Most recently, the Contribsx account was active from August 2013 to April 2015.

First of all, it is technically impossible to conclusively link any of these accounts or IP addresses to a real person. You can of course speculate, and in this case the accounts clearly show great sympathy with Shapps, based on the edits they have made. But no further information about the three usernames can be made public by the Wikimedia Foundation, under its privacy policy. The case is different for the IP addresses, however. Using GeoIP or…

Although some topics, like religion and politics, are debated globally, many topics are controversial only in a single language edition. This reflects the local preferences, and the importance assigned to topics, by different editorial communities.

Ed: How did you construct your quantitative measure of 'conflict'? Did you go beyond just looking at content flagged by editors as controversial?

Taha: Yes, we did. In fact, we have shown that controversy measures based on "controversial" flags are not inclusive at all: although they may have high precision, they have very low recall. Instead, we constructed an automated algorithm to locate and quantify the editorial wars taking place on the Wikipedia platform. Our algorithm is based on reversions, i.e. when editors undo each other's contributions. We focused specifically on mutual reverts between pairs of editors, and we assigned a maturity score to each editor based on the total volume of their previous contributions. When counting the mutual reverts, we gave more weight to those committed by or against editors with higher maturity scores, since a revert between two experienced editors indicates a more serious problem. We validated our method throughout, comparing it with other methods using human judgement on a random selection of articles.

Ed: Was there any discrepancy between the content deemed controversial by your own quantitative measure and what the editors themselves had flagged?

Taha: We were able to capture all the flagged content, but not all the articles found to be controversial by our method are flagged. And when you check the editorial history of those articles, you soon realise that they are indeed controversial, but for some reason have not been flagged. It's worth mentioning that the flagging process is not very well implemented in the smaller language editions of Wikipedia: even if a controversy is detected and flagged in the English Wikipedia, it might not be in the smaller editions. Our model is of course independent of the size and editorial conventions of different language editions.

Ed: Were there any differences in the way conflicts arose or were resolved in the different language versions?
Taha: We found the main differences to be the topics of controversial…
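The revert-based controversy measure Taha describes can be illustrated with a short sketch. It assumes a list of (reverter, reverted) editor pairs has already been extracted from an article's edit history, and it weights each mutual-revert pair by the smaller of the two editors' maturity scores (total prior edit counts). Both the input format and the min-based weighting are illustrative choices echoing the description above, not necessarily the exact formula used in the published work.

```python
from collections import defaultdict

def controversy_score(reverts, edit_counts):
    """Toy revert-based controversy score for a single article.

    reverts: list of (reverter, reverted) editor-id pairs.
    edit_counts: dict mapping editor id -> total prior edits (maturity proxy).

    Only *mutual* reverts (A reverts B and B reverts A) contribute. Each
    mutual pair is weighted by the less experienced editor's maturity, so
    a war between two seasoned editors scores higher than drive-by reverts.
    """
    pair_reverts = defaultdict(int)
    for reverter, reverted in reverts:
        pair_reverts[(reverter, reverted)] += 1

    score = 0
    seen = set()
    for a, b in pair_reverts:
        if (b, a) in pair_reverts and frozenset((a, b)) not in seen:
            seen.add(frozenset((a, b)))
            score += min(edit_counts.get(a, 0), edit_counts.get(b, 0))
    return score

# A reverts B and B reverts A (mutual); C reverts A once (not mutual).
reverts = [("A", "B"), ("B", "A"), ("C", "A")]
maturity = {"A": 500, "B": 40, "C": 1000}
print(controversy_score(reverts, maturity))  # 40: only the A-B pair counts
```

Weighting by the weaker editor's maturity means a pair only scores highly when both participants are experienced, which matches the intuition that reverts between veterans signal a genuine dispute rather than routine vandalism cleanup.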

There are very interesting examples of using big data to make predictions about disease outbreaks, financial moves in the markets, social interactions based on human mobility patterns, election results, etc.

Ed: You are interested in the analysis of big data to understand human dynamics; how much work is being done on real-time predictive modelling using these data?

Taha: The socially generated transactional data that we call "big data" have become available only very recently; the amount of data we now produce about human activities in a year is comparable to the amount that used to be produced over decades (or centuries). This is all due to recent advances in ICTs. Despite the short period for which big data have been available, their use in different sectors, including academia and business, has been significant. In many cases, however, the use of big data is limited to monitoring and post hoc analysis of patterns, and predictive models have rarely been used in combination with big data. Nevertheless, there are very interesting examples of using big data to make predictions about disease outbreaks, financial moves in the markets, social interactions based on human mobility patterns, election results, and so on.

Ed: What were the advantages of using Wikipedia as a data source for your study, as opposed to Twitter, blogs, Facebook or traditional media?

Taha: Our results have shown that the predictive power of Wikipedia page view and edit data outperforms similar box-office prediction models based on Twitter data. This can partly be explained by the different nature of Wikipedia compared to social media sites. Wikipedia is now the number one source of online information, and Wikipedia article page view statistics show how interested internet users have been in knowing about a specific movie. Even more importantly, the edit counts indicate the level of interest of the editors in sharing their knowledge about the movies with others. Both indicators are much stronger signals than what you can measure on Twitter, which is mainly the reaction of users after watching or reading about the movie.
The cost of participation in Wikipedia’s editorial process…