A significant part of political deliberation now takes place on online forums and social networking sites, leading to the idea that collective action might be evolving into “connective action”. The new level of connectivity (particularly of social media) raises important questions about its role in the political process. But understanding important phenomena, such as social influence, social forces, and digital divides, requires analysis of very large social systems, which traditionally has been a challenging task in the social sciences.

In their Policy & Internet article “Understanding Popularity, Reputation, and Social Influence in the Twitter Society”, David Garcia, Pavlin Mavrodiev, Daniele Casati, and Frank Schweitzer examine popularity, reputation, and social influence on Twitter using network information on more than 40 million users. They integrate measurements of popularity, reputation, and social influence to evaluate what keeps users active, what makes them more popular, and what determines their influence in the network.

Popularity in the Twitter social network is often quantified as the number of followers of a user. That implies that it doesn’t matter why some user follows you, or how important she is: your popularity only measures the size of your audience. Reputation, on the other hand, is a more complicated concept associated with centrality. Being followed by a highly reputed user has a stronger effect on one’s reputation than being followed by someone with low reputation. Thus, the simple number of followers does not capture the recursive nature of reputation.

In their article, the authors examine how popularity and reputation differ in their effect on the process of social influence. They find that there is a range of values in which the risk of a user becoming inactive grows with popularity and reputation. Popularity on Twitter resembles a proportional growth process that is faster in its strongly connected component, and that can be accelerated by reputation when users are already popular. They find that social influence on Twitter is mainly related to popularity rather than reputation, but that this growth of influence with popularity is sublinear. In sum, global network metrics are better predictors of inactivity and social influence, calling for analyses that go beyond local metrics like the number of followers.

We caught up with the authors to discuss their findings:

Ed.: Twitter is a convenient data source for political scientists, but they tend to get criticised for relying on something that represents only a tiny facet of political activity. But presumably Twitter is very useful as a way of uncovering more fundamental / generic patterns of networked human interaction?

David: Twitter as a data source to study human behaviour is both powerful and limited. Powerful because it allows us to quantify and analyse human behaviour at scales and resolutions that are simply impossible to reach with traditional methods, such as experiments or surveys. But also limited because not every aspect of human behaviour is captured by Twitter and using its data comes with significant methodological challenges, for example regarding sampling biases or platform changes. Our article is an example of an analysis of general patterns of popularity and influence that are captured by the spread of information on Twitter. Such patterns only make sense beyond the limitations of Twitter when we frame the results with respect to theories that link our work to previous and future scientific knowledge in the social sciences.

Ed.: How often do theoretical models (i.e. describing the behaviour of a network in theory) get linked up with empirical studies (i.e. of a network like Twitter in practice) but also with qualitative studies of actual Twitter users? And is Twitter interesting enough in itself for anyone to attempt to develop an overall theoretico-empirico-qualitative theory about it?

David: The link between theoretical models and large-scale data analyses of social media is less frequent than we would all wish. But the gap between disciplines seems to have been narrowing in recent years, with more social scientists using online data sources and computer scientists referring better to theories and previous results in the social sciences. What seems to be quite undeveloped is an interface with qualitative methods, especially with large-scale analyses like ours.

Qualitative methods can provide what data science cannot: questions about important and relevant phenomena that then can be explained within a wider theory if validated against data. While this seems to me a fertile ground for interdisciplinary research, I doubt that Twitter in particular should be the paragon of such a combination of approaches. I advocate for starting research from the aspect of human behaviour that is the subject of study, and not from a particularly popular social media platform that happens to be used a lot today, but might not be the standard tomorrow.

Ed.: I guess I’ve seen a lot of Twitter networks in my time, but not much in the way of directed networks, i.e. showing direction of flow of content (i.e. influence, basically) — or much in the way of a time element (i.e. turning static snapshots into dynamic networks). Is that fair, or am I missing something? I imagine it would be fun to see how (e.g.) fake news or political memes propagate through a network?

David: While Twitter provides amazing volumes of data, its programming interface is notorious for the absence of two key sources: the date when follower links are created and the precise path of retweets. The reason for the general picture of snapshots over time is that researchers cannot fully trace back the history of a follower network; they can only monitor it at a certain frequency to overcome the fact that links do not have a date attached.

The picture of flows of information is generally missing because, when looking up a retweet, we can see the original tweet that is being retweeted, but not whether the retweet is of a friend’s retweet. This way, without special access to Twitter data or alternative sources, all information flows look like stars around the original tweet, rather than propagation trees through a social network that allow the precise analysis of fake news or memes.
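The difference between a star and a propagation tree can be illustrated with a toy cascade. This is only a sketch under stated assumptions: the user names and the `TRUE_PATH` mapping are hypothetical, and no real Twitter API call is made. It models a lookup that reveals only the original author of a retweeted tweet and compares the resulting structure with the hidden path.

```python
from collections import defaultdict

# Hypothetical ground truth: for each retweeter, the user whose tweet or
# retweet they actually saw. This is what a retweet lookup does not reveal.
TRUE_PATH = {"bob": "alice", "carol": "bob", "dave": "carol", "erin": "alice"}
ORIGINAL_AUTHOR = "alice"


def observed_edges(true_path, original_author):
    """Edges as seen from outside: every retweet points at the original."""
    return {(original_author, user) for user in true_path}


def true_edges(true_path):
    """Edges of the actual propagation tree (source -> retweeter)."""
    return {(source, user) for user, source in true_path.items()}


def depth(edges, root):
    """Length of the longest chain of retweets starting from the root."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    def walk(node):
        # A leaf has depth 0; otherwise one more than its deepest child.
        return 1 + max((walk(c) for c in children.get(node, [])), default=-1)

    return walk(root)


star_depth = depth(observed_edges(TRUE_PATH, ORIGINAL_AUTHOR), ORIGINAL_AUTHOR)
tree_depth = depth(true_edges(TRUE_PATH), ORIGINAL_AUTHOR)
```

Here the observed cascade has depth 1 (a star around `alice`), while the hidden tree has depth 3, so any analysis of how far content travels through the network would be flattened by the observable data alone.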

Ed.: Given all the work on Twitter, how well-placed do you think social scientists would be to advise a political campaign on “how to create an influential network” beyond just the obvious (tweet well and often, and maybe hire a load of bots)? That is, are there any “general rules” about communication structure that would be practically useful to campaigning organisations?

David: When we talk about influence on Twitter, we usually talk about rather superficial behaviour, such as retweeting content or clicking on a link. This should not be mistaken for a more substantial kind of influence, the kind that makes people change their opinion or go to vote. Evaluating the real impact of Twitter influence is a bottleneck for how much social scientists can advise a political campaign. I would say that rather than providing general rules that can be applied everywhere, social scientists and computer scientists can be much more useful when advising, tracking, and optimising individual campaigns that take into account the details and idiosyncrasies of the people that might be influenced by the campaign.

Ed.: Random question: but where did “computational social science” emerge from – is it actually quite dependent on Twitter (and Wikipedia?), or are there other commonly-used datasets? And are computational social science, “big data analytics”, and (social) data science basically describing the same thing?

David: Tracing back the meaning and influence of “computational social science” could take a whole book! My impression is that the concept started a few decades ago as a spin on “sociophysics”, where the term “computational” was used as in “computational model”, emphasising a focus on social science away from toy model applications from physics. Then the influential Science article by David Lazer and colleagues in 2009 defined the term as the application of digital trace datasets to test theories from the social sciences, leaving the whole computational modelling outside the frame. In that case, “computational” was used more as it is used in “computational biology”, to refer to social science with increased power and speed thanks to computer-based technologies. Later it seems to have converged back into a combination of both the modelling and the data analysis trends, as in the “Manifesto of computational social science” by Rosaria Conte and colleagues in 2012, inspired by the fact that we need computational modelling techniques from complexity science to understand what we observe in the data.

The Twitter and Wikipedia dependence of the field is just a path dependency due to the easy and open access to those datasets, and a key turning point in the field is to be able to generalise beyond those “model organisms”, as Zeynep Tufekci calls them. One can observe these fads in the latest computer science conferences, with the rising ones being Reddit and GitHub, or when looking at earlier research that heavily used product reviews and blog datasets. Computational social science seems to be maturing as a field, making sense of those datasets and not just telling cool data-driven stories about one website or another. Perhaps we are beyond the peak of inflated expectations of the hype curve and the best part is yet to come.

With respect to big data and social data science, it is easy to get lost in the field of buzzwords. Big data analytics only deals with the technologies necessary to process large volumes of data, which could come from any source including social networks but also telescopes, seismographs, and any kind of sensor. These kinds of techniques are only sometimes necessary in computational social science, but are far from the core topics of the field.

Social data science is closer, but puts a stronger emphasis on problem-solving rather than testing theories from the social sciences. When using “data science” we usually try to emphasise a predictive or explorative aspect, rather than the confirmatory or generative approach of computational social science. The emphasis on theory and modelling of computational social science is the key difference here, linking back to my earlier comment about the role of computational modelling and complexity science in the field.

Ed.: Finally, how successful do you think computational social scientists will be in identifying any underlying “social patterns” — i.e. would you agree that the Internet is a “Hadron Collider” for social science? Or is society fundamentally too chaotic and unpredictable?

David: As web scientists like to highlight, the Web (not the Internet, which is the technical infrastructure connecting computers) is the largest socio-technical artefact ever produced by humanity. Rather than as a Hadron Collider, which is a tool to make experiments, I would say that the Web can be the Hubble telescope of social science: it lets us observe human behaviour at an amazing scale and resolution, not only capturing big data but also fast, long, deep, mixed, and weird data that we never imagined before.

While I doubt that we will be able to predict society in some sort of “psychohistory” manner, I think that the Web can help us to understand much more about ourselves, including our incentives, our feelings, and our health. That can be useful knowledge to make decisions in the future and to build a better world without the need to predict everything.

Read the full article: Garcia, D., Mavrodiev, P., Casati, D., and Schweitzer, F. (2017) Understanding Popularity, Reputation, and Social Influence in the Twitter Society. Policy & Internet 9 (3) doi:10.1002/poi3.151

David Garcia was talking to blog editor David Sutcliffe.
