We recently announced the start of an exciting new research project that will involve the use of topic modelling in understanding the patterns in submitted stories to the Everyday Sexism website. Here, we briefly explain our text analysis approach, “topic modelling”. At its very core, topic modelling is a technique that seeks to automatically discover the topics contained within a group of documents. ‘Documents’ in this context could refer to text items as lengthy as individual books, or as short as sentences within a paragraph. Let’s take the idea of sentences-as-documents as an example: Document 1: I like to eat kippers for breakfast. Document 2: I love all animals, but kittens are the cutest. Document 3: My kitten eats kippers too. Assuming that each sentence contains a mixture of different topics (and that a ‘topic’ can be understood as a collection of words (of any part of speech) that have different probabilities of appearance in passages discussing the topic), how does the topic modelling algorithm ‘discover’ the topics within these sentences? The algorithm is initiated by setting the number of topics that it needs to extract. Of course, it is hard to guess this number without having an insight on the topics, but one can think of this as a resolution tuning parameter. The smaller the number of topics is set, the more general the bag of words in each topic would be, and the looser the connections between them. The algorithm loops through all of the words in each document, assigning every word to one of our topics in a temporary and semi-random manner. This initial assignment is arbitrary and it is easy to show that different initialisations lead to the same results in long run. Once each word has been assigned a temporary topic, the algorithm then re-iterates through each word in each document to update the topic assignment using two criteria: 1) How prevalent is the word in question across topics? And 2) How prevalent are the…
How does the topic modelling algorithm ‘discover’ the topics within the context of everyday sexism?