Thursday 23 June 2016

Identifying A Reliable Sample of Leave/Remain Tweets

UPDATE (13 December, 2016): Try the Brexit Analyzer 

We have now made parts of the Brexit Analyzer available as a web service. You can try the topic detection by putting an example tweet here (choose mentions of political topics):

https://cloud.gate.ac.uk/shopfront/sampleServices 

A more extensive test of the outputs (also including hashtags, voting intent, @mention, and URL detection) can be tried here:

https://cloud.gate.ac.uk/shopfront/displayItem/sobigdata-brexit 

This is a web service running on GATE Cloud, where you can find many other text analytics services, available to try for free or run on large batches of data. 

We also have now a tweet collection service, should you wish to start collecting and analysing your own Brexit (or any other) tweets:

https://cloud.gate.ac.uk/shopfront/displayItem/twitter-collector 



Overview

This post is the second in the series on the Brexit Tweet Analyser.

Having looked at tweet volumes and basic characteristics of the Twitter discourse around the EU referendum, we now turn to the method we chose for identify a reliable, even if incomplete, sample of leave and remain tweets.

There is currently no ground truth available, i.e. a well known sample of Leave/Remain Twitter users, therefore it is hard to establish the accuracy of these heuristics at present, but it is something we are working on actively. 

More importantly, we are not trying to predict if leave or remain are leading, but instead we are interested in identifying a reliable, if incomplete subset, so we can analyse topics discussed and active users within.



Are Hashtags A Reliable Predictor of Leave/Remain Support?

As discussed in our earlier post, over 56% of all tweets on the referendum contain at least one hashtag. Some of these are actually indicative of support for the leave/remain campaigns, e.g. #votetoleave, #voteout, #saferin, #strongertogether. Then there are also hashtags which try to address undecided voters, e.g. #InOrOut, #undecided, while promoting either a remain or leave vote but not through explicit hashtags.

A recent study of EU referendum tweets by Ontotext, carried out over tweets in May 2016,  classified tweets as leave or remain on the basis of approximately 30 hashtags. Some of those were associated with leave, the rest -- with remain, and each  tweet was classified as leave or remain based on whether it contains predominantly leave or predominantly remain hashtags. 

Based on analysing manually a sample of random tweets with those hashtags, we found that this strategy does not always deliver a reliable assessment, since in many cases leave hashtags are used as a reference to the leave campaign, while the tweet itself is supportive of remain or neutral. The converse is also true, i.e. remain hashtags are used to refer to the remain stance/campaign. We have included some examples below. 

A more reliable, even if somewhat more restrictive, approach is to consider the last hashtag in the tweet as the most indicative of its intended stance (pro-leave or pro-remain). This results in a higher precision sample of remain/leave tweets, which we can then analyse in more depth in terms of topics discussed and opinions expressed. 

Using this approach, amongst the 1.9 million tweets between June 13th and 19th, 5.5% (106 thousand) were identified as supporting the Leave campaign, while 4% (80 thousand) - as supporting the Remain campaign. Taken together, this constitutes just under a 10% sample, which we consider sufficient for the purposes of our analysis. 

These results, albeit drawn from a smaller, high-precision sample, seem to indicate that the Leave campaign is receiving more coverage and support on Twitter, when compared to Remain. This is consistent also with the findings of the Ontotext study .

In subsequent posts we will look into the most frequently mentioned hashtags, the most active Twitter users, and the topics discussed in the Remain and Leave samples separately. 


What about #Brexit in particular?   

The recent Ontotext study on May 2016 data used #Brexit as one of the key hashtags indicative of leave. Others have also used #Brexit in the same fashion.


In our more recent 6.5 million tweets (dated between 1 June and 19 June 2016), just under 1.7 million contain the #Brexit hashtag (26%). However, having examined a random sample of those manually (see examples below), we established that while many tweets did use #Brexit to indicate support for leave, there were also many cases where #Brexit referred to the referendum, or the leave/remain question, or the Brexit campaign as a whole. We have provided some such examples at the end of this blog post. We also found a sufficient number of examples where #Brexit appears at the end of tweets while still not indicating support for voting leave. 

Therefore, we chose to distinguish the #Brexit hashtag from all other leave hashtags and tagged tweets with a final #Brexit tag separately. This enables us, in subsequent analyses, to compare findings with and without considering #Brexit.  



Example Remain/Leave Hashtag Use













Example Brexit Tweets








Thanks to:

Dominic Rout, Ian Roberts, Mark Greenwood, Diana Maynard, and the rest of the GATE Team 

Any mistakes are my own.

No comments: