Beating the bookmakers: leveraging statistics and Twitter microposts for predicting soccer results

In this paper, we investigate the feasibility of using collective knowledge to predict the winner of a soccer game. Specifically, we developed methods that extract and aggregate the information contained in over 50 million Twitter microposts to predict the outcome of soccer games. These methods draw on Twitter volume, the sentiment expressed towards teams, and the score predictions made by Twitter users. Alongside these collective knowledge-based methods, we also implemented traditional statistical methods. Our results show that combining different types of methods, using both statistical knowledge and large sources of collective knowledge, can beat both expert and bookmaker predictions. For instance, we realized a monetary profit of almost 30% when betting on soccer games in the second half of the 2013-2014 English Premier League season.
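
To make the notion of monetary profit concrete, the following minimal sketch computes profit as net return divided by total amount staked. The abstract does not state the staking scheme or odds format, so the flat unit stake, the decimal odds, the function name `betting_profit`, and the example bets are all assumptions made for illustration, not the authors' procedure or data.

```python
from typing import Iterable, Tuple


def betting_profit(bets: Iterable[Tuple[float, bool]], stake: float = 1.0) -> float:
    """Return net profit as a fraction of the total amount staked.

    Each bet is a pair (decimal_odds, won). Assumption: a flat stake is
    placed on every bet; a winning bet pays stake * decimal_odds and a
    losing bet pays nothing.
    """
    total_staked = 0.0
    total_returned = 0.0
    for odds, won in bets:
        total_staked += stake
        if won:
            total_returned += stake * odds
    if total_staked == 0:
        return 0.0  # no bets placed, so no profit or loss
    return (total_returned - total_staked) / total_staked


# Hypothetical bets (decimal odds, whether the bet won); not data from the paper.
example_bets = [(2.0, True), (3.0, False), (1.5, True), (2.5, True)]
roi = betting_profit(example_bets)
print(f"Profit relative to total stake: {roi:.1%}")
```

Under this definition, a profit of almost 30% would mean that the total payout was almost 1.3 times the total amount staked.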