Hate speech, machine classification and statistical modelling of information flows on Twitter: interpretation and communication for policy decision making

In 2013, the murder of Drummer Lee Rigby in Woolwich, UK, led to an extensive public social media reaction. Given the extreme terrorist motive and the public nature of the actions, it was feasible that the public response could include written expressions of hateful and antagonistic sentiment towards a particular race, ethnicity or religion, which can be interpreted as ‘hate speech’. This provided motivation to study the spread of hate speech on Twitter following such a widespread and emotive event. In this paper we present a supervised machine learning text classifier, trained and tested to distinguish between hateful and/or antagonistic responses with a focus on race, ethnicity or religion, and more general responses. We used human-annotated data collected from Twitter in the immediate aftermath of Lee Rigby’s murder to train and test the classifier. As “Big Data” is a growing topic of study, and its use in policy and decision making is constantly being debated at present, we discuss the use of supervised machine learning tools to classify a sample of “Big Data”, and how the results can be interpreted for use in policy and decision making. The results of the classifier are optimal using a combination of probabilistic, rule-based and spatial-based classifiers with a voted ensemble meta-classifier. We achieve an overall F-measure of 0.95 using features derived from the content of each tweet, including syntactic dependencies between terms to recognise “othering” terms, incitement to respond with antagonistic action, and claims of well-founded or justified discrimination against social groups. We then demonstrate how the results of the classifier can be robustly utilized in a statistical model used to forecast the likely spread of hate speech in a sample of Twitter data.
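The two quantitative ideas in the abstract — a voted ensemble meta-classifier and the F-measure used to evaluate it — can be sketched compactly. The snippet below is a minimal illustration, not the authors' implementation: the three base classifiers' outputs, the label names, and the example tweets' ground truth are all hypothetical placeholders.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one list by simple majority vote
    (a basic form of the 'voted ensemble meta-classifier' the abstract describes)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def f_measure(y_true, y_pred, positive="hateful"):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Hypothetical per-tweet outputs of three base classifiers
probabilistic = ["hateful", "benign", "hateful", "benign"]
rule_based    = ["hateful", "hateful", "benign", "benign"]
spatial       = ["hateful", "benign", "hateful", "benign"]

ensemble = majority_vote([probabilistic, rule_based, spatial])
truth    = ["hateful", "benign", "hateful", "benign"]
print(ensemble)                               # ['hateful', 'benign', 'hateful', 'benign']
print(round(f_measure(truth, ensemble), 2))   # 1.0
```

A majority vote over heterogeneous base classifiers tends to outperform any single one when their errors are only weakly correlated, which is consistent with the abstract's finding that the combined classifier is optimal.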
