A comparison of language-dependent and language-independent models for violence prediction

The nature of violence changes with developments in politics, religion, and technology. This poses challenges for governmental and non-governmental organizations responsible for developing timely strategies to engage with groups that advocate violence. While some groups are well known for their violence, the characteristics of other groups vary over time and across regions. This variation hampers traditional decision-making processes, which demand time and resources that organizations do not always have. A scalable and effective method for identifying violent groups is therefore imperative. This paper applies text analysis techniques to differentiate violent from non-violent groups using English text from various value-based groups. The models presented in this paper achieved accuracies of at least 71% and as high as 83%. The results demonstrate that text analysis can serve as a powerful predictive tool for separating violent groups from non-violent ones. In addition, by incorporating natural language processing tools, language-dependent models show a slight improvement of 2% in accuracy over language-independent models. The broadly similar performance of the two model types suggests that they are comparable alternatives.
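To make the distinction between the two model families concrete, the following is a minimal sketch of how their input features might differ. It is an illustration under stated assumptions, not the paper's pipeline: the abstract does not specify the features used, so character n-grams stand in for a language-independent representation, and stop-word removal plus a crude suffix stripper stand in for language-dependent NLP preprocessing. The names `STOP_WORDS`, `crude_stem`, `language_independent_features`, and `language_dependent_features` are hypothetical.

```python
from collections import Counter

# Tiny illustrative English stop-word list (an assumption, not the paper's).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "we"}


def language_independent_features(text, n=3):
    """Count character n-grams; requires no knowledge of the language."""
    s = " ".join(text.lower().split())  # lowercase and collapse whitespace
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def crude_stem(word):
    """Strip a few English suffixes; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def language_dependent_features(text):
    """Count stemmed word tokens after removing English stop words."""
    tokens = [w.strip(".,;:!?\"'()") for w in text.lower().split()]
    return Counter(crude_stem(t) for t in tokens if t and t not in STOP_WORDS)


if __name__ == "__main__":
    sample = "The groups are attacking the villages"
    print(language_dependent_features(sample))
    print(language_independent_features(sample).most_common(5))
```

Either feature dictionary could then be passed to any standard classifier. The second variant depends on English-specific resources (a stop-word list and stemming rules), while the first would apply unchanged to text in another language.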
