A graph proximity feature augmentation approach for identifying accounts of terrorists on Twitter

Abstract: With the popularity of social networks, terrorist groups such as ISIS have encouraged others to follow their activities, share their ideas, recruit fans, radicalize communities, and raise funds to support future attacks. This has led to the emergence of radicalized online accounts that belong to terrorists or their fans. Existing counter-terrorism techniques for suspending such accounts rely on user reports or on syntactic sentiment analysis, which is inaccurate on the short texts that terrorists share, such as tweets. This work proposes a feature augmentation approach that enriches the content of tweets before they are examined for radicalized content. The augmented tweets are then used to classify accounts into Pro-ISIS or Anti-ISIS categories. We used topic modeling as a baseline method for feature augmentation. We studied how using tweets from different time intervals affects the quality of the models that classify tweets and their corresponding accounts. We then introduced a novel feature augmentation approach based on Neighborhood Overlap, a graph proximity technique, to discover terms that have a strong relationship with the Pro-ISIS category. Terms extracted from tweets are represented as nodes in a graph, which is then partitioned into clusters containing different terms. Terms in the strongly connected parts of each cluster are added to the original term vectors of the tweets, based on the similarity between those terms and each tweet. We compared our approach with baseline augmentation techniques such as Term-to-Term correlation and Topic Modeling, as well as other existing techniques. Experimental results on a dataset containing Pro- and Anti-ISIS tweets show that our approach is promising for automating the identification of terrorist content online.
The results show that using a graph proximity measure such as Neighborhood Overlap for term augmentation achieves higher Precision, Recall, and F-measure than the typical approaches. In addition, we found that applying time-based analysis with term augmentation to identify radicalized accounts improved the Precision of the investigation process.
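
To make the graph proximity idea concrete, the following is a minimal sketch of Neighborhood Overlap on a term co-occurrence graph. It uses the common textbook definition: the number of neighbors two nodes share, divided by the number of nodes adjacent to at least one of them, not counting the two nodes themselves. The abstract does not specify how the graph is built or how tweets are tokenized, so linking terms that co-occur in the same tweet, splitting on whitespace, and the function names `build_term_graph` and `neighborhood_overlap` are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_term_graph(tweets):
    """Map each term to the set of terms it co-occurs with in some tweet.

    Assumption: two terms are linked if they appear in the same tweet,
    and tokenization is plain lowercase whitespace splitting.
    """
    neighbors = defaultdict(set)
    for tweet in tweets:
        terms = set(tweet.lower().split())
        for a, b in combinations(sorted(terms), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def neighborhood_overlap(neighbors, a, b):
    """Shared neighbors of a and b over all neighbors of either node.

    The two endpoints themselves are excluded from the denominator.
    Returns 0.0 when neither node has any other neighbor.
    """
    n_a = neighbors.get(a, set())
    n_b = neighbors.get(b, set())
    shared = n_a & n_b
    either = (n_a | n_b) - {a, b}
    return len(shared) / len(either) if either else 0.0
```

Under these assumptions, a term pair with a score close to 1 sits in a densely connected part of the graph, which is the kind of strongly related term the abstract describes as a candidate for augmenting a tweet's term vector.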
