BiSAL - A bilingual sentiment analysis lexicon to analyze Dark Web forums for cyber security

In this paper, we present the development of a Bilingual Sentiment Analysis Lexicon (BiSAL) for the cyber security domain. BiSAL consists of a Sentiment Lexicon for ENglish (SentiLEN) and a Sentiment Lexicon for ARabic (SentiLAR), which can be used to develop opinion mining and sentiment analysis systems for bilingual textual data from Dark Web forums. For SentiLEN, a list of 279 sentiment-bearing English words related to cyber threats, radicalism, and conflicts is identified, and a unifying process is devised to unify their sentiment scores obtained from four different sentiment data sets. For SentiLAR, by contrast, sentiment-bearing Arabic words are identified from a collection of 2000 message posts from the Alokab Web forum, which contains radical content. SentiLAR provides a list of 1019 sentiment-bearing Arabic words related to cyber threats, radicalism, and conflicts, along with their morphological variants and sentiment polarity. For polarity determination, a semi-automated analysis process is performed by three Arabic language experts, and their ratings are aggregated using aggregate functions. A Web interface has been developed to access both lexicons (SentiLEN and SentiLAR) of the BiSAL data set online, and a beta version is available at http://www.abulaish.com/bisal.
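The abstract does not specify how scores from the four data sets are unified. As a minimal sketch, assuming each source reports scores on a known linear scale, one could rescale every score onto a common [-1, 1] range and average over the sources that contain the word. The source names and ranges in `SOURCE_SCALES` are hypothetical placeholders, not the data sets actually used for SentiLEN.

```python
from statistics import mean

# Hypothetical source lexicons and their raw score ranges (min, max).
# These are illustrative assumptions, not the four data sets used for SentiLEN.
SOURCE_SCALES = {
    "lexicon_a": (-5.0, 5.0),  # e.g. an integer valence scale centred on 0
    "lexicon_b": (-1.0, 1.0),  # e.g. a positive-minus-negative score
    "lexicon_c": (-4.0, 4.0),
    "lexicon_d": (1.0, 9.0),   # e.g. a rating scale with 5 as neutral
}


def rescale(score, lo, hi):
    """Linearly map a score from [lo, hi] onto [-1, 1]."""
    return 2.0 * (score - lo) / (hi - lo) - 1.0


def unify(raw_scores):
    """Average the rescaled scores of the sources that list the word.

    raw_scores maps a source name in SOURCE_SCALES to that source's raw score.
    """
    rescaled = [rescale(s, *SOURCE_SCALES[src]) for src, s in raw_scores.items()]
    if not rescaled:
        raise ValueError("word not found in any source lexicon")
    return mean(rescaled)


if __name__ == "__main__":
    # A hypothetical word listed by three of the four sources.
    print(round(unify({"lexicon_a": -3, "lexicon_b": -0.5, "lexicon_d": 2.0}), 3))
```

Averaging only over the sources that actually list a word avoids treating a missing entry as neutral; a weighted mean would be a natural variant if some sources were judged more reliable than others.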

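The abstract names neither the rating scale nor the aggregate functions applied to the three experts' judgements. The sketch below assumes each expert assigns -1, 0, or +1 to a word and combines the ratings with a majority vote, falling back to the sign of the mean rating when no majority exists. The function names and placeholder words are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical rating scheme: each expert labels a word
# -1 (negative), 0 (neutral) or +1 (positive).
LABELS = {-1: "negative", 0: "neutral", 1: "positive"}


def majority_vote(ratings):
    """Rating chosen by more than half of the experts, or None if there is none."""
    value, count = Counter(ratings).most_common(1)[0]
    return value if count > len(ratings) / 2 else None


def mean_sign(ratings):
    """Sign of the mean rating: -1, 0 or +1."""
    m = mean(ratings)
    return (m > 0) - (m < 0)


def aggregate_polarity(ratings):
    """Majority vote first; fall back to the sign of the mean when experts disagree."""
    if not ratings:
        raise ValueError("no ratings given")
    vote = majority_vote(ratings)
    polarity = vote if vote is not None else mean_sign(ratings)
    return LABELS[polarity]


if __name__ == "__main__":
    # Placeholder words with three hypothetical expert ratings each.
    ratings = {"word_1": [-1, -1, 0], "word_2": [1, 0, -1], "word_3": [1, 1, 1]}
    for word, r in ratings.items():
        print(word, aggregate_polarity(r))
```

With three experts on a three-point scale, the only no-majority case is one rating of each value, whose mean is 0, so the fallback resolves to neutral; with more experts or a finer scale the mean would carry more information.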