Bayesian Reasoning Based Malicious Data Discovery on Gulf-Dialectical Arabic Tweets

One of the largest domains for written communication is the online domain. Today, social media is widely used by people of different ages, groups, and nationalities. In the Gulf region, Twitter is one of the most popular social networking sites. Tweets contain not only opinions, news, and conversations, but also malicious content such as false information, malicious links, and other types of cyber threats. Therefore, tweets need to be analyzed to determine whether or not they are malicious. Tweets from the Gulf region are not written in Modern Standard Arabic (MSA), which most translation systems use as their Arabic source language. In this paper, we first present a Gulf Dialectical Arabic (Gulf DA) to English dataset, which we use to build a Gulf Knowledge Base (GulfKB). We then apply model-based reasoning over the GulfKB, using Bayesian inference, to uncover malicious content and suspicious users. We evaluate the proposed approach numerically: it achieves an accuracy of 91% and outperforms existing approaches in the state-of-the-art literature.
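The abstract does not specify the structure of the Bayesian model, so the following is only a hedged illustration of the general idea of Bayesian inference for flagging a tweet. It computes the posterior probability that a tweet is malicious from a few binary evidence features using Bayes' rule, under a naive (conditional-independence) assumption. The feature names (`has_url`, `flagged_term`, `new_account`), the prior, and all likelihood values are hypothetical placeholders, not values from the paper, and this sketch is not the authors' GulfKB-based model.

```python
from math import prod

# Hypothetical prior P(malicious); an illustrative value, not from the paper.
PRIOR_MALICIOUS = 0.1

# Hypothetical likelihoods P(feature present | class); illustrative only.
# "flagged_term" stands in for a term a knowledge base (e.g. a GulfKB-like
# lookup after dialect-to-English translation) might mark as suspicious.
LIKELIHOODS = {
    "has_url": {"malicious": 0.8, "benign": 0.3},
    "flagged_term": {"malicious": 0.6, "benign": 0.05},
    "new_account": {"malicious": 0.5, "benign": 0.2},
}


def posterior_malicious(observed: dict[str, bool], prior: float = PRIOR_MALICIOUS) -> float:
    """Return P(malicious | observed features) via Bayes' rule, assuming the
    features are conditionally independent given the class (naive Bayes)."""

    def likelihood(label: str) -> float:
        # P(evidence | label): use p when a feature is present, 1 - p when absent.
        return prod(
            LIKELIHOODS[f][label] if present else 1.0 - LIKELIHOODS[f][label]
            for f, present in observed.items()
        )

    joint_mal = prior * likelihood("malicious")
    joint_ben = (1.0 - prior) * likelihood("benign")
    # Normalize by the total evidence probability P(evidence).
    return joint_mal / (joint_mal + joint_ben)


tweet = {"has_url": True, "flagged_term": True, "new_account": False}
p = posterior_malicious(tweet)
print(f"P(malicious | evidence) = {p:.3f}")  # prints P(malicious | evidence) = 0.690
```

With these placeholder numbers, a link plus a flagged term raises the probability of maliciousness from the 0.1 prior to about 0.69; with no observed evidence the function simply returns the prior. A real system would estimate the prior and likelihoods from labeled data and would likely model dependencies between features and users rather than assume independence.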
