Arabic Twitter User Profiling: Application to Cyber-security

In recent years, social networking and micro-blogging sites such as Twitter have grown rapidly. On these sites, users share a variety of data, including personal details, interests, and opinions. However, the shared data is not always truthful: users often hide behind fake profiles, which they may use to spread rumors or threaten others. To address this problem, various methods and techniques have been proposed for user profiling. In this article, we use machine learning to predict the age and gender behind a user's profile from the user's tweets and features, and to assess whether the profile is dangerous. Our approach uses several types of stylistic features: character-based, word-based, and syntax-based. The user's topics of interest are also included in the profiling task. We obtained the best accuracy with SVM: 73.49% for age, 83.7% for gender, and 88.7% for dangerous profile detection.
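As an illustration of what character-based and word-based stylistic features might look like, here is a minimal Python sketch. The function name `stylistic_features` and the specific features chosen (ratios, counts, hashtags, mentions) are assumptions made for this example and are not taken from the article; syntax-based features, which would need a part-of-speech tagger, are left out.

```python
import unicodedata


def stylistic_features(tweet: str) -> dict:
    """Compute a few illustrative character- and word-based features.

    Hypothetical helper: the feature set is an assumption for this sketch,
    not the feature set used in the article.
    """
    words = tweet.split()  # whitespace tokenization, also works for Arabic script
    n_chars = len(tweet)

    # Unicode categories starting with "P" cover punctuation in any script,
    # including Arabic marks such as the Arabic question mark.
    punct = sum(1 for ch in tweet if unicodedata.category(ch).startswith("P"))
    digits = sum(1 for ch in tweet if ch.isdigit())

    return {
        # Character-based features
        "char_count": n_chars,
        "punct_ratio": punct / n_chars if n_chars else 0.0,
        "digit_ratio": digits / n_chars if n_chars else 0.0,
        # Word-based features
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "hashtag_count": sum(1 for w in words if w.startswith("#")),
        "mention_count": sum(1 for w in words if w.startswith("@")),
    }


if __name__ == "__main__":
    print(stylistic_features("@user hello world! #nlp 2024"))
```

The resulting dictionaries could be turned into numeric vectors and passed to a classifier such as an SVM, with one model per target (age, gender, dangerous profile).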
