Short Messages Spam Filtering Combining Personality Recognition and Sentiment Analysis

Short communication channels are growing rapidly owing to the sharp increase in the number of smartphone and online social network users. This growth attracts malicious campaigns, such as spam campaigns, that directly threaten the security and privacy of users. While most research focuses on automatic text classification, in this work we demonstrate that current short message spam detection systems can be improved using a novel method. We combine personality recognition and sentiment analysis techniques to analyze Short Message Service (SMS) texts. We enrich a publicly available dataset by adding these features to each message, first separately and then in combination, creating new datasets. We apply several combinations of the best SMS spam classifiers and filters to each dataset in order to compare their results. Taking into account the experimental results, we analyze the real influence of each feature and of the combination of both. At the end, the best ...
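
The enrichment step described in the abstract (adding a derived feature to each message before classification) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the toy lexicon `TOY_LEXICON`, the helpers `polarity` and `enrich`, and the feature name `__polarity__` are hypothetical and do not reproduce the sentiment or personality tools used in the work. Only the sentiment side is sketched; a personality feature could be appended in the same way.

```python
from collections import Counter

# Hypothetical toy lexicon (assumption): a real system would rely on a
# full sentiment resource rather than this handful of words.
TOY_LEXICON = {"free": 1, "win": 1, "great": 1, "sorry": -1, "late": -1, "bad": -1}


def polarity(text):
    """Label a message 'positive', 'negative', or 'neutral' by summing toy-lexicon scores."""
    score = sum(TOY_LEXICON.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def enrich(text):
    """Return bag-of-words counts plus one extra sentiment feature for the message."""
    features = Counter(text.lower().split())
    # The added feature sits alongside the word counts, so any classifier
    # trained on the dictionary sees both the text and its polarity.
    features["__polarity__=" + polarity(text)] = 1
    return dict(features)


messages = ["Win a free prize now", "Sorry I am late"]
enriched = [enrich(message) for message in messages]
```

Training the same classifier once on the plain word counts and once on the enriched dictionaries would give the kind of side-by-side comparison the abstract describes.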
