Short Messages Spam Filtering Using Sentiment Analysis

As short instant messages become more widely used, spam and other illegitimate campaigns over this type of communication system are growing. These campaigns, besides being an illegal online activity, are a direct threat to users' privacy. Previous short-message spam filtering techniques focus on automatic text classification and do not take message polarity into account. Focusing on phone SMS messages, this work demonstrates that spam filtering in short message services can be improved using sentiment analysis techniques. Using a publicly available labelled (spam/legitimate) SMS dataset, we calculate the polarity of each message and add the polarity score to the original dataset, creating new datasets. We compare the results of the best classifiers and filters on the different datasets (with and without polarity) to demonstrate the influence of polarity. Experiments show that the polarity score improves SMS spam classification, reaching, on the one hand, 98.91% accuracy and, on the other hand, 0 false positives with 98.67% accuracy.
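The dataset-augmentation step described above (scoring each message's polarity and adding the score as an extra attribute) can be illustrated with a minimal sketch. The lexicon `TOY_LEXICON`, its values, the averaging rule, and the function names `polarity` and `add_polarity` are all hypothetical and chosen only for illustration; the abstract does not specify which sentiment resource or scoring formula was actually used.

```python
import re

# Hypothetical toy lexicon (word -> polarity in [-1, 1]); illustration only,
# not the sentiment resource used in the work.
TOY_LEXICON = {
    "free": 0.5, "win": 0.75, "prize": 0.5,
    "sorry": -0.5, "late": -0.25, "bad": -0.75,
}

def polarity(message, lexicon=TOY_LEXICON):
    """Average the polarity of the lexicon words found in the message.

    Returns 0.0 (neutral) when no word of the message is in the lexicon.
    """
    tokens = re.findall(r"[a-z']+", message.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

def add_polarity(dataset, lexicon=TOY_LEXICON):
    """Return a new dataset whose rows carry an extra polarity attribute."""
    return [(label, text, polarity(text, lexicon)) for label, text in dataset]

# Tiny made-up sample in (label, text) form.
sms = [
    ("spam", "WIN a FREE prize now"),
    ("ham", "Sorry, running late"),
    ("ham", "See you at noon"),
]
augmented = add_polarity(sms)
for row in augmented:
    print(row)
```

A classifier can then be trained once on the original rows and once on the augmented rows, so that any difference in accuracy or false positives can be attributed to the polarity attribute.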
