Spam review detection using spiral cuckoo search clustering method

Online reviews now play an important role in customers' decisions. From buying a shirt on an e-commerce site to dining in a restaurant, online reviews have become a basis for selection. Because people are busy and lack the time to examine the intrinsic details of products and services, their dependence on online reviews has grown. Exploiting this reliance, some people and organizations deliberately generate spam reviews to promote or demote the reputation of a person, product, or organization. It is practically impossible to tell with the naked eye whether a review is spam or ham, and it is impractical to classify all reviews manually. Therefore, a spiral cuckoo search based clustering method is introduced to discover spam reviews. The proposed method combines the strength of cuckoo search with the Fermat spiral to resolve the convergence issue of the cuckoo search method. Its efficiency has been tested on four spam datasets and one Twitter spammer dataset. To validate its efficacy, the proposed clustering method is compared with six existing clustering methods: particle swarm optimization, differential evolution, genetic algorithm, cuckoo search, K-means, and improved cuckoo search. The experimental results and statistical analysis indicate that the proposed method outperforms the existing methods.
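
The general idea of clustering with a cuckoo-style search and a Fermat spiral step can be illustrated with a minimal sketch. The abstract does not state how the spiral enters the update rule, so everything below is an assumption made for illustration and is not the authors' algorithm: the function names, the 2-D feature space, the golden-angle spacing, the shrinking spiral scale, and the parameters `n_nests` and `pa` are all hypothetical. The only fixed ingredients are the Fermat spiral, r = a·sqrt(θ), and the usual clustering objective, the sum of squared distances from each point to its nearest centroid.

```python
import math
import random


def sse(centroids, points):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def fermat_spiral_offset(k, scale=0.1):
    """2-D offset of the k-th point on a Fermat spiral, r = scale * sqrt(theta)."""
    theta = k * math.pi * (3 - math.sqrt(5))  # golden-angle spacing (assumption)
    r = scale * math.sqrt(theta)
    return r * math.cos(theta), r * math.sin(theta)


def spiral_cuckoo_cluster(points, k, n_nests=15, iters=100, pa=0.25, seed=0):
    """Hypothetical spiral-guided cuckoo-style search over k 2-D centroids."""
    rng = random.Random(seed)

    def random_nest():
        # A nest is one candidate solution: k centroids drawn from the data.
        return [list(rng.choice(points)) for _ in range(k)]

    nests = [random_nest() for _ in range(n_nests)]
    fitness = [sse(n, points) for n in nests]
    history = [min(fitness)]

    for t in range(1, iters + 1):
        best = nests[fitness.index(min(fitness))]
        for i in range(n_nests):
            # Spiral offset whose scale shrinks over iterations (assumption).
            dx, dy = fermat_spiral_offset(rng.randint(1, 50), scale=0.5 / math.sqrt(t))
            candidate = [
                [c[0] + rng.random() * (b[0] - c[0]) + dx,
                 c[1] + rng.random() * (b[1] - c[1]) + dy]
                for c, b in zip(nests[i], best)
            ]
            f = sse(candidate, points)
            if f < fitness[i]:  # greedy replacement: keep only improvements
                nests[i], fitness[i] = candidate, f

        # Abandon the worst fraction pa of nests and rebuild them at random.
        order = sorted(range(n_nests), key=fitness.__getitem__)
        for i in order[n_nests - int(pa * n_nests):]:
            nests[i] = random_nest()
            fitness[i] = sse(nests[i], points)
        history.append(min(fitness))

    best_i = fitness.index(min(fitness))
    return nests[best_i], history
```

Called as `spiral_cuckoo_cluster(points, k=2)` on a list of 2-D points, the sketch returns the best centroid set found and the best objective value per iteration. Because improvements are accepted greedily and the best nest is never abandoned, that history never increases. Real review data would need a higher-dimensional feature representation, which this sketch does not cover.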
