Opinion spam detection framework using hybrid classification scheme

With the advent of social networking sites, opinion-mining applications have attracted the interest of online communities that rely on review sites to inform their purchase decisions. However, because of the growing trend of posting spam (fake) reviews to promote target products or defame competitors' brands, opinion spam detection and classification has emerged as an active issue in the opinion mining and sentiment analysis community. We investigate opinion spam detection using different combinations of entities, features, and their sentiment scores. We enrich the feature set of a baseline spam detection method with spam detection features (Opinion Spam, Opinion Spammer, Item Spam). Using a dataset of reviews from the Amazon site and sentences labeled for spam detection, we evaluate the role of spamicity-related features in detecting and classifying spam (fake) clues and distinguishing them from genuine reviews. For this purpose, we introduce a rule-based feature weighting scheme and propose a method for tagging review sentences as spam or non-spam. Experimental results show that spam-related features improve spam detection in review sentences posted on product review sites. Adding a revised feature weighting scheme increased accuracy from 93% to 96%. Furthermore, a hybrid set of features is shown to improve the performance of opinion spam detection in terms of precision, recall, and F-measure. This work shows that combining spam-related features with a rule-based weighting scheme can improve the performance of even a baseline spam detection method. This improvement can be of use to opinion spam detection systems, given the growing interest of individuals and companies in separating fake (spam) from genuine (non-spam) reviews about products.
The outcome of this work provides insight into spam-related features and feature weighting and will assist in developing more advanced applications for opinion spam detection. Previous state-of-the-art studies in opinion spam detection used fewer spamicity-related features and less efficient feature weighting schemes. In contrast, we provide a revised feature selection and a revised feature weighting scheme with a normalized spamicity score computation technique. Our contribution is therefore novel to the field, because it provides a significant improvement over the compared methods.
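
As a rough illustration of how a rule-based weighting scheme with a normalized spamicity score might tag a review sentence, the following minimal sketch can be considered. It is not the method of the paper: the weight values, the 0.5 threshold, and the function names are hypothetical, and each feature group (Opinion Spam, Opinion Spammer, Item Spam) is assumed to have already been reduced to a value in [0, 1].

```python
from typing import Dict

# Hypothetical weights per feature group; the paper's actual rules and
# values are not given in the abstract.
FEATURE_WEIGHTS: Dict[str, float] = {
    "opinion_spam": 0.5,
    "opinion_spammer": 0.3,
    "item_spam": 0.2,
}


def spamicity_score(features: Dict[str, float],
                    weights: Dict[str, float] = FEATURE_WEIGHTS) -> float:
    """Return a weighted feature sum, normalized to the range [0, 1].

    Missing features count as 0.0, and feature values are clamped
    to [0, 1] before weighting.
    """
    total_weight = sum(weights.values())
    raw = sum(
        weight * min(max(features.get(name, 0.0), 0.0), 1.0)
        for name, weight in weights.items()
    )
    return raw / total_weight


def tag_sentence(features: Dict[str, float], threshold: float = 0.5) -> str:
    """Tag a sentence as 'spam' when its score reaches the assumed threshold."""
    return "spam" if spamicity_score(features) >= threshold else "non-spam"
```

Dividing by the total weight keeps the score comparable if feature groups are added or reweighted, which is one plausible reading of "normalized" here; the threshold would need to be tuned on labeled data.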
