Recital of supervised learning on review spam detection: An empirical analysis

Online purchasing has become an integral part of our lives in this digital era, in which e-commerce websites allow people to buy products or services and to share their experiences of them in the form of reviews. Customers and companies alike use these reviews for decision making. This facility helps people reach buying decisions, but malicious users exploit it as a tool to intentionally promote or demote products or services. This phenomenon is called review spam. Review spam detection is the classification of reviews as malicious or benign. Our aim is therefore to evaluate the performance of supervised machine learning algorithms for review spam detection, based on different feature sets extracted from a real-life dataset rather than a dataset tailored by Amazon Mechanical Turk (AMT) workers. We study various measures, including recall, precision, and the Receiver Operating Characteristic (ROC), through experimentation. AdaBoost outperforms all other algorithms with a precision of 0.83; it correctly identifies all spam reviews while misclassifying only a small number of normal reviews.
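
To make the reported measures concrete, the following minimal sketch shows how precision and recall for the spam class follow from confusion-matrix counts. The counts are hypothetical, chosen only so that they reproduce a precision of 0.83 with no missed spam; they are not taken from the paper's experiments, and the helper name `precision_recall` is an illustrative assumption.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) for the spam class from confusion counts.

    tp: spam reviews flagged as spam
    fp: normal reviews wrongly flagged as spam
    fn: spam reviews missed by the classifier
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts (NOT from the paper): every spam review is caught,
# and a small number of normal reviews are misclassified as spam.
tp, fp, fn = 83, 17, 0
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With no false negatives the recall is 1.0, which corresponds to identifying all spam reviews; the precision falls below 1.0 only because of the normal reviews that are flagged by mistake.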
