Review spam detection using active learning

As access to the Internet has become much easier over the last decade or so, people are using online applications more than ever. Online marketing, and e-commerce as a whole, is growing rapidly. Online reviews play a very important role in this field and have proven valuable for customers' decision making. Although reviews carry sensitive and significant information, efforts to ensure the authenticity of user-generated content (reviews, forums, blogs, discussion groups, etc.) are sporadic at best. As a result, spam, fake reviews, and fabricated opinions are on the rise, and writing them has become a profitable business that distorts the truth. Several techniques have been proposed for this problem, but most depend on empirical conditions, rating consistency, obvious content features, helpfulness voting, and similar signals, which limits their effectiveness. Most existing research relies on supervised models, yet good-quality large-scale datasets are still very scarce, and most models are trained on pseudo fake reviews instead of real fake reviews. In this research, we introduce an active learning approach that detects review spam using TF-IDF features of the review content. Working on almost 3600 reviews from different domains, our model shows substantial improvements in performance measures: in the best case it achieves up to 88% accuracy, and precision, recall, and F-scores are above 85% in most cases. Additionally, about 2000 reviews were manually labeled during the process. These results indicate that active learning is a promising methodology for detecting review spam.
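The approach described above can be sketched as pool-based active learning with uncertainty sampling over TF-IDF features. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation: the abstract does not name the classifier, query strategy, tokenizer, or batch size, so the logistic-regression model trained by stochastic gradient descent, the "closest to 0.5" query rule, the whitespace tokenizer, the smoothed IDF formula, and every function name here (`tfidf_vectors`, `train_logreg`, `active_learning`, `precision_recall_f1`) are hypothetical choices for illustration. The `oracle` callback stands in for the human annotator, and the eight toy reviews and their labels are made up.

```python
import math
from collections import Counter

# Hypothetical sketch: classifier, query rule, tokenizer and names are illustrative assumptions.

def tokenize(text):
    return text.lower().split()  # assumed whitespace tokenizer

def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF dict per document (smoothed IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        v = {t: tf[t] * idf[t] for t in tf}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vecs.append({t: x / norm for t, x in v.items()})
    return vecs

def predict_proba(w, b, x):
    """P(spam | x) under a logistic-regression model with sparse weights w."""
    z = b + sum(w.get(t, 0.0) * val for t, val in x.items())
    z = max(-30.0, min(30.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(xs, ys, epochs=200, lr=0.5):
    """Fit logistic regression with plain per-example SGD on the log-loss."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            g = predict_proba(w, b, x) - y  # gradient of log-loss w.r.t. z
            for t, val in x.items():
                w[t] = w.get(t, 0.0) - lr * g * val
            b -= lr * g
    return w, b

def active_learning(docs, oracle, seed_idx, rounds, batch=1):
    """Pool-based loop: train, query the most uncertain reviews, ask the oracle, repeat."""
    X = tfidf_vectors(docs)
    labeled = {i: oracle(i) for i in seed_idx}
    for _ in range(rounds):
        w, b = train_logreg([X[i] for i in labeled], [labeled[i] for i in labeled])
        pool = [i for i in range(len(docs)) if i not in labeled]
        if not pool:
            break
        # Uncertainty sampling: reviews whose P(spam) is closest to 0.5 come first.
        pool.sort(key=lambda i: abs(predict_proba(w, b, X[i]) - 0.5))
        for i in pool[:batch]:
            labeled[i] = oracle(i)  # a human annotator in practice
    w, b = train_logreg([X[i] for i in labeled], [labeled[i] for i in labeled])
    return w, b, X, labeled

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (spam = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy usage with made-up reviews; 1 = spam, 0 = genuine.
docs = [
    "best product ever buy now amazing deal",
    "amazing amazing best price buy buy now",
    "buy now best deal ever five stars",
    "incredible best product buy it now",
    "battery lasted two days before the screen cracked",
    "decent sound but the strap broke after a month",
    "shipping was slow and the box was damaged",
    "works as described though setup took an hour",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
w, b, X, labeled = active_learning(docs, lambda i: labels[i], seed_idx=[0, 4], rounds=3)
preds = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
print("queried reviews:", sorted(labeled))
print("precision, recall, F1:", precision_recall_f1(labels, preds))
```

Querying the reviews whose predicted spam probability is closest to 0.5 spends the manual labeling budget on the cases the current model finds hardest, which is the usual motivation for uncertainty sampling. Other query strategies could be tried in the same loop by changing only the sort key.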