GANs for Semi-Supervised Opinion Spam Detection

Online reviews have become a vital source of information when purchasing a product or service. Opinion spammers manipulate reviews, distorting the overall perception of the product or service. A key challenge in detecting opinion spam is obtaining ground truth: although a large number of reviews exist online, only a few have been labeled as spam or non-spam. In this paper, we propose spamGAN, a generative adversarial network that relies on a limited set of labeled data as well as unlabeled data for opinion spam detection. spamGAN improves on state-of-the-art GAN-based techniques for text classification. Experiments on the TripAdvisor dataset show that spamGAN outperforms existing spam detection techniques when limited labeled data is used. Apart from detecting spam reviews, spamGAN can also generate reviews with reasonable perplexity.
