Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection

Advanced neural language models (NLMs) are widely used in sequence generation tasks because they are able to produce fluent and meaningful sentences. They can also be used to generate fake reviews, which can then be used to attack online review systems and influence the buying decisions of online shoppers. To perform such attacks, it is necessary for experts to train a tailored LM for a specific topic. In this work, we show that a low-skilled threat model can be built just by combining publicly available LMs and show that the produced fake reviews can fool both humans and machines. In particular, we use the GPT-2 NLM to generate a large number of high-quality reviews based on a review with the desired sentiment and then using a BERT based text classifier (with accuracy of 96%) to filter out reviews with undesired sentiments. Because none of the words in the review are modified, fluent samples like the training data can be generated from the learned distribution. A subjective evaluation with 80 participants demonstrated that this simple method can produce reviews that are as fluent as those written by people. It also showed that the participants tended to distinguish fake reviews randomly. Three countermeasures, Grover, GLTR, and OpenAI GPT-2 detector, were found to be difficult to accurately detect fake review.

[1]  Alec Radford,et al.  Release Strategies and the Social Impacts of Language Models , 2019, ArXiv.

[2]  Omer Levy,et al.  RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.

[3]  Hermann Ney,et al.  LSTM Neural Networks for Language Modeling , 2012, INTERSPEECH.

[4]  Kyumin Lee,et al.  The Dark Side of Micro-Task Marketplaces: Characterizing Fiverr and Automatically Detecting Crowdturfing , 2014, ICWSM.

[5]  Ilya Sutskever,et al.  Learning to Generate Reviews and Discovering Sentiment , 2017, ArXiv.

[6]  Alexander M. Rush,et al.  GLTR: Statistical Detection and Visualization of Generated Text , 2019, ACL.

[7]  Xuanjing Huang,et al.  Toward Diverse Text Generation with Inverse Reinforcement Learning , 2018, IJCAI.

[8]  Xuanjing Huang,et al.  Towards Diverse Text Generation with Inverse Reinforcement Learning , 2018, ArXiv.

[9]  Steve Renals,et al.  Multiplicative LSTM for sequence modelling , 2016, ICLR.

[10]  Dietrich Klakow,et al.  Estimation of Gap Between Current Language Models and Human Performance , 2017, INTERSPEECH.

[11]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.

[12]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[13]  Dejing Dou,et al.  HotFlip: White-Box Adversarial Examples for Text Classification , 2017, ACL.

[14]  Adam Coates,et al.  Cold Fusion: Training Seq2Seq Models Together with Language Models , 2017, INTERSPEECH.

[15]  Joelle Pineau,et al.  Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models , 2015, AAAI.

[16]  Kyumin Lee,et al.  Characterizing and automatically detecting crowdturfing in Fiverr and Twitter , 2015, Social Network Analysis and Mining.

[17]  Lyan Verwimp,et al.  Character-Word LSTM Language Models , 2017, EACL.

[18]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[19]  Ali Farhadi,et al.  Defending Against Neural Fake News , 2019, NeurIPS.

[20]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[21]  Stefan Savage,et al.  Dirty Jobs: The Role of Freelance Labor in Web Service Abuse , 2011, USENIX Security Symposium.

[22]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[24]  Veselin Stoyanov,et al.  Simple Fusion: Return of the Language Model , 2018, WMT.

[25]  Christopher D. Manning,et al.  Get To The Point: Summarization with Pointer-Generator Networks , 2017, ACL.

[26]  Michael Luca,et al.  Fake It Till You Make It: Reputation, Competition, and Yelp Review Fraud , 2015 .

[27]  Alec Radford,et al.  Improving Language Understanding by Generative Pre-Training , 2018 .

[28]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[29]  Bryan Catanzaro,et al.  Large Scale Language Modeling: Converging on 40GB of Text in Four Hours , 2018, 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).

[30]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[31]  Ben Y. Zhao,et al.  Automated Crowdturfing Attacks and Defenses in Online Review Systems , 2017, CCS.

[32]  Julian J. McAuley,et al.  Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering , 2016, WWW.

[33]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[34]  Xirong Li,et al.  Deep Text Classification Can be Fooled , 2017, IJCAI.

[35]  Lukás Burget,et al.  Extensions of recurrent neural network language model , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[36]  Yong Yu,et al.  Neural Text Generation: Past, Present and Beyond , 2018, 1803.07133.

[37]  Bo Sun,et al.  Stay On-Topic: Generating Context-specific Fake Restaurant Reviews , 2018, ESORICS.

[38]  Alexander M. Rush,et al.  Character-Aware Neural Language Models , 2015, AAAI.