Automatic Detection of Machine Generated Text: A Critical Survey

Text generative models (TGMs) excel at producing text that matches the style of human language reasonably well. Such TGMs can be misused by adversaries, e.g., to automatically generate fake news and fake product reviews that look authentic and fool humans. Detectors that can distinguish text generated by TGMs from human-written text play a vital role in mitigating such misuse. Recently, there has been a flurry of work from both the natural language processing (NLP) and machine learning (ML) communities to build accurate detectors for English. Despite the importance of this problem, there is currently no work that surveys this fast-growing literature and introduces newcomers to its important research challenges. In this work, we fill this void by providing a critical survey and review of the literature to facilitate a comprehensive understanding of the problem. We conduct an in-depth error analysis of a state-of-the-art detector and discuss research directions to guide future work in this exciting area.
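One family of detectors surveyed here scores text by how probable each token is under a language model: text sampled from a TGM tends to be systematically more "likely" under an LM than human-written text, which mixes in rarer, more surprising word choices. The following is a minimal, illustrative sketch of that idea only; it uses a toy add-one-smoothed unigram model in place of the large pretrained LMs that real detectors (e.g., GLTR-style tools) rely on, and the corpus, threshold, and function names are hypothetical.

```python
import math
from collections import Counter

def train_unigram_lm(corpus_tokens):
    """Fit an add-one-smoothed unigram model (toy stand-in for the
    large pretrained LMs used by real likelihood-based detectors)."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts)
    return lambda tok: (counts[tok] + 1) / (total + vocab + 1)

def avg_log_prob(tokens, lm):
    """Average per-token log-probability of the text under the LM."""
    return sum(math.log(lm(t)) for t in tokens) / len(tokens)

def classify(tokens, lm, threshold):
    """Flag text as machine-generated when it is unusually probable
    under the LM; the threshold would be tuned on held-out data."""
    return "machine" if avg_log_prob(tokens, lm) > threshold else "human"

# Hypothetical tiny corpus standing in for LM training data.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
lm = train_unigram_lm(corpus)

# Repetitive, high-probability text scores above rare, surprising text.
likely = "the the the cat sat".split()
rare = "quantum zebra flies tonight".split()
assert avg_log_prob(likely, lm) > avg_log_prob(rare, lm)
```

Real detectors replace the unigram model with a large Transformer LM and add richer per-token statistics (e.g., the rank of each token in the LM's predicted distribution), but the decision rule is the same shape: threshold a likelihood-based score.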
