Limits of Detecting Text Generated by Large-Scale Language Models

Large-scale language models that can generate long, coherent passages of text are considered by some to be dangerous, since they may be used in misinformation campaigns. Here we formulate the detection of large-scale language model output as a hypothesis testing problem: classify a piece of text as genuine or generated. We show that the error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from specific language models to maximum-likelihood language models within the class of k-order Markov approximations, and the corresponding error probabilities are characterized. We also discuss the incorporation of semantic side information.
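
To make the formulation concrete, the following display sketches the standard binary hypothesis test and its link to perplexity, using generic notation ($P$, $Q$, $X^n$) chosen here for illustration; the paper's own symbols and its non-i.i.d. refinements may differ.

Given an observed token sequence $X^n = (X_1, \ldots, X_n)$, the detector decides between
\begin{align*}
  H_0 &: X^n \sim P \quad \text{(genuine human text)}, \\
  H_1 &: X^n \sim Q \quad \text{(language-model output)}.
\end{align*}
In the memoryless caricature, Stein's lemma gives the best type-II error exponent at fixed type-I error as the relative entropy $D(P \,\|\, Q)$, while the expected per-token log-perplexity of the model $Q$ on human text is the cross-entropy
\[
  \log_2 \mathrm{PPL}(Q) = H(P) + D(P \,\|\, Q),
\]
so a better generator (lower perplexity) leaves a smaller divergence $D(P \,\|\, Q)$ and hence a smaller achievable detection error exponent.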
