Limits of Detecting Text Generated by Large-Scale Language Models

Large-scale language models that can generate long, coherent passages of text are considered by some to be dangerous, since they may be used in misinformation campaigns. Here we formulate the detection of large-scale language model output as a hypothesis testing problem: classify a piece of text as genuine or generated. We show that the error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from specific language models to maximum-likelihood language models within the class of k-order Markov approximations, and the corresponding error probabilities are characterized. We also discuss the incorporation of semantic side information.
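
To make the formulation concrete, the following display sketches the standard binary hypothesis test and its link to perplexity, using generic notation ($P$, $Q$, $X^n$) chosen here for illustration; the paper's own symbols and its non-i.i.d. refinements may differ.

Given an observed token sequence $X^n = (X_1, \ldots, X_n)$, the detector decides between
\begin{align*}
  H_0 &: X^n \sim P \quad \text{(genuine human text)}, \\
  H_1 &: X^n \sim Q \quad \text{(language-model output)}.
\end{align*}
In the memoryless caricature, Stein's lemma gives the best type-II error exponent at fixed type-I error as the relative entropy $D(P \,\|\, Q)$, while the expected per-token log-perplexity of the model $Q$ on human text is the cross-entropy
\[
  \log_2 \mathrm{PPL}(Q) = H(P) + D(P \,\|\, Q),
\]
so a better generator (lower perplexity) leaves a smaller divergence $D(P \,\|\, Q)$ and hence a smaller achievable detection error exponent.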
