DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

The increasing fluency and widespread usage of large language models (LLMs) highlight the desirability of corresponding tools aiding detection of LLM-generated text. In this paper, we identify a property of the structure of an LLM's probability function that is useful for such detection. Specifically, we demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language model (e.g., T5). We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by 20B parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT. See https://ericmitchell.ai/detectgpt for code, data, and other project information.
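
The curvature criterion described above can be illustrated with a small sketch. It assumes the test compares a passage's log probability with the average log probability of randomly perturbed copies: a passage near a local maximum of the log probability (a negative curvature region) should lose probability under most perturbations, while other text should not. The abstract does not spell out the formula, so this is an illustration under that assumption, not the authors' implementation. All names here (`toy_log_prob`, `toy_perturb`, `perturbation_discrepancy`) are hypothetical; the unigram table stands in for the model of interest and the random word swap stands in for a perturbation model such as T5.

```python
import math
import random
from statistics import mean
from typing import Callable

# Hypothetical stand-in for the scoring LLM: a tiny unigram table.
TOY_UNIGRAM = {"the": 0.3, "cat": 0.1, "sat": 0.1, "on": 0.2, "mat": 0.1}
RARE_PROB = 0.001  # probability assigned to any word outside the table
RARE_WORDS = ["zephyr", "quark", "obelisk"]


def toy_log_prob(text: str) -> float:
    """Log probability of the text under the toy unigram model."""
    return sum(math.log(TOY_UNIGRAM.get(w, RARE_PROB)) for w in text.split())


def toy_perturb(text: str, rng: random.Random) -> str:
    """Hypothetical perturbation: replace one random word with a rare word.

    A real system would instead ask a mask-filling model for a rewrite.
    """
    words = text.split()
    words[rng.randrange(len(words))] = rng.choice(RARE_WORDS)
    return " ".join(words)


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str, random.Random], str],
    n_perturbations: int = 20,
    seed: int = 0,
) -> float:
    """Original log probability minus the mean over perturbed copies.

    Larger values suggest the text sits near a local maximum of log_prob.
    """
    rng = random.Random(seed)
    perturbed_scores = [
        log_prob(perturb(text, rng)) for _ in range(n_perturbations)
    ]
    return log_prob(text) - mean(perturbed_scores)


# Text made of the model's preferred words: every perturbation lowers its score.
likely_sample = perturbation_discrepancy(
    "the cat sat on the mat", toy_log_prob, toy_perturb
)
# Text the model already finds unlikely: perturbations leave its score unchanged.
unlikely_text = perturbation_discrepancy(
    "zephyr quark obelisk", toy_log_prob, toy_perturb
)
```

In this toy setup the first passage receives a positive score and the second a score of zero, so thresholding the score would separate them. With a real LLM the gap would be statistical rather than exact, and the threshold would have to be chosen empirically.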
