InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies, however, show that such BERT-based models are vulnerable to textual adversarial attacks. We aim to address this problem from an information-theoretic perspective, and propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models. InfoBERT contains two mutual-information-based regularizers for model training: (i) an Information Bottleneck regularizer, which suppresses noisy mutual information between the input and the feature representation; and (ii) a Robust Feature regularizer, which increases the mutual information between local robust features and global features. We provide a principled way to theoretically analyze and improve the robustness of representation learning for language models in both standard and adversarial training. Extensive experiments demonstrate that InfoBERT achieves state-of-the-art robust accuracy on several adversarial datasets for Natural Language Inference (NLI) and Question Answering (QA) tasks.
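
Read schematically, the two regularizers can be viewed as extra terms added to a task objective. The expression below is only a sketch inferred from the description above, not the paper's exact formulation. All symbols are notation assumed here for illustration: X is the input, Y the label, T the feature representation, T_{k_j} the j-th local robust feature, Z the global feature, M the number of local robust features, and alpha, beta >= 0 are weighting coefficients.

```latex
% Schematic objective (assumed notation, not taken from the paper):
%   I(Y; T)        task-relevant information kept in the representation
%   I(X; T)        information about the raw input, penalized by beta
%   I(T_{k_j}; Z)  agreement between local robust features and the
%                  global feature, rewarded by alpha
\max_{\theta} \;
  I(Y; T)
  \;-\; \beta \, I(X; T)
  \;+\; \alpha \sum_{j=1}^{M} I\!\left(T_{k_j}; Z\right)
```

Under this reading, the beta term plays the role of the Information Bottleneck regularizer and the alpha term that of the Robust Feature regularizer. Mutual information is generally intractable for high-dimensional features, so in practice each term would be replaced by a tractable bound or estimate.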

Yu Cheng | Zhe Gan | Jingjing Liu | Bo Li | Boxin Wang | Shuohang Wang | Ruoxi Jia
