Out-of-Distribution Generalization in Text Classification: Past, Present, and Future

Machine learning (ML) systems in natural language processing (NLP) face significant challenges in generalizing to out-of-distribution (OOD) data, where the test distribution differs from the training distribution. This raises important questions about the robustness of NLP models: their reported high accuracy may be artificially inflated by sensitivity to systematic biases in the training data. Despite these challenges, there is a lack of comprehensive surveys of the generalization problem from an OOD perspective in text classification. This paper aims to fill that gap by presenting the first comprehensive review of recent progress, methods, and evaluations on the topic. We further discuss the open challenges involved and potential directions for future research. By providing quick access to existing work, we hope this survey will encourage further research in this area.
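To make the notion of distribution shift precise, the standard formalization in the OOD literature treats training and test examples as draws from different joint distributions over inputs and labels. The minimal sketch below uses our own illustrative notation (not taken from this survey) to state the general case and the two special cases most often studied in text classification:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% General OOD setting: training and test data come from
% different joint distributions over inputs X and labels Y.
\[
  P_{\mathrm{train}}(X, Y) \neq P_{\mathrm{test}}(X, Y)
\]
% Two commonly distinguished special cases of this shift:
\begin{align*}
  \text{covariate shift:} \quad
    & P_{\mathrm{train}}(X) \neq P_{\mathrm{test}}(X),
      \ \text{with } P(Y \mid X) \text{ shared} \\
  \text{concept shift:} \quad
    & P_{\mathrm{train}}(Y \mid X) \neq P_{\mathrm{test}}(Y \mid X)
\end{align*}
\end{document}
```

Under this framing, a model that exploits spurious correlations fits features predictive under $P_{\mathrm{train}}$ but not under $P_{\mathrm{test}}$, which is why in-distribution accuracy can overstate real-world robustness.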
