Zhiyi Ma | Christopher Potts | Douwe Kiela | Mohit Bansal | Sebastian Riedel | Pontus Stenetorp | Robin Jia | Atticus Geiger | Grusha Prasad | Yixin Nie | Max Bartolo | Divyansh Kaushik | Bertie Vidgen | Zeerak Waseem | Pratik Ringshia | Tristan Thrush | Amanpreet Singh | Adina Williams | Zhengxuan Wu
[1] Yejin Choi, et al. Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics, 2020, EMNLP.
[2] Carolyn Penstein Rosé, et al. Stress Test Evaluation for Natural Language Inference, 2018, COLING.
[3] Christopher Joseph Pal, et al. Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study, 2019, ACL.
[4] Andreas Vlachos, et al. FEVER: a Large-scale Dataset for Fact Extraction and VERification, 2018, NAACL.
[5] Ankur Taly, et al. Axiomatic Attribution for Deep Networks, 2017, ICML.
[6] Pasquale Minervini, et al. Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge, 2018, CoNLL.
[7] Leon Derczynski, et al. Directions in Abusive Language Training Data: Garbage In, Garbage Out, 2020, arXiv.
[8] Henry Lieberman, et al. A model of textual affect sensing using real-world knowledge, 2003, IUI.
[9] Victor Sanchez, et al. Studies on Natural Logic and Categorial Grammar, 1991.
[10] Mohit Bansal, et al. ConjNLI: Natural Language Inference over Conjunctive Sentences, 2020, EMNLP.
[11] Gabriel Stanovsky, et al. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs, 2019, NAACL.
[12] Victor Kuperman, et al. Crowdsourcing and language studies: the new generation of linguistic data, 2010, Mturk@HLT-NAACL.
[13] Sameer Singh, et al. Universal Adversarial Triggers for Attacking and Analyzing NLP, 2019, EMNLP.
[14] Noah A. Smith, et al. Evaluating Models’ Local Decision Boundaries via Contrast Sets, 2020, Findings of EMNLP.
[15] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.
[16] Ingmar Weber, et al. Understanding Abuse: A Typology of Abusive Language Detection Subtasks, 2017, ALW@ACL.
[17] Yuchen Zhang, et al. CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes, 2012, EMNLP-CoNLL Shared Task.
[18] Sebastian Riedel, et al. Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension, 2020, TACL.
[19] Ido Dagan, et al. Diversify Your Datasets: Analyzing Generalization via Controlled Variance in Adversarial Datasets, 2019, CoNLL.
[20] Francis Ferraro, et al. The Universal Decompositional Semantics Dataset and Decomp Toolkit, 2019, LREC.
[21] Douwe Kiela, et al. Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection, 2021, ACL.
[22] Rachel Rudinger, et al. Hypothesis Only Baselines in Natural Language Inference, 2018, *SEM.
[23] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[24] R. Thomas McCoy, et al. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference, 2019, ACL.
[25] Eduardo Blanco, et al. An Analysis of Natural Language Inference Benchmarks through the Lens of Negation, 2020, EMNLP.
[26] Christopher D. Manning, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.
[27] Peter Henderson, et al. With Little Power Comes Great Responsibility, 2020, EMNLP.
[28] Dejing Dou, et al. HotFlip: White-Box Adversarial Examples for Text Classification, 2017, ACL.
[29] Dejing Dou, et al. On Adversarial Examples for Character-Level Neural Machine Translation, 2018, COLING.
[30] J. van Benthem. A brief history of natural logic, 2008.
[31] Huda Khayrallah, et al. On the Impact of Various Types of Noise on Neural Machine Translation, 2018, NMT@ACL.
[32] Roger Levy, et al. SyntaxGym: An Online Platform for Targeted Evaluation of Language Models, 2020, ACL.
[33] Tal Linzen, et al. How Can We Accelerate Progress Towards Human-like Linguistic Generalization?, 2020, ACL.
[34] Ali Farhadi, et al. Bidirectional Attention Flow for Machine Comprehension, 2016, ICLR.
[35] Eduard Hovy, et al. Learning the Difference that Makes a Difference with Counterfactually-Augmented Data, 2020, ICLR.
[36] Navneet Kaur, et al. Opinion mining and sentiment analysis, 2016, INDIACom.
[37] Dan Jurafsky, et al. Utility Is in the Eye of the User: A Critique of NLP Leaderboard Design, 2020, EMNLP.
[38] Omer Levy, et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems, 2019, NeurIPS.
[39] Richard Socher, et al. The Natural Language Decathlon: Multitask Learning as Question Answering, 2018, arXiv.
[40] Sebastian Riedel, et al. Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets, 2020, EACL.
[41] Alec Radford, et al. Learning to summarize from human feedback, 2020, NeurIPS.
[42] Douwe Kiela, et al. SentEval: An Evaluation Toolkit for Universal Sentence Representations, 2018, LREC.
[43] Roy Schwartz, et al. Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets, 2019, NAACL.
[44] Omer Levy, et al. Annotation Artifacts in Natural Language Inference Data, 2018, NAACL.
[45] Ido Dagan, et al. The Third PASCAL Recognizing Textual Entailment Challenge, 2007, ACL-PASCAL@ACL.
[46] Christopher Potts, et al. Stress-Testing Neural Models of Natural Language Inference with Multiply-Quantified Sentences, 2018, arXiv.
[47] Susan Benesch, et al. Dangerous speech and dangerous ideology: an integrated model for monitoring and prevention, 2016.
[48] Mohit Bansal, et al. Analyzing Compositionality-Sensitivity of NLI Models, 2018, AAAI.
[49] Rachel Rudinger, et al. Gender Bias in Coreference Resolution, 2018, NAACL.
[50] Percy Liang, et al. Know What You Don’t Know: Unanswerable Questions for SQuAD, 2018, ACL.
[51] Jason Weston, et al. Learning from Dialogue after Deployment: Feed Yourself, Chatbot!, 2019, ACL.
[52] Rachel Rudinger, et al. Lexicosyntactic Inference in Neural Models, 2018, EMNLP.
[53] Samuel R. Bowman, et al. CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models, 2020, EMNLP.
[54] Jason Weston, et al. Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack, 2019, EMNLP.
[55] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[56] Adina Williams, et al. Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition, 2020, ACL.
[57] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[58] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[59] Christopher Potts, et al. Sentiment expression conditioned by affective transitions and social forces, 2014, KDD.
[60] Cecilia Ovesdotter Alm, et al. Emotions from Text: Machine Learning for Text-based Emotion Prediction, 2005, HLT.
[61] Christopher Potts, et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.
[62] Sameer Singh, et al. Beyond Accuracy: Behavioral Testing of NLP Models with CheckList, 2020, ACL.
[63] Christopher Potts, et al. DynaSent: A Dynamic Benchmark for Sentiment Analysis, 2020, ACL.
[64] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[65] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[66] Christopher D. Manning, et al. Towards Ecologically Valid Research on Language User Interfaces, 2020, arXiv.
[67] Allyson Ettinger, et al. Assessing Phrasal Representation and Composition in Transformers, 2020, EMNLP.
[68] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[69] Jason Weston, et al. The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents, 2020, ACL.
[70] Jason Weston, et al. Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent, 2017, ICLR.
[71] Dong Nguyen, et al. HateCheck: Functional Tests for Hate Speech Detection Models, 2021, ACL-IJCNLP.
[72] Masatoshi Tsuchiya, et al. Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment, 2018, LREC.
[73] Mohit Bansal, et al. Adversarial NLI: A New Benchmark for Natural Language Understanding, 2020, ACL.
[74] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[75] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[76] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.
[77] Yonatan Belinkov, et al. Don’t Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference, 2019, ACL.
[78] Ming-Wei Chang, et al. Natural Questions: A Benchmark for Question Answering Research, 2019, TACL.
[79] Qiang Yang, et al. Lifelong Machine Learning Systems: Beyond Learning Algorithms, 2013, AAAI Spring Symposium on Lifelong Machine Learning.
[80] Tal Linzen, et al. COGS: A Compositional Generalization Challenge Based on Semantic Interpretation, 2020, EMNLP.
[81] Shikha Bordia, et al. Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs, 2019, EMNLP.
[82] Claire Cardie, et al. Annotating Expressions of Opinions and Emotions in Language, 2005, Language Resources and Evaluation.
[83] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[84] Xinlei Chen, et al. Never-Ending Learning, 2012, ECAI.
[85] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[86] Laura A. Dabbish, et al. Designing games with a purpose, 2008, CACM.
[87] Yonatan Belinkov, et al. Synthetic and Natural Noise Both Break Neural Machine Translation, 2017, ICLR.
[88] Yoav Goldberg, et al. Breaking NLI Systems with Sentences that Require Simple Lexical Inferences, 2018, ACL.
[89] Percy Liang, et al. Adversarial Examples for Evaluating Reading Comprehension Systems, 2017, EMNLP.
[90] Scott A. Hale, et al. Challenges and frontiers in abusive content detection, 2019, ALW@ACL.
[91] Chandler May, et al. On Measuring Social Biases in Sentence Encoders, 2019, NAACL.
[92] Emily M. Bender, et al. Towards Linguistically Generalizable NLP Systems: A Workshop and Shared Task, 2017, First Workshop on Building Linguistically Generalizable NLP Systems.
[93] Samuel R. Bowman, et al. BLiMP: A Benchmark of Linguistic Minimal Pairs for English, 2019, SCiL.
[94] Samuel R. Bowman, et al. Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data, 2020, Insights@EMNLP.
[95] Yoshua Bengio, et al. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering, 2018, EMNLP.
[96] Hao Tan, et al. The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions, 2020, EMNLP.
[97] Viviana Patti, et al. Resources and benchmark corpora for hate speech detection: a systematic review, 2020, Language Resources and Evaluation.
[98] Yuxing Chen, et al. Harnessing the linguistic signal to predict scalar inferences, 2019, ACL.
[99] Mitsuru Ishizuka, et al. Recognition of Affect, Judgment, and Appreciation in Text, 2010, COLING.
[100] Panagiotis G. Ipeirotis, et al. Beat the Machine: Challenging Humans to Find a Predictive Model’s “Unknown Unknowns”, 2015, JDIQ.
[101] Brendan T. O'Connor, et al. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks, 2008, EMNLP.
[102] Kentaro Inui, et al. Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets, 2019, AAAI.
[103] Allyson Ettinger, et al. What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models, 2019, TACL.