[1] Jacob Cohen. A Coefficient of Agreement for Nominal Scales, 1960.
[2] J. R. Landis, et al. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers, 1977, Biometrics.
[3] László Dezsö, et al. Universal Grammar, 1981, Certainty in Action.
[4] R. Chaffin, et al. Cognitive and Psychometric Analysis of Analogical Problem Solving, 1990.
[5] Stephen Pulman, et al. Using the Framework, 1996.
[6] Yaroslav Fyodorov, et al. A Natural Logic Inference System, 2000.
[7] Siobhan Chapman. Logic and Conversation, 2005.
[8] Martha Palmer, et al. VerbNet: a broad-coverage, comprehensive verb lexicon, 2005.
[9] Ido Dagan, et al. The Third PASCAL Recognizing Textual Entailment Challenge, 2007, ACL-PASCAL@ACL.
[10] David R. Thomas, et al. A General Inductive Approach for Analyzing Qualitative Evaluation Data, 2006.
[11] Jan-Willem Strijbos, et al. Content analysis: What are they talking about?, 2006, Comput. Educ.
[12] Brendan T. O'Connor, et al. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks, 2008, EMNLP.
[13] Dan Roth, et al. “Ask Not What Textual Entailment Can Do for You...”, 2010, ACL.
[14] Alexander Yates, et al. Types of Common-Sense Knowledge Needed for Recognizing Textual Entailment, 2011, ACL.
[15] Johan Bos, et al. Developing a large semantically annotated corpus, 2012, LREC.
[16] Saif Mohammad, et al. SemEval-2012 Task 2: Measuring Degrees of Relational Similarity, 2012, *SEMEVAL.
[17] M. McHugh. Interrater reliability: the kappa statistic, 2012, Biochemia medica.
[18] Ido Dagan, et al. Semantic Annotation for Textual Entailment Recognition, 2012, MICAI.
[19] Johan Bos, et al. The Groningen Meaning Bank, 2013, JSSP.
[20] Christopher Potts, et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.
[21] Jason Weston, et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks, 2015, ICLR.
[22] Kevin Duh, et al. Inference is Everything: Recasting Semantic Resources into a Unified Evaluation Framework, 2017, IJCNLP.
[23] Aaron Steven White, et al. The role of veridicality and factivity in clause selection, 2017.
[24] Percy Liang, et al. Adversarial Examples for Evaluating Reading Comprehension Systems, 2017, EMNLP.
[25] Carolyn Penstein Rosé, et al. Stress Test Evaluation for Natural Language Inference, 2018, COLING.
[26] Omer Levy, et al. Annotation Artifacts in Natural Language Inference Data, 2018, NAACL.
[27] Yoav Goldberg, et al. Breaking NLI Systems with Sentences that Require Simple Lexical Inferences, 2018, ACL.
[28] A. Joubert, et al. The JeuxDeMots Project is 10 Years Old: What We Have Learned, 2018.
[29] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[30] Rachel Rudinger, et al. Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation, 2018, BlackboxNLP@EMNLP.
[31] Shachar Mirkin, et al. Listening Comprehension over Argumentative Content, 2018, EMNLP.
[32] Rachel Rudinger, et al. Hypothesis Only Baselines in Natural Language Inference, 2018, *SEMEVAL.
[33] Percy Liang, et al. Transforming Question Answering Datasets Into Natural Language Inference Datasets, 2018, ArXiv.
[34] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[35] Siddharth Patwardhan, et al. Annotating Electronic Medical Records for Question Answering, 2018, ArXiv.
[36] Masatoshi Tsuchiya, et al. Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment, 2018, LREC.
[37] Andreas Vlachos, et al. FEVER: a Large-scale Dataset for Fact Extraction and VERification, 2018, NAACL.
[38] Peter Clark. What Knowledge is Needed to Solve the RTE5 Textual Entailment Challenge?, 2018, ArXiv.
[39] Christopher Potts, et al. Stress-Testing Neural Models of Natural Language Inference with Multiply-Quantified Sentences, 2018, ArXiv.
[40] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[41] Yoav Goldberg, et al. Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets, 2019, EMNLP.
[42] Jean-Philippe Bernardy, et al. What Kind of Natural Language Inference are NLP Systems Learning: Is this Enough?, 2019, ICAART.
[43] Carolyn Penstein Rosé, et al. EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference, 2019, CoNLL.
[44] Johan Bos, et al. HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning, 2019, *SEMEVAL.
[45] R. Thomas McCoy, et al. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference, 2019, ACL.
[46] Ido Dagan, et al. Diversify Your Datasets: Analyzing Generalization via Controlled Variance in Adversarial Datasets, 2019, CoNLL.
[47] Mohit Bansal, et al. Analyzing Compositionality-Sensitivity of NLI Models, 2018, AAAI.
[48] Ellie Pavlick, et al. Inherent Disagreements in Human Textual Inferences, 2019, Transactions of the Association for Computational Linguistics.
[49] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[50] Samuel R. Bowman, et al. Neural Network Acceptability Judgments, 2018, Transactions of the Association for Computational Linguistics.
[51] Gabriel Stanovsky, et al. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs, 2019, NAACL.
[52] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[53] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[54] Grzegorz Chrupala, et al. Analyzing and interpreting neural networks for NLP: A report on the first BlackboxNLP workshop, 2019, Natural Language Engineering.
[55] Guillaume Lample, et al. Cross-lingual Language Model Pretraining, 2019, NeurIPS.
[56] Yuxing Chen, et al. Harnessing the linguistic signal to predict scalar inferences, 2019, ACL.
[57] Adina Williams, et al. Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition, 2020, ACL.
[58] Eduardo Blanco, et al. An Analysis of Natural Language Inference Benchmarks through the Lens of Negation, 2020, EMNLP.
[59] Xiang Zhou, et al. What Can We Learn from Collective Human Opinions on Natural Language Inference Data?, 2020, EMNLP.
[60] Julian Michael, et al. AmbigQA: Answering Ambiguous Open-domain Questions, 2020, EMNLP.
[61] Noah A. Smith, et al. Evaluating Models’ Local Decision Boundaries via Contrast Sets, 2020, FINDINGS.
[62] Yejin Choi, et al. Commonsense Reasoning for Natural Language Processing, 2020, ACL.
[63] Yejin Choi, et al. Thinking Like a Skeptic: Defeasible Inference in Natural Language, 2020, FINDINGS.
[64] Ashish Sabharwal, et al. Probing Natural Language Inference Models through Semantic Fragments, 2019, AAAI.
[65] Quoc V. Le, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, ICLR.
[66] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[67] Samuel R. Bowman, et al. Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options, 2020, AACL.
[68] Hao Tan, et al. The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions, 2020, EMNLP.
[69] J. Weston, et al. Adversarial NLI: A New Benchmark for Natural Language Understanding, 2019, ACL.
[70] Adam Poliak, et al. A survey on Recognizing Textual Entailment as an NLP Evaluation, 2020, EVAL4NLP.
[71] Benjamin Van Durme, et al. Uncertain Natural Language Inference, 2019, ACL.
[72] Mohit Bansal, et al. ConjNLI: Natural Language Inference over Conjunctive Sentences, 2020, EMNLP.
[73] Doug Downey, et al. Abductive Commonsense Reasoning, 2019, ICLR.
[74] Samuel R. Bowman, et al. Collecting Entailment Data for Pretraining: New Protocols and Negative Results, 2020, EMNLP.
[75] Tal Linzen, et al. COGS: A Compositional Generalization Challenge Based on Semantic Interpretation, 2020, EMNLP.
[76] Samuel R. Bowman, et al. BLiMP: A Benchmark of Linguistic Minimal Pairs for English, 2019, SCIL.
[77] Christopher Potts, et al. Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation, 2020, BLACKBOXNLP.
[78] Yejin Choi, et al. Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics, 2020, EMNLP.
[79] Jianfeng Gao, et al. DeBERTa: Decoding-enhanced BERT with Disentangled Attention, 2020, ICLR.
[80] Joelle Pineau, et al. UnNatural Language Inference, 2020, ACL.
[81] Jordan Boyd-Graber, et al. Evaluation Examples are not Equally Informative: How should that change NLP Leaderboards?, 2021, ACL.
[82] Hanna M. Wallach, et al. Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets, 2021, ACL.
[83] Douwe Kiela, et al. Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little, 2021, EMNLP.
[84] Zhiyi Ma, et al. Dynabench: Rethinking Benchmarking in NLP, 2021, NAACL.
[85] Clara Vania, et al. What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?, 2021, ACL.
[86] Robin Jia, et al. Analyzing Dynamic Adversarial Training Data in the Limit, 2021, ArXiv.
[87] Zhiyi Ma, et al. Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking, 2021, NeurIPS.
[88] Samuel R. Bowman, et al. Does Putting a Linguist in the Loop Improve NLU Data Collection?, 2021, EMNLP.