Know Thy Strengths: Comprehensive Dialogue State Tracking Diagnostics
Kaushik Ram Sadagopan | Asli Celikyilmaz | Jonathan May | Chinnadhurai Sankar | Hyundong Justin Cho | Christopher Lin | Shahin Shayandeh | Ahmad Beirami