Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining
[1] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[2] Joelle Pineau,et al. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models , 2015, AAAI.
[3] Mitesh M. Khapra,et al. Towards Exploiting Background Knowledge for Building Conversation Systems , 2018, EMNLP.
[4] Mamoru Komachi,et al. Machine Translation Evaluation with BERT Regressor , 2019, ArXiv.
[5] Jianfeng Gao,et al. deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets , 2015, ACL.
[6] Aditya Sharma,et al. Towards Understanding the Geometry of Knowledge Graph Embeddings , 2018, ACL.
[7] Jason Weston,et al. Personalizing Dialogue Agents: I have a dog, do you have pets too? , 2018, ACL.
[8] Hannes Schulz,et al. Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation , 2017, ArXiv.
[9] Maxine Eskénazi,et al. Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References , 2019, SIGdial.
[10] Jörg Tiedemann,et al. Parallel Data, Tools and Interfaces in OPUS , 2012, LREC.
[11] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[12] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[13] Vasile Rus,et al. A Comparison of Greedy and Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics , 2012, BEA@NAACL-HLT.
[14] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[15] Alan Ritter,et al. Unsupervised Modeling of Twitter Conversations , 2010, NAACL.
[16] Maosong Sun,et al. ERNIE: Enhanced Language Representation with Informative Entities , 2019, ACL.
[17] Gunhee Kim,et al. A Hierarchical Latent Structure for Variational Conversation Modeling , 2018, NAACL.
[18] Matthew Henderson,et al. A Repository of Conversational Datasets , 2019, Proceedings of the First Workshop on NLP for Conversational AI.
[19] Xiaoyu Shen,et al. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset , 2017, IJCNLP.
[20] Jianfeng Gao,et al. DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation , 2020, ACL.
[21] Joelle Pineau,et al. Bootstrapping Dialog Systems with Word Embeddings , 2014.
[22] Kevin Gimpel,et al. Towards Universal Paraphrastic Sentence Embeddings , 2015, ICLR.
[23] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[24] Joelle Pineau,et al. Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses , 2017, ACL.
[25] Kilian Q. Weinberger,et al. BERTScore: Evaluating Text Generation with BERT , 2019, ICLR.
[26] Joelle Pineau,et al. A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues , 2016, AAAI.
[27] Alan Ritter,et al. Adversarial Learning for Neural Dialogue Generation , 2017, EMNLP.
[28] Eric N. Forsyth. Improving automated lexical and discourse analysis of online chat dialog , 2007.
[29] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[30] Sungjin Lee,et al. Jointly Optimizing Diversity and Relevance in Neural Response Generation , 2019, NAACL.
[31] Nanyun Peng,et al. Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings , 2019, Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation.
[32] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019.
[33] Zhong Zhou,et al. Tweet2Vec: Character-Based Distributed Representations for Social Media , 2016, ACL.
[34] Jianfeng Gao,et al. A Neural Network Approach to Context-Sensitive Generation of Conversational Responses , 2015, NAACL.
[35] Mitesh M. Khapra,et al. Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses , 2019, AAAI.
[36] Joelle Pineau,et al. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation , 2016, EMNLP.
[37] Jason Weston,et al. What makes a good conversation? How controllable attributes affect human judgments , 2019, NAACL.
[38] Craig H. Martell,et al. Lexical and Discourse Analysis of Online Chat Dialog , 2007, International Conference on Semantic Computing (ICSC 2007).
[39] Dongyan Zhao,et al. RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems , 2017, AAAI.