Evaluating Dialogue Generation Systems via Response Selection
Kentaro Inui | Reina Akama | Jun Suzuki | Hiroki Ouchi | Shiki Sato
[1] Joelle Pineau, et al. Bootstrapping Dialog Systems with Word Embeddings, 2014.
[2] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[3] Jörg Tiedemann, et al. OpenSubtitles2018: Statistical Rescoring of Sentence Alignments in Large, Noisy Parallel Corpora, 2018, LREC.
[4] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL 2004.
[5] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[6] Joelle Pineau, et al. Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses, 2017, ACL.
[7] Zhoujun Li, et al. Sequential Match Network: A New Architecture for Multi-turn Response Selection in Retrieval-based Chatbots, 2016, ArXiv.
[8] Xiaoyu Shen, et al. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset, 2017, IJCNLP.
[9] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[10] Joelle Pineau, et al. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation, 2016, EMNLP.
[11] J. Fleiss. Measuring nominal scale agreement among many raters, 1971.
[12] Maxine Eskénazi, et al. Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders, 2017, ACL.
[13] Vasile Rus, et al. A Comparison of Greedy and Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics, 2012, BEA@NAACL-HLT.
[14] Joelle Pineau, et al. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems, 2015, SIGDIAL Conference.
[15] Sanjeev Arora, et al. A Simple but Tough-to-Beat Baseline for Sentence Embeddings, 2017, ICLR.
[16] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[17] Mitesh M. Khapra, et al. Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses, 2019, AAAI.
[18] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[19] Walter S. Lasecki, et al. DSTC7 Task 1: Noetic End-to-End Response Selection, 2019, Proceedings of the First Workshop on NLP for Conversational AI.
[20] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, 2005, IEEvaluation@ACL.