Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics
Haizhou Li | Rafael E. Banchs | Chiori Hori | Luis Fernando D'Haro
[1] Haizhou Li, et al. AM-FM: A Semantic Framework for Translation Quality Assessment, 2011, ACL.
[2] Hinrich Schütze, et al. Book Reviews: Foundations of Statistical Natural Language Processing, 1999, CL.
[3] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[4] Alois Knoll, et al. Comparing Objective and Subjective Measures of Usability in a Human-Robot Dialogue System, 2009, ACL.
[5] Y-Lan Boureau, et al. Overview of the sixth dialog system technology challenge: DSTC6, 2019, Comput. Speech Lang.
[6] Vasile Rus, et al. A Comparison of Greedy and Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics, 2012, BEA@NAACL-HLT.
[7] George D. C. Cavalcanti, et al. Combining sentence similarities measures to identify paraphrases, 2018, Comput. Speech Lang.
[8] Sanja Fidler, et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[9] C. Lawrence Zitnick, et al. CIDEr: Consensus-based image description evaluation, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.
[11] Michael White, et al. Further Meta-Evaluation of Broad-Coverage Surface Realization, 2010, EMNLP.
[12] Philipp Koehn, et al. (Meta-) Evaluation of Machine Translation, 2007, WMT@ACL.
[13] Preslav Nakov, et al. Machine Translation Evaluation with Neural Networks, 2017, Comput. Speech Lang.
[14] Sanja Fidler, et al. Skip-Thought Vectors, 2015, NIPS.
[15] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, 2005, IEEvaluation@ACL.
[16] Matthew Marge, et al. Evaluating Evaluation Methods for Generation in the Presence of Variation, 2005, CICLing.
[17] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[18] Haizhou Li, et al. Adequacy–Fluency Metrics: Evaluating MT in the Continuous Space Model Framework, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[19] Jianfeng Gao, et al. A Neural Network Approach to Context-Sensitive Generation of Conversational Responses, 2015, NAACL.
[20] Marilyn A. Walker, et al. Towards developing general models of usability with PARADISE, 2000, Natural Language Engineering.
[21] Jürgen Schmidhuber, et al. LSTM recurrent networks learn simple context-free and context-sensitive languages, 2001, IEEE Trans. Neural Networks.
[22] John S. White, et al. The ARPA MT Evaluation Methodologies: Evolution, Lessons, and Future Approaches, 1994, AMTA.
[23] Philipp Koehn, et al. Findings of the 2011 Workshop on Statistical Machine Translation, 2011, WMT@EMNLP.
[24] Robert Graham, et al. Towards a tool for the Subjective Assessment of Speech System Interfaces (SASSI), 2000, Natural Language Engineering.
[25] Joelle Pineau, et al. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models, 2015, AAAI.
[26] Helen Hastie, et al. Metrics and Evaluation of Spoken Dialogue Systems, 2012.
[27] Cristian Danescu-Niculescu-Mizil, et al. Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs, 2011, CMCL@ACL.
[28] Dunja Mladenic, et al. Constructing a Natural Language Inference dataset using generative neural networks, 2016, Comput. Speech Lang.
[29] David R. Traum, et al. Dialogues in Context: An Objective User-Oriented Evaluation Approach for Virtual Human Dialogue, 2010, LREC.
[30] Timothy Baldwin, et al. Accurate Evaluation of Segment-level Machine Translation Metrics, 2015, NAACL.
[31] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[32] Hannes Schulz, et al. Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation, 2017, ArXiv.
[33] Kallirroi Georgila, et al. Hybrid reinforcement/supervised learning for dialogue policies from COMMUNICATOR data, 2005.
[34] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[35] Peter W. Foltz, et al. The Measurement of Textual Coherence with Latent Semantic Analysis, 1998.
[36] Romain Laroche, et al. A methodology for turn-taking capabilities enhancement in Spoken Dialogue Systems using Reinforcement Learning, 2018, Comput. Speech Lang.
[37] Steve J. Young, et al. A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies, 2006, The Knowledge Engineering Review.
[38] Joelle Pineau, et al. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation, 2016, EMNLP.
[39] Peter W. Foltz, et al. An introduction to latent semantic analysis, 1998.
[40] Andreas Stolcke, et al. SRILM - an extensible language modeling toolkit, 2002, INTERSPEECH.
[41] Miles Osborne, et al. The Edinburgh Twitter Corpus, 2010, HLT-NAACL 2010.
[42] Rafael E. Banchs. Movie-DiC: a Movie Dialogue Corpus for Research and Development, 2012, ACL.