deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets
Jianfeng Gao | Chris Quirk | Chris Brockett | William B. Dolan | Yangfeng Ji | Michael Auli | Margaret Mitchell | Alessandro Sordoni | Michel Galley
[1] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[2] George R. Doddington, et al. Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics, 2002.
[3] Larry P. Heck, et al. Learning deep structured semantic models for web search using clickthrough data, 2013, CIKM.
[4] Hong Sun, et al. Joint Learning of a Dual SMT System for Paraphrase Generation, 2012, ACL.
[5] Preslav Nakov, et al. Optimizing for Sentence-Level BLEU+1 Yields Short Translations, 2012, COLING.
[6] Peter Young, et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics, 2013, J. Artif. Intell. Res.
[7] Stephen E. Robertson, et al. Okapi at TREC-3, 1994, TREC.
[8] Philipp Koehn, et al. Re-evaluating the Role of Bleu in Machine Translation Research, 2006, EACL.
[9] Jianfeng Gao, et al. A Neural Network Approach to Context-Sensitive Generation of Conversational Responses, 2015, NAACL.
[10] Stephen E. Robertson, et al. Gatford Centre for Interactive Systems Research Department of Information, 1996.
[11] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, 2005, IEEvaluation@ACL.
[12] Deborah A. Coughlin, et al. Correlating automated and human assessments of machine translation quality, 2003, MTSUMMIT.
[13] Gregory A. Sanders, et al. The NIST 2008 Metrics for machine translation challenge—overview, methodology, metrics, and results, 2009, Machine Translation.
[14] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments, 2007, WMT@ACL.
[15] Alan Ritter, et al. Data-Driven Response Generation in Social Media, 2011, EMNLP.
[16] Timothy Baldwin, et al. Accurate Evaluation of Segment-level Machine Translation Metrics, 2015, NAACL.
[17] C. Lawrence Zitnick, et al. CIDEr: Consensus-based image description evaluation, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Daniel Marcu, et al. HyTER: Meaning-Equivalent Semantics for Translation Evaluation, 2012, NAACL.
[19] Franz Josef Och, et al. Minimum Error Rate Training in Statistical Machine Translation, 2003, ACL.
[20] Timothy Baldwin, et al. Testing for Significance of Increased Correlation with Human Judgment, 2014, EMNLP.