Are Automatic Metrics Robust and Reliable in Specific Machine Translation Tasks?
Francisco Casacuberta | Mara Chinea-Rios | Álvaro Peris
[1] Maja Popovic,et al. chrF: character n-gram F-score for automatic MT evaluation , 2015, WMT@EMNLP.
[2] Francisco Casacuberta,et al. Adapting Neural Machine Translation with Parallel Synthetic Data , 2017, WMT.
[3] Antonio Toral,et al. A Multifaceted Evaluation of Neural versus Phrase-Based Machine Translation for 9 Language Directions , 2017, EACL.
[4] Joseph P. Turian,et al. Evaluation of machine translation and its evaluation , 2003, MTSUMMIT.
[5] Dietrich Klakow,et al. Testing the correlation of word error rate and perplexity , 2002, Speech Commun..
[6] Hermann Ney,et al. Improved backing-off for M-gram language modeling , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[7] Khalil Sima'an,et al. Evaluating Word Order Recursively over Permutation-Forests , 2014, SSST@EMNLP.
[8] M. Tatsumi. Correlation between Automatic Evaluation Metric Scores, Post-Editing Speed, and Some Other Factors , 2009, MTSUMMIT.
[9] Arianna Bisazza,et al. Neural versus phrase-based MT quality: An in-depth analysis on English-German and English-French , 2018, Comput. Speech Lang..
[10] Timothy Baldwin,et al. Can machine translation systems be evaluated by the crowd alone , 2015, Natural Language Engineering.
[11] Mauro Cettolo,et al. Overview of the IWSLT 2017 Evaluation Campaign , 2017, IWSLT.
[12] Jörg Tiedemann,et al. Climbing Mont BLEU: The Strange World of Reachable High-BLEU Translations , 2016, EAMT.
[13] Hermann Ney,et al. Statistical Approaches to Computer-Assisted Translation , 2009, CL.
[14] Rico Sennrich,et al. Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.
[15] Khalil Sima'an,et al. Alternative Objective Functions for Training MT Evaluation Metrics , 2017, ACL.
[16] Antonio Toral,et al. Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation , 2017, Prague Bull. Math. Linguistics.
[17] Phil D. Green,et al. From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition , 2004, INTERSPEECH.
[18] Hermann Ney,et al. A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.
[19] Quoc V. Le,et al. Massive Exploration of Neural Machine Translation Architectures , 2017, EMNLP.
[20] Ondrej Bojar,et al. Results of the WMT17 Metrics Shared Task , 2017, WMT.
[21] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[22] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[23] Alon Lavie,et al. The Meteor metric for automatic evaluation of machine translation , 2009, Machine Translation.
[24] Hermann Ney,et al. Accelerated DP based search for statistical translation , 1997, EUROSPEECH.
[25] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[26] Philipp Koehn,et al. Six Challenges for Neural Machine Translation , 2017, NMT@ACL.
[27] Cyril Goutte. Automatic Evaluation of Machine Translation Quality , 2006 .
[28] John S. White,et al. The ARPA MT Evaluation Methodologies: Evolution, Lessons, and Future Approaches , 1994, AMTA.
[29] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[30] Franz Josef Och,et al. Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.
[31] Matthew G. Snover,et al. A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.
[32] John R. Pierce,et al. Language and Machines: Computers in Translation and Linguistics , 1966 .
[33] Philipp Koehn,et al. Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.
[34] Khalil Sima'an. Fitting Sentence Level Translation Evaluation with Many Dense Features , 2014 .