How much data is needed for reliable MT evaluation? Using bootstrapping to study human and automatic metrics
Andrei Popescu-Belis | Paula Estrella | Olivier Hamon
[1] Jimmy J. Lin, et al. Web question answering: is more always better?, 2002, SIGIR '02.
[2] Alexander H. Waibel, et al. Low Cost Portability for Statistical Machine Translation based on N-gram Frequency and TF-IDF, 2005, IWSLT.
[3] Charles L. A. Clarke, et al. The impact of corpus size on question answering performance, 2002, SIGIR '02.
[4] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[5] Hermann Ney, et al. An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research, 2000, LREC.
[6] Cyril Goutte. Automatic Evaluation of Machine Translation Quality, 2006.
[7] Philipp Koehn, et al. Statistical Significance Tests for Machine Translation Evaluation, 2004, EMNLP.
[8] Joseph P. Turian, et al. Evaluation of machine translation and its evaluation, 2003, MTSUMMIT.
[9] Ying Zhang, et al. Measuring confidence intervals for the machine translation evaluation metrics, 2004, TMI.
[10] Ulrich Germann. Building a Statistical Machine Translation System from Scratch: How Much Bang for the Buck Can We Expect?, 2001, DDMMT@ACL.
[11] Philipp Koehn, et al. Re-evaluating the Role of Bleu in Machine Translation Research, 2006, EACL.
[12] Eric Atwell, et al. Rationale for a multilingual corpus for machine translation evaluation, 2003.
[13] Hermann Ney, et al. Accelerated DP based search for statistical translation, 1997, EUROSPEECH.
[14] Alex Waibel, et al. Low Cost Portability for Statistical Machine Translation based on N-gram Coverage, 2005.
[15] B. Efron, et al. A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation, 1983.
[16] Paula Estrella, et al. Finding the System that Suits You Best: Towards the Normalization of MT Evaluation, 2005, TC.
[17] Andrei Popescu-Belis, et al. CESTA: First Conclusions of the Technolangue MT Evaluation Campaign, 2006, LREC.
[18] Hermann Ney, et al. Bootstrap estimates for confidence intervals in ASR performance evaluation, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[19] Shankar Kumar, et al. Minimum Bayes-Risk Decoding for Statistical Machine Translation, 2004, NAACL.