LIMSI @ WMT'14 Medical Translation Task

This paper describes LIMSI’s submission to the first medical translation task at WMT’14. We report results for English-French on the subtask of sentence translation from summaries of medical articles. Our main submission combines the output of NCODE (n-gram-based) and MOSES (phrase-based), with continuous-space language models applied in a post-processing step for each system. Other characteristics of our submission include: the use of sampling to build the MOSES phrase table; an implementation of the vector space model proposed by Chen et al. (2013); adaptation of the POS tagger used by NCODE to the medical domain; and an error analysis based on the typology of Vilar et al. (2006).

[1] José B. Mariño, et al. Ncode: an Open Source Bilingual N-gram SMT Toolkit, 2011, Prague Bull. Math. Linguistics.

[2] George F. Foster, et al. Batch Tuning Strategies for Statistical Machine Translation, 2012, NAACL.

[3] Hermann Ney, et al. Error Analysis of Statistical Machine Translation Output, 2006, LREC.

[4] José B. Mariño, et al. Improving statistical MT by coupling reordering and decoding, 2006, Machine Translation.

[5] François Yvon, et al. Practical Very Large Scale CRFs, 2010, ACL.

[6] Hermann Ney, et al. Improved backing-off for M-gram language modeling, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[7] Alex Acero, et al. Adaptation of Maximum Entropy Capitalizer: Little Data Can Help a Lot, 2006, Comput. Speech Lang.

[8] Adam Lopez. Tera-Scale Translation Models via Pattern Matching, 2008, COLING.

[9] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.

[10] Chris Callison-Burch, et al. Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases, 2005, ACL.

[11] Alexandre Allauzen, et al. Structured Output Layer neural network language model, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[12] Alon Lavie, et al. Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability, 2011, ACL.

[13] Franz Josef Och, et al. Minimum Error Rate Training in Statistical Machine Translation, 2003, ACL.

[14] Thomas C. Rindflesch, et al. MedPost: a part-of-speech tagger for bioMedical text, 2004, Bioinform.

[15] Yoshua Bengio, et al. A Neural Probabilistic Language Model, 2003, J. Mach. Learn. Res.

[16] Roland Kuhn, et al. Vector Space Model for Adaptation in Statistical Machine Translation, 2013, ACL.

[17] José B. Mariño, et al. N-gram-based Machine Translation, 2006, CL.

[18] Andreas Stolcke, et al. SRILM - an extensible language modeling toolkit, 2002, INTERSPEECH.

[19] Kenneth Heafield, et al. KenLM: Faster and Smaller Language Model Queries, 2011, WMT@EMNLP.

[20] Alexandre Allauzen, et al. LIMSI @ WMT12, 2012, WMT@NAACL-HLT.

[21] Olivier Galibert, et al. LIMSI’s Statistical Translation Systems for WMT’08, 2008, WMT@ACL.

[22] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[23] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.