Paper information - Integrating pronunciation into Chinese-Vietnamese statistical machine translation

Statistical machine translation for low-resource languages suffers from a lack of abundant training corpora. Several methods have been proposed to address this, such as using a pivot language as a bridge between the source and target languages. However, errors accumulate along such extended translation pipelines. In this paper, we propose an approach to low-resource language translation that exploits the pronunciation correlations between languages. We find that pronunciation features improve both Chinese-Vietnamese and Vietnamese-Chinese translation quality. Experimental results show that the proposed model is effective, improving translation performance by up to 1.03 BLEU (bilingual evaluation understudy) points.
