Paper information - A Discriminative Latent Variable-Based "DE" Classifier for Chinese-English SMT

Syntactic reordering on the source side is an effective way of handling word order differences. The 的 (DE) construction is a flexible and ubiquitous syntactic structure in Chinese and a major source of translation errors. In this paper, we propose a new classifier model, a discriminative latent variable model (DPLVM), to classify DE constructions, with the aim of improving classification accuracy and hence translation quality. We also propose a new feature that can, to a certain extent, learn the reordering rules automatically. The experimental results show that MT systems using the data reordered by our proposed model outperform the baseline systems by 6.42% and 3.08% relative in terms of BLEU score on phrase-based SMT (PB-SMT) and hierarchical phrase-based MT, respectively. In addition, we analyse the impact of DE annotation on word alignment and on the SMT phrase table.
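The abstract reports its gains as relative BLEU improvements rather than absolute point differences. As a minimal sketch of that arithmetic, the snippet below computes a relative gain from two scores. The function name `relative_gain` and the example scores are assumptions made for illustration only; they are not figures or code from the paper.

```python
def relative_gain(baseline: float, system: float) -> float:
    """Return the improvement of `system` over `baseline` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (system - baseline) / baseline * 100.0


# Hypothetical BLEU scores, chosen only to show the calculation;
# they do not come from the paper.
baseline_bleu = 25.0
system_bleu = 26.0

gain = relative_gain(baseline_bleu, system_bleu)
print(f"{gain:.2f}% relative improvement")
```

A relative figure depends on the baseline: the same absolute difference in BLEU yields a larger relative gain when the baseline score is lower, so relative gains on different systems are not directly comparable without the underlying scores.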

Andy Way | Jinhua Du
