A Discriminative Latent Variable-Based "DE" Classifier for Chinese-English SMT

Syntactic reordering on the source side is an effective way of handling word order differences. The 的 (DE) construction is a flexible and ubiquitous syntactic structure in Chinese and a major source of translation errors. In this paper, we propose a new classifier model, the discriminative latent variable model (DPLVM), to classify the DE construction, with the aim of improving classification accuracy and hence translation quality. We also propose a new feature that can, to a certain extent, automatically learn the reordering rules. The experimental results show that MT systems using data reordered by our proposed model outperform the baseline systems by 6.42% and 3.08% relative in terms of BLEU score on phrase-based SMT (PB-SMT) and hierarchical phrase-based MT, respectively. In addition, we analyse the impact of DE annotation on word alignment and on the SMT phrase table.
