Better Synchronous Binarization for Machine Translation

Binarization of Synchronous Context-Free Grammars (SCFGs) is essential for achieving polynomial-time decoding in machine translation systems based on SCFG parsing. In this paper, we first investigate the excess edge competition caused by the left-heavy binary SCFG that the method of Zhang et al. (2006) produces. We then propose a new binarization method that mitigates the problem by exploring alternative, equivalent binary SCFGs. We present an algorithm that iteratively improves the resulting binary SCFG, and we show empirically on the NIST machine translation evaluation tasks that our method improves a string-to-tree statistical machine translation system built on the synchronous binarization method of Zhang et al. (2006).
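To make the notion of a left-heavy binarization concrete, the following is a minimal sketch of a greedy shift-reduce synchronous binarizer in the spirit of Zhang et al. (2006); it is an illustration under stated assumptions, not the implementation used in this paper. A rule is reduced to the permutation of its nonterminals, and the function name `binarize_permutation` and the `"straight"`/`"inverted"` labels are hypothetical choices made for this example.

```python
def binarize_permutation(perm):
    """Greedy left-to-right (shift-reduce) binarization of a permutation.

    perm[i] is the target-side position of the i-th source-side nonterminal.
    Returns a nested tuple describing the binary tree, or None when the
    permutation cannot be synchronously binarized.
    Illustrative sketch only; names and labels are assumptions.
    """
    stack = []  # each entry: (lowest target pos, highest target pos, tree)
    for src_idx, tgt in enumerate(perm):
        # Shift: push the next source-side nonterminal as a one-element span.
        stack.append((tgt, tgt, src_idx))
        # Reduce: merge the top two entries while their target spans touch.
        while len(stack) >= 2:
            lo1, hi1, t1 = stack[-2]
            lo2, hi2, t2 = stack[-1]
            if hi1 + 1 == lo2:
                # Same order on both sides.
                merged = (lo1, hi2, ("straight", t1, t2))
            elif hi2 + 1 == lo1:
                # Order swapped on the target side.
                merged = (lo2, hi1, ("inverted", t1, t2))
            else:
                break
            stack[-2:] = [merged]
    return stack[0][2] if len(stack) == 1 else None


# A monotone rule yields a left-heavy tree: ((0 1) 2).
left_heavy = binarize_permutation([0, 1, 2])
# The permutation (1, 3, 0, 2) has no synchronous binarization.
not_binarizable = binarize_permutation([1, 3, 0, 2])
```

Because reductions are applied as early as possible, the monotone rule above always receives the left-branching tree even though a right-branching tree is equally valid; choosing among such equivalent trees is the kind of freedom the abstract refers to.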

[1] Daniel Marcu, et al. Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy, 2007, EMNLP.

[2] Giorgio Satta, et al. Some Computational Complexity Results for Synchronous Context-Free Grammars, 2005, HLT/EMNLP.

[3] Philipp Koehn, et al. Statistical Significance Tests for Machine Translation Evaluation, 2004, EMNLP.

[4] Liang Huang. Binarization, Synchronous Binarization, and Target-side Binarization, 2007, SSST@HLT-NAACL.

[5] Daniel H. Younger, et al. Recognition and Parsing of Context-Free Languages in Time n^3, 1967, Inf. Control.

[6] Tadao Kasami, et al. An Efficient Recognition and Syntax-Analysis Algorithm for Context-Free Languages, 1965.

[7] Louis W. Shapiro, et al. Bootstrap Percolation, the Schröder Numbers, and the N-Kings Problem, 1991, SIAM J. Discret. Math.

[8] David Chiang, et al. A Hierarchical Phrase-Based Model for Statistical Machine Translation, 2005, ACL.

[9] David Chiang, et al. Hierarchical Phrase-Based Translation, 2007, CL.

[10] Chin-Yew Lin, et al. Better Binarization for the CKY Parsing, 2008, EMNLP.

[11] Eugene Charniak, et al. Edge-Based Best-First Chart Parsing, 1998, VLC@COLING/ACL.

[12] David Ellis, et al. Multilevel Coarse-to-Fine PCFG Parsing, 2006, NAACL.

[13] Daniel Marcu, et al. SPMT: Statistical Machine Translation with Syntactified Target Language Phrases, 2006, EMNLP.

[14] Daniel Marcu, et al. Scalable Inference and Training of Context-Rich Syntactic Translation Models, 2006, ACL.

[15] Daniel Gildea, et al. Synchronous Binarization for Machine Translation, 2006, NAACL.

[16] Jun'ichi Tsujii, et al. Iterative CKY Parsing for Probabilistic Context-Free Grammars, 2004, IJCNLP.

[17] Daniel Marcu, et al. What’s in a translation rule?, 2004, NAACL.