Book Review: Syntax-Based Statistical Machine Translation by Philip Williams, Rico Sennrich, Matt Post and Philipp Koehn

In its early development, machine translation adopted rule-based approaches, some of which made use of language syntax. The late 1980s and early 1990s saw the inception of statistical machine translation (SMT), in which translation models are learned automatically from a parallel corpus rather than created manually. Initial SMT models were word-based and phrase-based, and used no syntactic knowledge. In phrase-based SMT, a source sentence is first segmented into phrases and then translated phrase by phrase, with some reordering of the translated phrases in the target sentence. This poses challenges when translating between two syntactically different languages. Syntax-based SMT approaches take advantage of syntactic knowledge within the SMT framework. This book provides an introduction to these approaches and is a valuable resource for those interested in syntax-based SMT.

The book consists of seven chapters. There is no introductory chapter aside from the preface, which can be considered a brief introduction; readers are referred to Koehn (2010) for background knowledge. I think an introductory chapter organized into sections would have been useful before the various models are described. The first two chapters provide principles applicable across various syntax-based SMT approaches. The next three chapters describe syntax-based SMT decoding in detail; this constitutes half of the book. Selected extended topics are provided in the next chapter, which is followed by a concluding chapter.

Chapter 1 describes the models and formalisms applicable to syntax-based SMT. The first section describes the phrasal translation units of phrase-based SMT, their limitations, and how tree structures address those limitations. This explanation is useful because translation units are the key difference between the phrase-based and syntax-based SMT approaches.
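
To make the point about translation units concrete, here is a minimal Python sketch, not taken from the book, of how a tree-structured rule can capture a reordering that flat phrase pairs handle poorly. The grammar, the romanized Japanese lexicon, and the names `REORDER`, `LEXICON`, and `translate` are all hypothetical illustrations. Each rule pairs a source-side sequence of children with a target-side permutation of the same children, in the spirit of a synchronous grammar rule.

```python
from typing import Dict, List, Tuple

# Hypothetical toy grammar: for each (parent, child labels) rule, the list
# gives the target-side order of the source-side children.
REORDER: Dict[Tuple[str, Tuple[str, ...]], List[int]] = {
    ("S", ("NP", "VP")): [0, 1],   # subject stays first
    ("VP", ("V", "NP")): [1, 0],   # verb-object becomes object-verb
}

# Hypothetical word-level lexicon (English to romanized Japanese).
LEXICON: Dict[str, str] = {
    "she": "kanojo-wa",
    "reads": "yomu",
    "books": "hon-o",
}


def translate(tree) -> Tuple[List[str], List[str]]:
    """Return the (source, target) word sequences derived from a tree.

    A tree is either (label, word) for a lexical leaf or
    (label, [children]) for an internal node.
    """
    label, body = tree
    if isinstance(body, str):
        # Lexical leaf: emit the word and its dictionary translation.
        return [body], [LEXICON[body]]
    pairs = [translate(child) for child in body]
    order = REORDER[(label, tuple(child[0] for child in body))]
    # The source side keeps the children in their original order.
    source = [word for src, _ in pairs for word in src]
    # The target side concatenates the children in the rule's order.
    target = [word for i in order for word in pairs[i][1]]
    return source, target


tree = ("S", [("NP", "she"), ("VP", [("V", "reads"), ("NP", "books")])])
src, tgt = translate(tree)
print(" ".join(src))
print(" ".join(tgt))
```

The single `VP` rule moves the verb after its object regardless of how long the object is, whereas a phrase-based system would need either a memorized phrase pair or a separate reordering model to produce the same order.
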
The next two sections describe the grammar formalisms and the statistical models that define syntax-based SMT. The section covering the grammar formalisms (i.e., synchronous context-free grammar [SCFG] and synchronous tree-substitution grammar [STSG]) would have been clearer if their differences had been presented in a side-by-side illustrative example. The remainder of the chapter discusses the history and the different categories of syntax-based SMT approaches, which include string-to-string, string-to-tree, tree-to-string, and
