Deep Syntax in Statistical Machine Translation

Statistical Machine Translation (SMT) via deep syntactic transfer employs a three-stage architecture: (i) parse the source language (SL) input, (ii) transfer the SL deep syntactic structure to the target language (TL), and (iii) generate a TL translation. This architecture achieves a high degree of language-pair independence compared to other Machine Translation (MT) approaches, because translation is carried out at the more language-independent deep syntactic level. TL word order can be generated independently of SL word order, so no reordering model between source and target words is required. In addition, words in dependency relations are adjacent in the deep syntactic structure, even though such words are often distant in surface strings. This adjacency allows the extraction of more general transfer rules than the rules and phrases extracted from surface-form corpora, and it enables the use of a TL deep syntax language model, which captures a deeper notion of fluency than a string-based language model and may lead to better lexical choice. The deep syntactic representation also stores words as lemmas with morpho-syntactic information, so new inflections of lemmas not observed in the bilingual training data, which are out of coverage for other SMT approaches, fall within the coverage of deep syntactic transfer. In this thesis, we adapt methods already successful in Phrase-Based SMT (PB-SMT) to deep syntactic transfer and present new methods of our own. We give a new definition of a consistent deep syntax transfer rule, inspired by the definition of a consistent phrase in PB-SMT, and we extract all rules consistent with the node alignment: smaller rules provide high coverage of unseen data, while larger rules provide more fluent combinations of TL words.
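By analogy with the PB-SMT notion of a consistent phrase pair, the consistency of a candidate source/target node-set pair against a node alignment can be sketched roughly as follows. This is a minimal illustration with hypothetical names, not the thesis's actual definition, which operates over deep syntactic structures rather than bare node sets:

```python
def is_consistent(src_nodes, tgt_nodes, alignment):
    """Check whether a (src_nodes, tgt_nodes) pair is consistent with a
    node alignment, by analogy with consistent phrase pairs in PB-SMT:
    no alignment link may connect a node inside the pair to one outside,
    and at least one link must lie entirely inside the pair.
    `alignment` is a set of (src_node, tgt_node) links."""
    src_nodes, tgt_nodes = set(src_nodes), set(tgt_nodes)
    has_internal_link = False
    for s, t in alignment:
        inside_s, inside_t = s in src_nodes, t in tgt_nodes
        if inside_s != inside_t:      # link crosses the pair boundary
            return False
        if inside_s and inside_t:
            has_internal_link = True
    return has_internal_link

# Toy example: nodes named by integers, alignment links {(1, 1), (2, 3)}.
alignment = {(1, 1), (2, 3)}
print(is_consistent({1}, {1}, alignment))  # True: link (1, 1) is internal
print(is_consistent({2}, {1}, alignment))  # False: node 2 aligns outside
```

All node-set pairs passing such a check would then be extracted as transfer rules, so that both small, high-coverage rules and larger, more fluent rules are available.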
Since large numbers of consistent transfer rules exist per sentence pair, we also provide efficient methods of extracting and storing them. We present a deep syntax translation model: as in other SMT approaches, we use a log-linear combination of feature functions, including a translation model computed from the relative frequencies of transfer rules, lexical weighting, a deep syntax language model, and a string-based language model. In addition, we describe methods of transfer decoding, the search for TL deep syntactic structures; how we efficiently integrate a deep syntax trigram language model into decoding; and methods of translating morpho-syntactic information separately from lemmas, using an adaptation of Factored Models. Finally, we include an experimental evaluation in which we compare MT output for different configurations of our SMT via deep syntactic transfer system. We investigate various methods of word alignment, methods of translating morpho-syntactic information, limits on transfer rule size, different beam sizes during transfer decoding, generation from different-sized lists of TL decoder output structures, and deterministic versus non-deterministic generation. We also evaluate the deep syntax language model in isolation from the MT system and compare it to a string-based language model. Finally, we compare the performance and types of translations our system produces with those of a state-of-the-art phrase-based statistical machine translation system. Although the deep syntax system currently under-performs overall, it achieves state-of-the-art performance for translation of a specific syntactic construction, the compound noun, and for translations within the coverage of the TL precision grammar used for generation. We provide the software for transfer rule extraction, as well as the transfer decoder, as open-source tools to assist future research.
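The log-linear combination of feature functions used to score a translation hypothesis can be sketched as a weighted sum of log feature values. The feature names and weights below are illustrative assumptions, not the thesis's tuned model:

```python
import math

def loglinear_score(features, weights):
    """Score a hypothesis as a weighted sum of log feature values,
    as in standard log-linear SMT models: sum_i lambda_i * log f_i."""
    return sum(weights[name] * math.log(p) for name, p in features.items())

# Hypothetical feature values for one hypothesis (probabilities in (0, 1]).
hypothesis_features = {
    "rule_translation_prob": 0.4,  # relative frequency of transfer rules
    "lexical_weight": 0.25,
    "deep_syntax_lm": 0.1,         # deep syntax language model probability
    "string_lm": 0.05,             # string-based language model probability
}
# Hypothetical weights; in practice these would be tuned, e.g. by MERT.
weights = {
    "rule_translation_prob": 1.0,
    "lexical_weight": 0.5,
    "deep_syntax_lm": 0.8,
    "string_lm": 0.6,
}
score = loglinear_score(hypothesis_features, weights)
```

During decoding, hypotheses with higher scores would be preferred; the weights balance, for example, the deep syntax language model against the string-based one.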
