Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach

We describe a unified and coherent syntactic framework for supporting a semantically informed syntactic approach to statistical machine translation. Semantically enriched syntactic tags assigned to the target-language training texts improved translation quality. The resulting system significantly outperformed a linguistically naive baseline model (Hiero) and reached the highest scores yet reported on the NIST 2009 Urdu-English translation task. This finding supports the hypothesis (posed by many researchers in the MT community, e.g., in DARPA GALE) that both syntactic and semantic information are critical for improving translation quality, and it further demonstrates that large gains can be achieved for low-resource languages whose word order differs from that of English.
