Parse and Corpus-Based Machine Translation

This paper describes the PaCo-MT project, which investigated Parse and Corpus-based Machine Translation: a data-driven approach to stochastic syntactic rule-based machine translation. In contrast to phrase-based statistical machine translation (PB-SMT) systems, which are string-based and do not use any linguistic knowledge, the project built an MT engine in a different paradigm: a tree-based, data-driven system that automatically induces translation rules from a large, syntactically analysed parallel corpus. The architecture is presented in detail, together with an evaluation that compares the system with our previous work and with Moses, the current state-of-the-art PB-SMT system.
