Translation Algorithms by Means of Language Intersection

Synchronous rewriting grammars have been successfully exploited as translation models in machine translation applications. In this article we consider the problem of designing translation algorithms based on synchronous context-free grammars. We revisit a methodology for the design of parsing algorithms based on the idea of language intersection, which has been fully developed in the parsing community, and show how to apply it to the design of translation algorithms. We argue that this intersection methodology can also be viewed as a framework for comparing translation algorithms and for formally analyzing their properties. Along these lines, we observe that superficially different translation algorithms proposed in the literature can be viewed as special cases of the intersection methodology. We also use our framework to reformulate and improve some results already presented in the literature.
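
As a rough illustration of the language-intersection idea in its plain parsing form, the sketch below intersects a context-free grammar with the linear finite automaton that accepts exactly one input string, in the style of the classical Bar-Hillel construction. It is not the article's algorithm: it covers only the monolingual case, and the function name `intersect`, the rule encoding as `(lhs, rhs)` pairs, and the toy grammar are all assumptions made for this example. Terminals and nonterminals are assumed to be disjoint.

```python
from itertools import product

def intersect(rules, words):
    """Intersect a CFG with the automaton accepting exactly `words`.

    rules: list of (lhs, rhs) pairs, rhs being a tuple of symbols
           (hypothetical encoding chosen for this sketch).
    Returns the productive rules of the intersection grammar, whose
    symbols are triples (state_before, symbol, state_after).
    """
    n = len(words)
    # Transitions of the linear automaton: state i reads words[i] to reach i + 1.
    productive = {(i, w, i + 1) for i, w in enumerate(words)}
    new_rules = set()
    changed = True
    while changed:  # iterate to a fixpoint over productive items
        changed = False
        for lhs, rhs in rules:
            # Try every way of threading automaton states through the rule.
            for states in product(range(n + 1), repeat=len(rhs) + 1):
                body = tuple((states[k], rhs[k], states[k + 1])
                             for k in range(len(rhs)))
                if all(item in productive for item in body):
                    rule = ((states[0], lhs, states[-1]), body)
                    if rule not in new_rules:
                        new_rules.add(rule)
                        productive.add(rule[0])
                        changed = True
    return new_rules

# Toy grammar (an assumption for illustration): S -> S S | a
rules = [("S", ("S", "S")), ("S", ("a",))]
forest = intersect(rules, ["a", "a"])
# The input is in the language iff some rule rewrites (0, "S", len(words)).
accepted = any(lhs == (0, "S", 2) for lhs, _ in forest)
```

The resulting rule set is a compact encoding of all parses of the input. The sketch enumerates state sequences exhaustively, so its running time grows exponentially with the length of a rule's right-hand side; it is meant to show the construction, not an efficient implementation.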
