A Finite-State Approach to Machine Translation

The problem of machine translation can be viewed as consisting of two subproblems: (a) lexical selection and (b) lexical reordering. In this paper, we propose stochastic finite-state models for these two subproblems. Stochastic finite-state models can be learned efficiently from data and are effective for decoding. They also come with a calculus for composing models, which allows constraints from various levels of language processing to be tightly integrated. We present a method for learning stochastic finite-state models for lexical selection and lexical reordering that are trained automatically from pairs of source and target utterances. We use this method to develop models for English-Japanese translation and report the translation performance of these models on both speech and text input. We also evaluate the efficacy of such a translation model in the context of a call-routing task involving unconstrained spoken utterances.
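The following Python sketch gives a rough illustration of how composition lets constraints from separate models interact. It implements epsilon-free weighted transducer composition and a best-path search in the tropical semiring, where costs are negative log probabilities that add along a path and are minimized across paths. This is a toy built on stated assumptions, not the method of this paper. The lexical-selection transducer `LEX`, the target bigram costs in `BIGRAM`, the romanized target words, and all function names are hypothetical. The costs are set by hand rather than learned from aligned utterances, and lexical reordering is not modelled.

```python
import heapq
import itertools
from collections import defaultdict


class WFST:
    """Weighted FST: arcs[state] = [(in_label, out_label, cost, next_state), ...].

    Costs are tropical-semiring weights (e.g. -log probabilities):
    they add along a path, and the best path has minimum total cost.
    """

    def __init__(self, start, finals, arcs):
        self.start = start    # start state
        self.finals = finals  # {final_state: final_cost}
        self.arcs = arcs


def compose(a, b):
    """Epsilon-free composition: a's output labels must match b's input labels."""
    start = (a.start, b.start)
    arcs, finals = defaultdict(list), {}
    stack, seen = [start], {start}
    while stack:
        q = stack.pop()
        qa, qb = q
        if qa in a.finals and qb in b.finals:
            finals[q] = a.finals[qa] + b.finals[qb]
        for i, mid, wa, na in a.arcs.get(qa, []):
            for mid2, o, wb, nb in b.arcs.get(qb, []):
                if mid == mid2:  # shared label links the two machines
                    nq = (na, nb)
                    arcs[q].append((i, o, wa + wb, nq))
                    if nq not in seen:
                        seen.add(nq)
                        stack.append(nq)
    return WFST(start, finals, dict(arcs))


def best_path(t):
    """Dijkstra search (all costs nonnegative); returns (cost, outputs) or None."""
    end = object()  # sentinel super-final state
    tie = itertools.count()  # tie-breaker so states are never compared
    heap = [(0.0, next(tie), t.start, ())]
    settled = set()
    while heap:
        cost, _, q, out = heapq.heappop(heap)
        if q is end:
            return cost, list(out)
        if q in settled:
            continue
        settled.add(q)
        if q in t.finals:
            heapq.heappush(heap, (cost + t.finals[q], next(tie), end, out))
        for _, o, w, nq in t.arcs.get(q, []):
            if nq not in settled:
                heapq.heappush(heap, (cost + w, next(tie), nq, out + (o,)))
    return None


def linear_acceptor(words):
    """Accepts exactly the given source sentence, at zero cost."""
    arcs = {i: [(w, w, 0.0, i + 1)] for i, w in enumerate(words)}
    return WFST(0, {len(words): 0.0}, arcs)


# Hypothetical lexical-selection model: one state, ambiguous word translations.
LEX = WFST(0, {0: 0.0}, {0: [
    ("river", "kawa", 0.1, 0),
    ("bank", "ginkou", 0.4, 0),  # cheaper in isolation
    ("bank", "kishi", 1.0, 0),
]})

# Hypothetical target-side bigram costs: (previous word, word) -> cost.
BIGRAM = {("<s>", "kawa"): 0.5, ("<s>", "ginkou"): 0.5, ("<s>", "kishi"): 0.5,
          ("kawa", "kishi"): 0.2, ("kawa", "ginkou"): 2.0}


def bigram_acceptor(costs):
    """Acceptor whose state is the previous target word."""
    arcs, states = defaultdict(list), {"<s>"}
    for (prev, word), c in costs.items():
        arcs[prev].append((word, word, c, word))
        states.update((prev, word))
    return WFST("<s>", {s: 0.0 for s in states}, dict(arcs))


def translate(words, use_lm=True):
    lattice = compose(linear_acceptor(words), LEX)
    if use_lm:
        lattice = compose(lattice, bigram_acceptor(BIGRAM))
    return best_path(lattice)


if __name__ == "__main__":
    print(translate(["river", "bank"], use_lm=False)[1])  # ['kawa', 'ginkou']
    print(translate(["river", "bank"])[1])                # ['kawa', 'kishi']
```

With the lexical costs alone, the cheaper translation of "bank" wins. Composing in the target-side acceptor moves the best path to the choice that fits its context, without either model being rewritten. Dijkstra's algorithm is valid here because every cost is nonnegative. A practical system would instead rely on a weighted finite-state toolkit that handles epsilon transitions, pruning, and much larger machines.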
