Stochastic Finite-State Models for Spoken Language Machine Translation

The problem of machine translation can be viewed as consisting of two subproblems: (a) lexical selection and (b) lexical reordering. In this paper, we propose stochastic finite-state models for these two subproblems. Stochastic finite-state models are efficiently learnable from data, effective for decoding, and associated with a calculus for composing models, which allows for tight integration of constraints from various levels of language processing. We present a method for learning stochastic finite-state models for lexical selection and lexical reordering that are trained automatically from pairs of source and target utterances. We use this method to develop models for English–Japanese and English–Spanish translation and report the performance of these models on speech and text translation. We also evaluate the efficacy of such a translation model in the context of a call-routing task on unconstrained speech utterances.
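The two-stage decomposition described above can be illustrated with a small sketch. It is not the paper's implementation: the lexicon, the bigram probabilities, the back-off floor, and all function names are hypothetical toy values, and an exhaustive permutation search under a bigram model stands in for the finite-state reordering model. In a finite-state setting, both stages would instead be encoded as weighted transducers and combined by composition.

```python
import itertools
import math

# Hypothetical toy lexical-selection table: P(target word | source word).
LEXICON = {
    "i": {"watashi": 0.9, "boku": 0.1},
    "eat": {"taberu": 0.8, "kuu": 0.2},
    "fish": {"sakana": 1.0},
}

# Hypothetical toy target-language bigram probabilities.
# "<s>" and "</s>" mark the utterance boundaries.
BIGRAMS = {
    ("<s>", "watashi"): 0.6,
    ("watashi", "sakana"): 0.5,
    ("sakana", "taberu"): 0.7,
    ("taberu", "</s>"): 0.8,
}
FLOOR = 1e-4  # assumed probability for unseen bigrams


def select_lexical(source_words):
    """Stage (a): pick the most probable target word for each source word.

    Returns the chosen words (still in source order) and their total
    cost as a sum of negative log probabilities.
    """
    chosen, cost = [], 0.0
    for word in source_words:
        target, prob = max(LEXICON[word].items(), key=lambda kv: kv[1])
        chosen.append(target)
        cost += -math.log(prob)
    return chosen, cost


def order_cost(words):
    """Negative log probability of a word order under the toy bigram model."""
    seq = ["<s>"] + list(words) + ["</s>"]
    return sum(-math.log(BIGRAMS.get(pair, FLOOR)) for pair in zip(seq, seq[1:]))


def reorder(target_words):
    """Stage (b): choose the lowest-cost permutation of the selected words.

    Exhaustive search is only feasible for short utterances; it is used
    here purely for illustration.
    """
    return min(itertools.permutations(target_words), key=order_cost)


def translate(source_words):
    """Run lexical selection, then lexical reordering."""
    selected, _ = select_lexical(source_words)
    return list(reorder(selected))
```

Calling `translate(["i", "eat", "fish"])` first selects the target words in source order and then reorders them into the verb-final order that the toy bigram model prefers. Keeping both stages as additive negative-log costs mirrors how weights combine along a path when weighted finite-state models are composed.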
