Using word posterior probabilities in lattice translation

In this paper we describe the statistical machine translation system developed at ITI/UPV, which aims especially at speech recognition and statistical machine translation integration, for the evaluation campaign of the Internationa l Workshop on Spoken Language Translation (2007). The system we have developed takes advantage of an improved word lattice representation that uses word posterior probabilities. These word posterior probabilities are the n added as a feature to a log-linear model. This model includes a stochastic finite-state transducer which allows an easy la ttice integration. Furthermore, it provides a statistical p hrasebased reordering model that is able to perform local reorderings of the output. We have tested this model on the Italian-English corpus, for clean text, 1-best ASR and lattice ASR inputs. The results and conclusions of such experiments are reported at the end of this paper.

[1]  Francisco Casacuberta,et al.  Probabilistic finite-state machines - part I , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Hermann Ney,et al.  Discriminative Training and Maximum Entropy Models for Statistical Machine Translation , 2002, ACL.

[3]  Hirofumi Yamamoto,et al.  A decoding algorithm for word lattice translation in speech translation , 2005, IWSLT.

[4]  Hermann Ney,et al.  Speech translation: coupling of recognition and translation , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[5]  S. H. A N K A R K U M A R,et al.  A weighted finite state transducer translation template model for statistical machine translation , 2005, Natural Language Engineering.

[6]  Francisco Casacuberta,et al.  N-BEST REORDERING IN STATISTICAL MACHINE TRANSLATION , 2006 .

[7]  Eiichiro Sumita,et al.  Toward a Broad-coverage Bilingual Corpus for Speech Translation of Travel Conversations in the Real World , 2002, LREC.

[8]  Hermann Ney,et al.  On the integration of speech recognition and statistical machine translation , 2005, INTERSPEECH.

[9]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[10]  Hermann Ney,et al.  Confidence measures for large vocabulary continuous speech recognition , 2001, IEEE Trans. Speech Audio Process..

[11]  N. Bertoldi,et al.  A new decoder for spoken language translation based on confusion networks , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..

[12]  Hermann Ney,et al.  Some approaches to statistical and finite-state speech-to-speech translation , 2004, Comput. Speech Lang..

[13]  Brian Roark,et al.  A generalized construction of integrated speech recognition transducers , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  Andreas Stolcke,et al.  Finding consensus among words: lattice-based word error minimization , 1999, EUROSPEECH.

[15]  George R. Doddington,et al.  Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics , 2002 .

[16]  Mauro Cettolo,et al.  Integrated n-best re-ranking for spoken language translation , 2005, INTERSPEECH.

[17]  Enrique Vidal,et al.  Finite-state speech-to-speech translation , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[18]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[19]  Francisco Casacuberta,et al.  Probabilistic finite-state machines - part II , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  D. Vila Combining statistical and finite-state methods for machine translation , 2005 .

[21]  Francisco Casacuberta,et al.  Machine Translation with Inferred Stochastic Finite-State Transducers , 2004, Computational Linguistics.