Potential scope of a fully-integrated architecture for speech translation

The classical approach to speech translation places a text-to-text translation system after a speech recogniser, yielding the so-called decoupled architecture. Two issues should be borne in mind here: first, what is translated in the decoupled architecture is only the most likely transcription of the spoken utterance; second, translation systems are sensitive to errors in the source string, and speech recognition systems are still far from flawless. In this paper we promote the so-called integrated architecture for speech translation, which builds the most likely translation by relying on acoustic and translation models cooperatively. The integrated architecture is implemented in the finite-state framework by composing finite-state acoustic models of the source language within a stochastic finite-state transducer that encompasses the source and target languages. The potential performance of the integrated architecture is assessed quantitatively in relation to the decoupled one. We conclude that, while the single-best approach shows similar performance for the decoupled and integrated architectures, an oracle evaluation reveals that the potential scope of the integrated architecture would offer statistically significant differences.

© 2010 European Association for Machine Translation.

1 Statistical speech translation

The goal of statistical speech translation is to seek the most likely string in the target language, t̂, given the acoustic representation of a speech signal in the source language, x.
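The difference between the two architectures can be illustrated with a toy example. The sketch below is not the finite-state implementation described above; it assumes a tiny hand-written hypothesis space and the common simplification that the acoustics depend only on the source string, so that the integrated search scores each target string t by summing P(x|s) P(s) P(t|s) over source strings s, whereas the decoupled search first fixes the single most likely transcription and then translates it. All names (`ACOUSTIC`, `SOURCE_LM`, `TRANSLATION`, the placeholder strings `s1`, `tA`, and so on) and all probabilities are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy models. Every value here is invented for illustration.
ACOUSTIC = {"s1": 0.2, "s2": 0.3, "s3": 0.3}   # P(x | s) for the observed signal x
SOURCE_LM = {"s1": 0.5, "s2": 0.3, "s3": 0.2}  # P(s)
TRANSLATION = {                                 # P(t | s)
    "s1": {"tA": 0.9, "tB": 0.1},
    "s2": {"tB": 1.0},
    "s3": {"tB": 1.0},
}


def decoupled(acoustic, source_lm, translation):
    """Recognise first, then translate only the single best transcription."""
    # Step 1: s_hat = argmax_s P(x | s) * P(s)
    s_hat = max(source_lm, key=lambda s: acoustic[s] * source_lm[s])
    # Step 2: t_hat = argmax_t P(t | s_hat)
    t_hat = max(translation[s_hat], key=translation[s_hat].get)
    return s_hat, t_hat


def integrated(acoustic, source_lm, translation):
    """Score each target string with acoustic and translation models jointly."""
    # t_hat = argmax_t sum_s P(x | s) * P(s) * P(t | s)
    score = defaultdict(float)
    for s, p_s in source_lm.items():
        for t, p_t_given_s in translation[s].items():
            score[t] += acoustic[s] * p_s * p_t_given_s
    return max(score, key=score.get)


if __name__ == "__main__":
    print("decoupled :", decoupled(ACOUSTIC, SOURCE_LM, TRANSLATION))
    print("integrated:", integrated(ACOUSTIC, SOURCE_LM, TRANSLATION))
```

In this toy setting the decoupled search commits to `s1`, the most likely transcription, and therefore outputs `tA`, while the integrated search prefers `tB` because the two less likely transcriptions both support it. Real systems often replace the sum over s with a maximisation for efficiency; that variant is not shown here.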
