Real-time Incremental Speech-to-Speech Translation of Dialogs

In a conventional telephone conversation between two speakers of the same language, the interaction is real-time and the speakers process the information stream incrementally. In this work, we address the problem of incremental speech-to-speech (S2S) translation that enables cross-lingual communication between two remote participants over a telephone. We investigate the problem in a novel real-time S2S framework based on the Session Initiation Protocol (SIP). Speech translation is performed incrementally, driven by partial hypotheses generated by the speech recognizer. We describe the statistical models comprising the S2S system and the SIP architecture that enables real-time two-way cross-lingual dialog. We present dialog experiments performed in this framework and study the tradeoff between accuracy and latency in incremental speech translation. Experimental results demonstrate that the incremental approach can generate high-quality translations with approximately half the latency of the non-incremental approach.
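As a rough illustration of translating from partial recognition hypotheses, the sketch below commits a source word once it appears unchanged in two consecutive partial hypotheses and translates only the newly committed words. This stable-prefix rule, the function names, and the word-by-word toy lexicon are all assumptions made for the example; they are not the segmentation strategy or the statistical models of the system described here.

```python
from typing import Callable, Iterable, List

# Hypothetical toy lexicon standing in for a real translation model.
TOY_LEXICON = {"i": "yo", "want": "quiero", "a": "un", "ticket": "billete"}


def toy_translate(words: List[str]) -> List[str]:
    """Word-by-word lookup; unknown words are passed through unchanged."""
    return [TOY_LEXICON.get(w, w) for w in words]


def common_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading words shared by two hypotheses."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def incremental_translate(
    partials: Iterable[str],
    translate: Callable[[List[str]], List[str]] = toy_translate,
) -> List[List[str]]:
    """Translate a stream of partial ASR hypotheses chunk by chunk.

    Assumption: a word is 'stable' once two consecutive partial
    hypotheses agree on it. Committed words are never retracted,
    even if a later hypothesis revises them.
    """
    committed = 0            # number of source words already translated
    prev: List[str] = []     # previous partial hypothesis
    chunks: List[List[str]] = []
    for hyp in partials:
        words = hyp.split()
        stable = common_prefix_len(prev, words)
        if stable > committed:
            chunks.append(translate(words[committed:stable]))
            committed = stable
        prev = words
    # The last hypothesis is treated as final: flush what remains.
    if len(prev) > committed:
        chunks.append(translate(prev[committed:]))
    return chunks


if __name__ == "__main__":
    stream = ["i won", "i want", "i want a ticket"]
    for chunk in incremental_translate(stream):
        print(" ".join(chunk))
```

In this sketch the misrecognized word "won" is never translated, because it is revised before it becomes stable. Waiting for agreement between hypotheses adds delay, however, which is one simple instance of the accuracy versus latency tradeoff discussed above.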
