BBN TransTalk: Robust multilingual two-way speech-to-speech translation for mobile platforms

In this paper, we present BBN TransTalk, a speech-to-speech (S2S) translation system that enables two-way communication between English speakers and speakers who do not understand or speak English. BBN TransTalk has been configured for several languages, including Iraqi Arabic, Levantine Arabic, Pashto, Dari, Farsi, Malay, and Indonesian. We describe the key components of the system: automatic speech recognition (ASR), machine translation (MT), text-to-speech (TTS), the dialog manager, and the user interface (UI). We also present novel techniques for overcoming specific challenges in developing high-performing S2S systems. For ASR, we present techniques for dealing with the lack of pronunciation and linguistic resources and for effectively modeling ambiguity in the pronunciations of words in these languages. For MT, we describe techniques for dealing with data sparsity and for modeling context. Finally, we present and compare different user confirmation techniques for detecting errors that can cause the dialog to drift or stall.
