Developing a Client-Server Speech Translation Platform

This paper describes a client-server speech translation platform designed for use on mobile terminals. Because the terminals and servers are connected over a 3G public mobile phone network, speech translation services are available in many locations through thin clients. The platform provides hands-free communication and the robustness required for real-world use of speech translation in noisy environments. A microphone array combined with a new noise suppression technique improves speech recognition performance, and a corpus-based approach yields wide coverage, robustness, and portability to new languages and domains. An experiment evaluating the communicability of speakers of different languages shows that the speech translation system achieves task completion rates of 85% for Japanese-English and 75% for Japanese-Chinese. On average, the system also conveys approximately one item of information per two utterances (one turn) for both language pairs in a task-oriented dialogue.
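The paper does not specify the platform's wire protocol, so the following is only a minimal sketch of the thin-client round trip it describes: the terminal captures one utterance of audio and renders results, while recognition and translation run server-side. The endpoint URL, header names, and JSON fields are all invented for illustration and assume a hypothetical HTTP service.

```python
import json
import urllib.request

# Hypothetical endpoint: the paper does not disclose its protocol, so this
# sketch assumes a simple HTTP service that accepts one utterance of raw PCM
# audio and returns the recognized text and its translation as JSON.
SERVER_URL = "http://translation-server.example.com/translate"


def translate_utterance(pcm_audio: bytes, src_lang: str, tgt_lang: str) -> dict:
    """Thin-client round trip: send one utterance to the speech translation
    server and return the recognized source text plus its translation."""
    request = urllib.request.Request(
        SERVER_URL,
        data=pcm_audio,
        headers={
            "Content-Type": "audio/L16; rate=16000",  # assumed 16 kHz linear PCM
            "X-Source-Lang": src_lang,                # assumed header names
            "X-Target-Lang": tgt_lang,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example usage: audio captured by the terminal's microphone-array front end
# is sent off-device, keeping the client thin.
# result = translate_utterance(audio, src_lang="ja", tgt_lang="en")
# print(result["recognized"], "->", result["translation"])
```

Keeping the client limited to audio capture and display is what allows the service to run on resource-constrained mobile terminals over a 3G network, with the heavy recognition, translation, and synthesis components centralized on the servers.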
