An experimental Japanese/English interpreting video phone system

We report on the architectural design issues encountered and the experience gained while building and demonstrating an experimental interpreting video phone (IVP) system. The IVP system was demonstrated in an Internet home shopping simulation, simultaneously before live audiences in Japan and the US. An American shop assistant and a Japanese customer engaged in task-directed dialogues, each speaking their native language. In addition to direct audio/visual contact by ISDN video phone, each participant heard, in real time, a translation of the remote speaker's utterances in a synthetic voice. Each site used a medium-size vocabulary continuous speech recognition system and a text-to-speech (TTS) synthesis system for the local language. Recognition results were transmitted over the Internet to the remote site, where the corresponding translated sentence was spoken by the TTS system in the listener's native language. All of the speech and language processing software components were independently developed proprietary technologies of the authors' laboratories, integrated using commercially available hardware and communication media. We describe the difficulties encountered in developing the system, the accommodations made, and other experience gained in the process.
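
The data flow described above (local recognition, transmission of the recognition result, translation, and synthesis at the listener's site) can be illustrated with a minimal sketch. It is not the system's implementation: the JSON message format, the phrase-table lookup, and the names `PHRASE_TABLE`, `encode_result`, and `handle_result` are all assumptions made for illustration, and the source does not state how translation was performed.

```python
import json

# Hypothetical phrase table standing in for the translation component;
# the entries and the lookup approach are illustrative assumptions.
PHRASE_TABLE = {
    ("en", "ja"): {"this one is twenty dollars": "こちらは二十ドルです"},
    ("ja", "en"): {"これはいくらですか": "How much is this?"},
}


def encode_result(text, src_lang):
    """Package a local recognition result as bytes for transmission."""
    message = {"lang": src_lang, "text": text}
    return json.dumps(message, ensure_ascii=False).encode("utf-8")


def handle_result(payload, listener_lang):
    """At the remote site: decode the message and look up the sentence
    that would be handed to the local TTS system.

    Returns None when no translation is known for the utterance.
    """
    message = json.loads(payload.decode("utf-8"))
    table = PHRASE_TABLE.get((message["lang"], listener_lang), {})
    key = message["text"].strip().lower()
    return table.get(key)


if __name__ == "__main__":
    # The customer's recognized utterance travels from the Japanese site
    # to the US site, where the English sentence would be synthesized.
    packet = encode_result("これはいくらですか", "ja")
    print(handle_result(packet, "en"))
```

In this sketch only text crosses the network, which reflects the abstract's statement that recognition results, rather than audio, were sent to the remote site for synthesis.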