Hybrid Connectionist and Classical Approaches in JANUS, an Advanced Speech-to-Speech Translation System

In this paper we report on our efforts to combine speech and language processing toward multilingual spontaneous speech translation. The ongoing work extends our JANUS system effort toward handling spontaneous spoken discourse and multiple languages. After an overview of the task, the databases, and the system architecture, we focus on how connectionist modules are integrated into the overall system design. We show that these modules can, because of their learning capabilities, adapt themselves to the problem space. Moreover, because of their inherent robustness against noise, they seem to be an adequate tool for analyzing spontaneous speech.
