The Recognition Component of the SUNDIAL Project

The recognition component of the SUNDIAL system has been developed jointly by Logica Cambridge, Erlangen University, CSELT, Daimler-Benz Ulm, Cap-Gemini Innovation and Politecnico di Torino. It acts as an acoustic front-end, performing the feature extraction and acoustic-phonetic decoding stages. For the feature extraction stage, several speech processing algorithms were tested and compared by means of RSA (Recognition Sensitivity Analysis) [13], in terms of both recognition performance and speaker sensitivity. The recogniser was intended for use over the telephone network; therefore, the problems of high dynamic range and spectral drift of the signal were addressed. To this end, energy and cepstrum normalisation procedures were introduced to improve robustness in real telecommunication (TLC) environments. The acoustic-phonetic stage was based on HMM technology with Discrete (Italian), Continuous (all languages) and Semi-Continuous (German) paradigms. Speech units were selected on a phoneme basis with context dependency. Tests on the SUNDIAL recogniser have been performed with both read and spontaneous speech. Recognition performance for a typical continuous-speech, speaker-independent task based on read input was about 80% Word Accuracy over a 1000-word vocabulary, without linguistic constraints. The recognition component has been embedded in real-time prototypes built on the overall SUNDIAL architecture, which also includes linguistic processing, dialogue management, access to the information system, message generation and text-to-speech synthesis functions. Four prototypes are also being tested with spontaneous dialogues obtained from naive speakers in two application environments: flight enquiry and reservation in English and French, and train information access in German and Italian.
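The abstract does not say which energy and cepstrum normalisation procedures were used. As a rough illustration only, the sketch below assumes two common choices: per-utterance cepstral mean subtraction and log-energy normalisation relative to the utterance peak. The function names and the toy data are hypothetical and are not taken from the SUNDIAL recogniser.

```python
from typing import List, Sequence


def cepstral_mean_normalise(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean from every cepstral coefficient.

    Assumption: a fixed channel (e.g. a telephone line) adds a roughly
    constant offset in the cepstral domain, so removing the long-term
    mean reduces its effect.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each coefficient over all frames of the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


def normalise_log_energy(log_energies: Sequence[float]) -> List[float]:
    """Express each frame's log energy relative to the utterance peak.

    The loudest frame maps to 0.0 and all other frames become negative,
    which removes the dependence on absolute signal level.
    """
    if not log_energies:
        return []
    peak = max(log_energies)
    return [e - peak for e in log_energies]


if __name__ == "__main__":
    # Hypothetical two-coefficient cepstra; the first coefficient carries
    # a constant channel offset of +5.
    cepstra = [[1.0 + 5.0, 0.5], [3.0 + 5.0, -0.5]]
    print(cepstral_mean_normalise(cepstra))
    print(normalise_log_energy([10.0, 12.0, 7.0]))
```

Under these assumptions, the same utterance passed through channels with different constant cepstral offsets yields identical normalised features, which is the kind of robustness to spectral drift the abstract describes.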

[1]  Heinrich Niemann, et al. A non-metrical space search algorithm for fast Gaussian vector quantization, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Jeremy Peckham. Speech understanding and dialogue over the telephone: an overview of the ESPRIT SUNDIAL, 1991.

[3]  Morena Danieli, et al. Managing dialogue in a continuous speech understanding system, 1993, EUROSPEECH.

[4]  Trevor Thomas, et al. A determination of the sensitivity of speech recognisers to speaker variability, 1989, International Conference on Acoustics, Speech, and Signal Processing.

[5]  Jeremy Peckham, et al. Speech understanding and dialogue over the telephone: an overview of progress in the SUNDIAL project, 1991, EUROSPEECH.

[6]  Pietro Laface, et al. Analysis and improvement of the partial distance search algorithm, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  Scott McGlashan, et al. Dialogue Semantics for an Oral Dialogue System, 1992, ICSLP.

[8]  S. Rieck, et al. Acoustic modelling of subword units in the Isadora speech recognizer, 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  Paolo Baggia, et al. Partial parsing as a robust parsing strategy, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  Pietro Laface, et al. Selection of speech units for a speaker-independent CSR task, 1991, EUROSPEECH.

[11]  Gerhard Th. Niedermair, et al. Linguistic modelling in the context of oral dialogue, 1992, ICSLP.

[12]  Alberto Ciaramella, et al. Real-time speaker-independent large-vocabulary CDHMM-based continuous telephonic speech recognizer, 1992, ICSLP.

[13]  Pietro Laface, et al. Channel adaptation for a continuous speech recognizer, 1992, ICSLP.

[14]  Peter Regel-Brietzmann, et al. Fast speaker adaptation combined with soft vector quantization in an HMM speech recognition system, 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.