Linear input network based speaker adaptation in the Dialogos system

This paper describes experiments with linear input networks (LIN) as a speaker adaptation technique for the neural recognition module of the Dialogos(R) system. The LIN technique is evaluated, and some variants that reduce the number of estimated parameters are introduced. The results confirm the validity of LIN for speaker adaptation, while the variants are a valid alternative when a small model size is important. The strengths and drawbacks of supervised and unsupervised speaker adaptation are illustrated. Experiments with a speaker-dependent database collected from real interactions with the Dialogos system are described in detail; in both the supervised and the unsupervised case they show a notable improvement over the speaker-independent model.
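
As an illustration of the general LIN idea named in the abstract, the sketch below places a trainable linear layer in front of a frozen speaker-independent network, initializes it to the identity, and trains only that layer on adaptation data. It is a minimal toy under stated assumptions, not the Dialogos implementation: the network sizes, the synthetic data, the names `FrozenNet`, `adapt_lin`, and `make_toy_data`, and the accept-or-halve step rule are all hypothetical. The parameter-reducing variants mentioned in the abstract are not shown, since the abstract does not specify them.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class FrozenNet:
    """Hypothetical stand-in for a speaker-independent MLP; never updated."""
    def __init__(self, d_in, d_hid, n_cls, rng):
        self.W1 = rng.normal(0.0, 0.5, (d_in, d_hid))
        self.b1 = np.zeros(d_hid)
        self.W2 = rng.normal(0.0, 0.5, (d_hid, n_cls))
        self.b2 = np.zeros(n_cls)

    def forward(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return h, softmax(h @ self.W2 + self.b2)

def loss_and_grads(A, b, net, x, y):
    """Cross-entropy of net(x @ A + b) and its gradients w.r.t. A and b only."""
    z = x @ A + b                       # the linear input network
    h, p = net.forward(z)
    n = len(y)
    loss = -np.mean(np.log(p[np.arange(n), y] + 1e-12))
    d_logits = p.copy()
    d_logits[np.arange(n), y] -= 1.0
    d_logits /= n
    d_h = (d_logits @ net.W2.T) * (1.0 - h ** 2)   # back through tanh
    d_z = d_h @ net.W1.T                           # gradient at the LIN output
    return loss, x.T @ d_z, d_z.sum(axis=0)

def adapt_lin(net, x, y, lr=0.1, steps=200):
    """Train the LIN from an identity start; the frozen net is only read."""
    d = x.shape[1]
    A, b = np.eye(d), np.zeros(d)       # identity: starts as the unadapted model
    loss, gA, gb = loss_and_grads(A, b, net, x, y)
    for _ in range(steps):
        A_try, b_try = A - lr * gA, b - lr * gb
        trial, gA_try, gb_try = loss_and_grads(A_try, b_try, net, x, y)
        if trial < loss:                # accept only steps that reduce the loss
            A, b, loss, gA, gb = A_try, b_try, trial, gA_try, gb_try
        else:
            lr *= 0.5                   # otherwise shrink the step size
    return A, b, loss

def make_toy_data(net, rng, n=200, d=8):
    """Synthetic 'speaker': a linear distortion of features the net labels itself."""
    clean = rng.normal(size=(n, d))
    _, p = net.forward(clean)
    labels = p.argmax(axis=1)
    distortion = np.eye(d) + 0.3 * rng.normal(size=(d, d))
    return clean @ distortion + 0.2, labels
```

In this sketch the labels are given, which corresponds to the supervised case; an unsupervised setup would instead take the labels from the recognizer's own output on the adaptation data.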
