Speaker-independent phoneme recognition using large-scale neural networks

The authors describe a large-scale neural network architecture based on time-delay neural networks (TDNNs) for speaker-independent phoneme recognition, which represents an advance over speaker-dependent and multi-speaker phoneme recognition. Based on a preliminary study of speaker-independent recognition of the voiced stops /b, d, g/, a large-scale network with about 330,000 connections is constructed in a modular fashion. For speaker-independent all-consonant recognition, a multi-speaker training approach is implemented, with several devices applied during training. The resulting network achieves favorable results for speaker-independent phoneme recognition.
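The core building block of the architecture described above is the time-delay layer, in which each hidden unit sees a short window of consecutive input frames and the same weights slide across the whole utterance. The following is a minimal sketch of one such layer in NumPy; the layer sizes, window length, and variable names are illustrative assumptions, not the paper's actual 330,000-connection configuration.

```python
import numpy as np

def tdnn_layer(x, W, b, delay):
    """One time-delay (TDNN) layer.

    x: (T, n_in) sequence of input frames
    W: (delay, n_in, n_out) weights shared across all time positions
    b: (n_out,) bias
    Returns (T - delay + 1, n_out) hidden activations -- effectively a
    1-D convolution over time followed by a tanh nonlinearity.
    """
    T, n_in = x.shape
    out_len = T - delay + 1
    h = np.empty((out_len, len(b)))
    for t in range(out_len):
        window = x[t:t + delay]  # (delay, n_in) local time window
        # Contract the (delay, n_in) window against the shared weights.
        h[t] = np.tanh(np.tensordot(window, W, axes=2) + b)
    return h

# Illustrative usage: 15 frames of 16 spectral coefficients,
# a 3-frame delay window, and 8 hidden units.
rng = np.random.default_rng(0)
x = rng.standard_normal((15, 16))
W = rng.standard_normal((3, 16, 8)) * 0.1
b = np.zeros(8)
h = tdnn_layer(x, W, b, delay=3)
print(h.shape)  # -> (13, 8)
```

Because the weights are tied across time, the layer's output is invariant to small temporal shifts of the input, which is the property the TDNN approach relies on; the modular construction in the paper stacks and combines such layers trained on phoneme subclasses.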
