A segment-based speaker adaptation neural network applied to continuous speech recognition

The authors describe a speaker adaptation technique for continuous speech recognition based on segment-level neural mapping. The adaptation neural network has a time-shifted subconnection architecture that preserves the temporal structure of an acoustic segment and reduces the amount of speech data needed for training; the effectiveness of this network has previously been reported for phoneme recognition. The adaptation network is combined with a TDNN-LR continuous speech recognizer and evaluated in word and phrase recognition experiments with several speakers. In 500-word recognition experiments, segment-based adaptation achieves a recognition rate of 92.2%, 28.8% higher than the rate without adaptation; in 278-phrase recognition experiments, it achieves 57.4%, 27.7% higher than the rate without adaptation.
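The abstract does not give the network's details, so the following is only a minimal NumPy sketch of the general idea: a mapping network whose weight-shared subnetwork is applied at every time shift of an input segment, transforming each short window of the new speaker's frames into one frame in the reference speaker's feature space. The feature dimension, window width, hidden size, and all function names here are illustrative assumptions, not the paper's specification; weight sharing across shifts is what keeps the parameter count, and hence the adaptation data requirement, small.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATS = 16  # features per frame, e.g. cepstral coefficients (assumed)
WINDOW = 3    # frames seen by each time-shifted subconnection (assumed)
HIDDEN = 32   # hidden units in the shared subnetwork (assumed)

# One set of subnetwork parameters, reused at every time shift.
W1 = rng.normal(0, 0.1, (HIDDEN, WINDOW * N_FEATS))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (N_FEATS, HIDDEN))
b2 = np.zeros(N_FEATS)

def map_segment(segment: np.ndarray) -> np.ndarray:
    """Map an acoustic segment (T x N_FEATS) of the input speaker
    toward the reference speaker's feature space, frame by frame,
    by sliding the shared subnetwork over all time shifts."""
    T = segment.shape[0]
    pad = WINDOW // 2
    padded = np.pad(segment, ((pad, pad), (0, 0)), mode="edge")
    out = np.empty_like(segment)
    for t in range(T):                    # one time shift per output frame
        window = padded[t:t + WINDOW].ravel()
        h = np.tanh(W1 @ window + b1)     # shared hidden layer
        out[t] = W2 @ h + b2              # mapped output frame
    return out

# Example: adapt a 25-frame segment before passing it to the recognizer.
segment = rng.normal(size=(25, N_FEATS))
mapped = map_segment(segment)
print(mapped.shape)  # (25, 16) -- temporal structure is preserved
```

In an actual system the shared weights would be trained on pairs of time-aligned segments from the new and reference speakers, and the mapped segments would then be fed to the recognizer (here, the TDNN-LR system) in place of the raw input.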
