Rapid connectionist speaker adaptation

SVCnet, a system for modeling speaker variability, is presented. Encoder neural networks specialized for each speech sound produce low-dimensional models of acoustic variation, and these models are combined into an overall model of voice variability. A training procedure is described that minimizes the dependence of this model on which sounds have been uttered. Using the trained model (SVCnet) and a brief, unconstrained sample of a new speaker's voice, the system produces a speaker voice code that can be used to adapt a recognition system to the new speaker without retraining. A system combining SVCnet with an MS-TDNN recognizer is also described.
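The abstract describes the architecture only at a high level, so the following is a minimal PyTorch sketch of the idea rather than the paper's actual implementation: per-sound encoder networks each map acoustic frames of one speech sound to a low-dimensional code, and those codes are pooled and projected into a single speaker voice code that could condition a recognizer. All names, layer sizes, the phoneme labels, and the mean-pooling combination are illustrative assumptions.

```python
# Hypothetical sketch of an SVCnet-style speaker voice code extractor.
# Layer sizes, sound inventory, and the pooling/projection scheme are
# assumptions for illustration, not the paper's exact configuration.
import torch
import torch.nn as nn


class SoundEncoder(nn.Module):
    """Encoder specialized for one speech sound: maps acoustic frames of
    that sound to a low-dimensional code of speaker-dependent variation."""

    def __init__(self, n_acoustic: int = 16, code_dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_acoustic, 32), nn.Tanh(),
            nn.Linear(32, code_dim),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (n_frames, n_acoustic) -> one code per sound: (code_dim,)
        return self.net(frames).mean(dim=0)


class SVCnetSketch(nn.Module):
    """Combines per-sound codes into a single speaker voice code (SVC).
    The intent is an SVC that depends on the speaker, not on which
    sounds happened to occur in the brief adaptation sample."""

    def __init__(self, sounds, code_dim: int = 4, svc_dim: int = 8):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {s: SoundEncoder(code_dim=code_dim) for s in sounds}
        )
        self.combine = nn.Linear(code_dim, svc_dim)

    def forward(self, segments: dict) -> torch.Tensor:
        # segments: {sound_label: (n_frames, n_acoustic) tensor} taken from
        # a short, unconstrained sample of the new speaker's voice.
        codes = [self.encoders[s](frames)
                 for s, frames in segments.items() if s in self.encoders]
        pooled = torch.stack(codes).mean(dim=0)  # pool over observed sounds
        return self.combine(pooled)              # speaker voice code


# Usage: extract an SVC from a short sample, then feed it as an extra
# conditioning input to the recognizer (e.g. alongside MS-TDNN inputs)
# so the recognizer itself needs no retraining for the new speaker.
svcnet = SVCnetSketch(sounds=["aa", "iy", "s"])
sample = {"aa": torch.randn(20, 16), "s": torch.randn(12, 16)}
svc = svcnet(sample)  # shape: (8,)
```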
