Rapid connectionist speaker adaptation

SVCnet, a system for modeling speaker variability, is presented. Encoder neural networks specialized for each speech sound produce low-dimensional models of acoustic variation, and these models are combined into an overall model of voice variability. A training procedure is described that minimizes the dependence of this model on which sounds have been uttered. Using the trained model (SVCnet) and a brief, unconstrained sample of a new speaker's voice, the system produces a speaker voice code that can be used to adapt a recognition system to the new speaker without retraining. A system combining SVCnet with an MS-TDNN recognizer is also described.
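The abstract describes the architecture only at a high level, so the following is a minimal PyTorch sketch of the idea rather than the paper's actual implementation: per-sound encoder networks each map acoustic frames of one speech sound to a low-dimensional code, and those codes are pooled and projected into a single speaker voice code that could condition a recognizer. All names, layer sizes, the phoneme labels, and the mean-pooling combination are illustrative assumptions.

```python
# Hypothetical sketch of an SVCnet-style speaker voice code extractor.
# Layer sizes, sound inventory, and the pooling/projection scheme are
# assumptions for illustration, not the paper's exact configuration.
import torch
import torch.nn as nn


class SoundEncoder(nn.Module):
    """Encoder specialized for one speech sound: maps acoustic frames of
    that sound to a low-dimensional code of speaker-dependent variation."""

    def __init__(self, n_acoustic: int = 16, code_dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_acoustic, 32), nn.Tanh(),
            nn.Linear(32, code_dim),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (n_frames, n_acoustic) -> one code per sound: (code_dim,)
        return self.net(frames).mean(dim=0)


class SVCnetSketch(nn.Module):
    """Combines per-sound codes into a single speaker voice code (SVC).
    The intent is an SVC that depends on the speaker, not on which
    sounds happened to occur in the brief adaptation sample."""

    def __init__(self, sounds, code_dim: int = 4, svc_dim: int = 8):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {s: SoundEncoder(code_dim=code_dim) for s in sounds}
        )
        self.combine = nn.Linear(code_dim, svc_dim)

    def forward(self, segments: dict) -> torch.Tensor:
        # segments: {sound_label: (n_frames, n_acoustic) tensor} taken from
        # a short, unconstrained sample of the new speaker's voice.
        codes = [self.encoders[s](frames)
                 for s, frames in segments.items() if s in self.encoders]
        pooled = torch.stack(codes).mean(dim=0)  # pool over observed sounds
        return self.combine(pooled)              # speaker voice code


# Usage: extract an SVC from a short sample, then feed it as an extra
# conditioning input to the recognizer (e.g. alongside MS-TDNN inputs)
# so the recognizer itself needs no retraining for the new speaker.
svcnet = SVCnetSketch(sounds=["aa", "iy", "s"])
sample = {"aa": torch.randn(20, 16), "s": torch.randn(12, 16)}
svc = svcnet(sample)  # shape: (8,)
```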
