Neural tree network/vector quantization probability estimators for speaker recognition

A new classification system for text-independent speaker recognition is presented. This system combines the output probabilities of distortion-based classifiers and a discriminant-based classifier. The distortion-based classifiers are the vector quantization (VQ) classifier and the Gaussian mixture model (GMM) classifier. The discriminant-based classifier is the neural tree network (NTN). The VQ and GMM classifiers provide output probabilities that represent the distortion between the observation and the model. Hence, these probabilities provide an intraclass measure. The NTN classifier is based on discriminant training and provides output probabilities that represent an interclass measure. Since these two types of classifier base their decisions on different criteria, they can be effectively combined to yield improved performance. Two combining methods are evaluated for several speaker recognition tasks, including speaker verification and closed-set speaker identification. The results show that both methods yield advantages for the speaker recognition tasks.
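
The abstract does not name the two combining methods, so the following is only a minimal sketch under stated assumptions: it illustrates two common ways of fusing per-classifier probabilities, a weighted arithmetic mean and a weighted geometric mean (log-linear combination). The function names, weights, threshold, and scores are hypothetical and are not taken from the paper.

```python
import math

def linear_pool(probs, weights):
    """Weighted arithmetic mean of per-classifier probabilities."""
    return sum(w * p for w, p in zip(weights, probs))

def log_pool(probs, weights, eps=1e-12):
    """Weighted geometric mean; eps guards against log(0)."""
    return math.exp(sum(w * math.log(max(p, eps))
                        for w, p in zip(weights, probs)))

def identify(scores_by_speaker, weights, pool):
    """Closed-set identification: return the speaker with the highest combined score."""
    combined = {spk: pool(p, weights) for spk, p in scores_by_speaker.items()}
    return max(combined, key=combined.get)

def verify(probs, weights, pool, threshold):
    """Verification: accept the claimed identity if the combined score reaches the threshold."""
    return pool(probs, weights) >= threshold

# Hypothetical scores per speaker: (distortion-based probability, discriminant-based probability)
scores = {"alice": (0.70, 0.40), "bob": (0.55, 0.80)}
weights = (0.5, 0.5)  # assumed equal weighting

best_linear = identify(scores, weights, linear_pool)
best_log = identify(scores, weights, log_pool)
accepted = verify(scores["alice"], weights, linear_pool, threshold=0.5)
```

In this toy example the distortion-based score alone would favour "alice", while either combined score favours "bob", which shows how an interclass measure can change the decision made from an intraclass measure alone.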
