Experimental evaluation of features for robust speaker identification

This correspondence presents an experimental evaluation of different features and channel compensation techniques for robust speaker identification. To allow a controlled comparison, all processing and classification steps are held constant and only the features and compensation techniques are varied. The classifier is a general maximum-likelihood classifier based on Gaussian mixture densities, and experiments are conducted on the King speech database, a database of conversational telephone speech. The features examined are mel-frequency and linear-frequency filterbank cepstral coefficients, linear prediction cepstral coefficients, and perceptual linear prediction (PLP) cepstral coefficients. The channel compensation techniques examined are cepstral mean removal, RASTA processing, and a quadratic trend removal technique. It is shown for this database that the performance differences between the basic features are small and that the major gains come from the channel compensation techniques. The best "across-the-divide" recognition accuracy of 92% is obtained for both high-order LPC features and band-limited filterbank features.
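Two of the channel compensation steps and the maximum-likelihood decision rule can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the correspondence's implementation: the quadratic trend is assumed to be a least-squares quadratic in time fitted separately to each cepstral coefficient over the whole utterance, the GMMs are assumed to use diagonal covariances with parameters already trained (EM training and RASTA filtering are omitted), and all function names are hypothetical. Cepstral mean removal works because a fixed linear channel multiplies the speech spectrum, which becomes an additive constant in the cepstral domain.

```python
import numpy as np


def cepstral_mean_removal(cep: np.ndarray) -> np.ndarray:
    """Subtract the utterance mean of each coefficient; cep has shape (frames, coeffs)."""
    return cep - cep.mean(axis=0, keepdims=True)


def quadratic_trend_removal(cep: np.ndarray) -> np.ndarray:
    """Subtract a per-coefficient least-squares quadratic in time (assumed form)."""
    num_frames = cep.shape[0]
    t = np.linspace(-1.0, 1.0, num_frames)  # normalized frame times for conditioning
    design = np.vander(t, 3, increasing=True)  # columns [1, t, t^2]
    coeffs, *_ = np.linalg.lstsq(design, cep, rcond=None)  # fits every column at once
    return cep - design @ coeffs


def gmm_log_likelihood(x, weights, means, variances) -> float:
    """Total log-likelihood of frames x (T, D) under a diagonal-covariance GMM.

    weights has shape (M,); means and variances have shape (M, D).
    """
    diff = x[:, None, :] - means[None, :, :]  # (T, M, D)
    log_det = np.sum(np.log(variances), axis=1)  # (M,)
    mahal = np.sum(diff**2 / variances[None], axis=2)  # (T, M)
    dim = x.shape[1]
    log_comp = np.log(weights)[None] - 0.5 * (
        dim * np.log(2 * np.pi) + log_det[None] + mahal
    )
    # Log-sum-exp over mixture components for numerical stability.
    peak = log_comp.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(per_frame.sum())


def identify(x, models) -> int:
    """Return the index of the (weights, means, variances) model with maximum likelihood."""
    scores = [gmm_log_likelihood(x, *model) for model in models]
    return int(np.argmax(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(200, 12))
    channel = rng.normal(size=(1, 12))  # hypothetical constant cepstral channel offset
    print(np.allclose(cepstral_mean_removal(frames + channel),
                      cepstral_mean_removal(frames)))  # True
```

Because the quadratic fit includes a constant column, its residual also has zero mean, so in this sketch quadratic trend removal subsumes cepstral mean removal while additionally absorbing slow linear and quadratic drift in each coefficient track.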
