Speaker identification based on the use of robust cepstral features obtained from pole-zero transfer functions

A common problem in speaker identification systems is that a mismatch between the training and testing conditions substantially degrades performance. We attempt to alleviate this problem by proposing new features that show less variation when speech is corrupted by convolutional noise (channel effects) and/or additive noise. The conventional feature is the linear predictive (LP) cepstrum, which is derived from an all-pole transfer function that, in turn, closely approximates the spectral envelope of the speech. A different cepstral feature based on a pole-zero transfer function, called the adaptive component weighted (ACW) cepstrum, was previously introduced. We propose four new cepstral features based on pole-zero transfer functions. One is an alternative way of doing adaptive component weighting and is called the ACW2 cepstrum. Two others, the PFL1 cepstrum and the PFL2 cepstrum, are based on a pole-zero postfilter used in speech enhancement. Finally, an autoregressive moving-average (ARMA) analysis of speech yields a pole-zero transfer function describing the spectral envelope, and the cepstrum of this transfer function serves as the fourth feature. Experiments with a closed-set, text-independent, vector-quantizer-based speaker identification system compare the various features on the TIMIT and King databases. The ACW and PFL1 features are preferred, since they perform as well as or better than the LP cepstrum under all the test conditions. The corresponding spectra show a clear emphasis of the formants and no spectral tilt.
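As an illustration of the baseline feature, the LP cepstrum can be computed directly from the predictor coefficients of the all-pole model with the standard recursion. The sketch below is a minimal example and is not taken from the paper: the function name `lp_cepstrum` is hypothetical, and it assumes the sign convention H(z) = 1 / (1 - sum_k a_k z^-k) with the gain term (c_0) omitted.

```python
def lp_cepstrum(a, n_ceps):
    """Cepstrum of an all-pole model from its predictor coefficients.

    Assumed convention (not specified in the abstract):
        H(z) = 1 / (1 - sum_{k=1..p} a[k-1] * z^-k)
    Returns [c_1, ..., c_n_ceps]; the gain-dependent c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is left unused
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n does not exceed the model order.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k/n) * c_k * a_{n-k},
        # restricted to indices where a_{n-k} is defined (1 <= n-k <= p).
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


# Two real poles at 0.5 and 0.25 give A(z) = 1 - 0.75 z^-1 + 0.125 z^-2.
ceps = lp_cepstrum([0.75, -0.125], 4)
```

For a stable all-pole model with poles p_i, the result should match the closed form c_n = (1/n) * sum_i p_i^n, which gives a quick sanity check on the recursion. The pole-zero features described above modify the transfer function before the cepstrum is taken, so this sketch covers only the conventional LP case.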
