Fast pole filtering for speaker recognition

Mismatched training and testing conditions for speaker recognition arise when the training and test speech pass through different channels, which degrades recognition performance. Estimating and removing the channel's filtering effect makes speaker recognition systems more robust. It has been shown that a reliable channel estimate is obtained by taking the mean of the pole-filtered linear predictive (LP) cepstrum. Finding the pole-filtered mean requires factoring the LP polynomial, which is computationally intensive, especially for real-time applications. In this paper, we examine a fast method of pole filtering that avoids polynomial factorization. This method is much more computationally efficient than conventional pole filtering and gives equal or better performance. Experimental results are given for four databases covering a variety of mismatched conditions.
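
For orientation, the conventional, factorization-based procedure that the abstract contrasts against can be sketched roughly as follows. This is a minimal illustration and not the fast method examined in the paper: it assumes each frame is given as LP coefficients `[1, a_1, ..., a_p]`, that pole filtering means pulling any pole whose radius exceeds a threshold back to that threshold, and that the channel estimate is subtracted from the unmodified LP cepstrum. The function names, the `max_radius` parameter, and its default value are hypothetical choices made for the example.

```python
import numpy as np


def cepstrum_from_poles(poles, n_ceps):
    """LP cepstrum c_1..c_n_ceps of 1/A(z), computed from the roots p_i of A.

    Uses the identity c_n = (1/n) * sum_i p_i**n.
    """
    n = np.arange(1, n_ceps + 1)
    powers = poles[:, None] ** n[None, :]
    return np.real(powers.sum(axis=0)) / n


def pole_filtered_cms(lp_frames, n_ceps=12, max_radius=0.9):
    """Sketch of pole-filtered cepstral mean subtraction (assumed form).

    lp_frames: iterable of LP coefficient vectors [1, a_1, ..., a_p].
    max_radius: hypothetical threshold; poles outside it are moved onto it.
    Returns (normalized cepstra, channel estimate).
    """
    cepstra, filtered = [], []
    for a in lp_frames:
        # Factor the LP polynomial: this is the costly step per frame.
        poles = np.roots(a)
        radius = np.abs(poles)
        # Shrink only the poles whose radius exceeds the threshold.
        shrink = np.minimum(1.0, max_radius / np.maximum(radius, 1e-12))
        cepstra.append(cepstrum_from_poles(poles, n_ceps))
        filtered.append(cepstrum_from_poles(poles * shrink, n_ceps))
    # The mean of the pole-filtered cepstrum serves as the channel estimate.
    channel_estimate = np.mean(filtered, axis=0)
    return np.asarray(cepstra) - channel_estimate, channel_estimate


if __name__ == "__main__":
    frames = [np.array([1.0, -0.5]), np.array([1.0, -0.95])]
    normalized, estimate = pole_filtered_cms(frames, n_ceps=3)
    print(normalized)
    print(estimate)
```

The call to `np.roots` inside the per-frame loop is the polynomial factorization whose cost motivates a faster alternative.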
