Text-independent speaker verification using utterance level scoring and covariance modeling

This paper describes a computationally simple method to perform text independent speaker verification using second order statistics. The suggested method, called utterance level scoring (ULS), allows one to obtain a normalized score using a single pass through the frames of the tested utterance. The utterance sample covariance is first calculated and then compared to the speaker covariance using a distortion measure. Subsequently, a distortion measure between the utterance covariance and the sample covariance of data taken from different speakers is used to normalize the score. Experimental results from the 2000 NIST speaker recognition evaluation are presented for ULS, used with different distortion measures, and for a Gaussian mixture model (GMM) system. The results indicate that ULS as a viable alternative to GMM whenever the computational complexity and verification accuracy needs to be traded.

[1]  R.D. Zilca Text-independent speaker verification using covariance modeling , 2001, IEEE Signal Processing Letters.

[2]  Stéphane H. Maes,et al.  A distance measure between collections of distributions and its application to speaker recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[3]  Ivan Magrin-Chagnolleau,et al.  Second-order statistical measures for text-independent speaker identification , 1995, Speech Commun..

[4]  R. P. Ramachandran,et al.  Robust speaker recognition: a feature-based approach , 1996, IEEE Signal Processing Magazine.

[5]  Alvin F. Martin,et al.  The DET curve in assessment of detection task performance , 1997, EUROSPEECH.

[6]  Douglas A. Reynolds,et al.  Comparison of background normalization methods for text-independent speaker verification , 1997, EUROSPEECH.

[7]  Jr. J.P. Campbell,et al.  Speaker recognition: a tutorial , 1997, Proc. IEEE.

[8]  Keinosuke Fukunaga,et al.  Introduction to Statistical Pattern Recognition , 1972 .

[9]  Douglas A. Reynolds,et al.  Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..

[10]  Marcos Faúndez-Zanuy,et al.  Speaker identification in mismatch training and testing conditions , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[11]  Yuval Bistritz,et al.  Distance-based Gaussian mixture model for speaker recognition over the telephone , 2000, INTERSPEECH.

[12]  H. Gish Robust discrimination in automatic speaker identification , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[13]  Douglas A. Reynolds,et al.  HTIMIT and LLHDB: speech corpora for the study of handset transducer effects , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..