Towards better making a decision in speaker verification

Speaker verification is the process of accepting or rejecting a speaker's identity claim. The decision rule is critical: the decision threshold largely determines the performance of a speaker verification system. Traditional threshold estimation methods use only the information conveyed by the training data and largely fail to relate it to production data. As a result, a system that relies on such threshold estimation performs poorly in practice because of mismatches between training and operating conditions. In this paper, we propose several methods towards better decision-making in a practical speaker verification system. Our methods include the use of additional reliable statistical information for threshold estimation, the elimination of abnormal data for better estimation of the underlying statistics, and on-line incremental threshold update. To evaluate our methods, we ran simulations on a baseline Gaussian Mixture Model system in both text-dependent and text-independent modes. Comparative results show that, compared with recent threshold estimation methods, our methods generalize considerably better, especially under a variety of mismatch conditions. Our methods thus offer a promising approach for real-world speaker verification applications.
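The abstract names three ingredients (threshold estimation from score statistics, elimination of abnormal data, and on-line incremental update) without giving formulas. The Python sketch below is only a generic illustration of how such pieces could fit together, not the method proposed in the paper. The function and class names, the cut-off `k`, and the weight `alpha` are all hypothetical choices made for this example.

```python
import statistics


def trim_outliers(scores, k=1.5):
    """Drop scores farther than k standard deviations from the mean.

    k is an assumed cut-off; the abstract does not specify one.
    """
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    if sigma == 0:
        return list(scores)
    return [s for s in scores if abs(s - mu) <= k * sigma]


def estimate_threshold(client_scores, impostor_scores, alpha=0.5):
    """Place the threshold between the impostor and client score means.

    alpha = 0.5 is the midpoint; a larger alpha moves the threshold
    towards the client mean (stricter acceptance). This interpolation
    is an assumption for illustration only.
    """
    mu_client = statistics.fmean(client_scores)
    mu_impostor = statistics.fmean(impostor_scores)
    return alpha * mu_client + (1 - alpha) * mu_impostor


class RunningMean:
    """Incrementally updated mean, so old scores need not be stored."""

    def __init__(self, values):
        self.n = len(values)
        self.mean = statistics.fmean(values)

    def update(self, x):
        # Standard incremental mean update.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean


# Hypothetical enrollment scores; 10.0 stands in for an abnormal sample.
client = trim_outliers([1.0, 1.1, 0.9, 1.0, 10.0])
impostor = [-1.0, -0.8, -1.2]
threshold = estimate_threshold(client, impostor)

# Fold one new accepted production score into the client statistic,
# then recompute the threshold from the updated mean.
client_mean = RunningMean(client)
client_mean.update(1.2)
threshold = 0.5 * client_mean.mean + 0.5 * statistics.fmean(impostor)
```

In this sketch the threshold is recomputed from a running client-score mean each time a new accepted utterance arrives, which is one simple way to let production data influence a threshold that was first estimated from training data alone.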
