Confidence and reliability measures in speaker verification

Speaker verification is a biometric identity verification technique whose performance can be severely degraded by noise. Using a coherent notation, we reformulate and review several methods that have been proposed to quantify the uncertainty in verification results, some of them designed to cope with the effects of mismatched training and testing environments. We also include a recently proposed method that is firmly rooted in a probabilistic approach and interpretation, and that explicitly measures signal quality before assigning a reliability value to the speaker verification classifier's decision. We evaluate the confidence and reliability measures on a noisy 251-user database, showing that taking signal-domain quality into account can lead to more accurate prediction of classifier errors. We discuss possible strategies for using the measures in a speaker verification system, balancing acquisition duration against verification error rate.
