Application-independent evaluation of speaker detection

We propose and motivate an alternative to the traditional error-based or cost-based metrics for evaluating speaker detection performance. The proposed metric is information-theoretic: it measures the effective amount of information that the speaker detector delivers to the user. We show that this metric is appropriate for evaluating what we call application-independent detectors, which output soft decisions in the form of log-likelihood ratios rather than hard decisions. The metric is constructed by analyzing and generalizing cost-based evaluation metrics. This construction yields an interpretation of the metric as an expected cost, or as a total error rate, over a range of application types. We further show how the metric can be decomposed into a discrimination component and a calibration component. We conclude with an experimental demonstration in which the proposed technique is used to evaluate three speaker detection systems submitted to the NIST 2004 Speaker Recognition Evaluation.
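To make the idea of scoring soft decisions concrete, here is a minimal sketch of one way such an information-theoretic cost could be computed from log-likelihood ratios. The abstract does not state the formula, so the form used here (a logarithmic cost averaged separately over target and non-target trials, expressed in bits) is an assumption, and the function name `llr_cost` and its arguments are hypothetical.

```python
import math


def llr_cost(target_llrs, nontarget_llrs):
    """Average logarithmic cost, in bits, of natural-log likelihood ratios.

    Assumed form (not given in the abstract):
        0.5 * [ mean over targets     of log2(1 + exp(-llr))
              + mean over non-targets of log2(1 + exp(+llr)) ]
    """
    def softplus(x):
        # log(1 + exp(x)), written to avoid overflow for large |x|
        return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

    # Targets are penalized for low LLRs, non-targets for high LLRs.
    cost_targets = sum(softplus(-l) for l in target_llrs) / len(target_llrs)
    cost_nontargets = sum(softplus(l) for l in nontarget_llrs) / len(nontarget_llrs)

    # Divide by ln(2) to convert from nats to bits.
    return 0.5 * (cost_targets + cost_nontargets) / math.log(2)


if __name__ == "__main__":
    # A detector that always outputs LLR = 0 conveys no information.
    print(llr_cost([0.0], [0.0]))
    # Confident, correct LLRs give a cost close to zero.
    print(llr_cost([10.0], [-10.0]))
    # Confident but wrong LLRs are penalized heavily.
    print(llr_cost([-10.0], [10.0]))
```

Under this assumed form, an uninformative detector that always outputs a log-likelihood ratio of zero scores exactly 1 bit, well-calibrated and discriminating outputs score near zero, and confidently wrong outputs score above 1 bit, which is how a calibration failure would show up separately from poor discrimination.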
