Uncertainties of measures in speaker recognition evaluation

The National Institute of Standards and Technology (NIST) Speaker Recognition Evaluations (SRE) are an ongoing series of evaluations conducted by NIST. In the NIST SRE, speaker detection performance is measured using a detection cost function, defined as a weighted sum of the probabilities of type I and type II errors. Sampling variability introduces measurement uncertainty into the detection cost function. Hence, when evaluating and comparing the performance of speaker recognition systems, the uncertainties of the measures must be taken into account. In this article, the uncertainties of detection cost functions, expressed as standard errors (SEs) and confidence intervals, are computed using nonparametric two-sample bootstrap methods, building on our earlier extensive bootstrap variability studies on large datasets. Data independence is assumed because, when the area under a receiver operating characteristic (ROC) curve is used as the metric, the bootstrap SEs agreed very well with the analytical SEs obtained from the Mann-Whitney statistic for independent and identically distributed samples. Examples are provided.
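As an illustration of the approach summarized above, the following is a minimal sketch of a detection cost function and a nonparametric two-sample bootstrap for its SE and confidence interval. It is not the authors' implementation. The function names, the fixed decision threshold, the cost parameters (`c_miss=10`, `c_fa=1`, `p_target=0.01`), the number of bootstrap replications, and the simple percentile interval are all assumptions chosen for illustration, and the scores are synthetic.

```python
import random
import statistics


def detection_cost(target_scores, nontarget_scores, threshold,
                   c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted sum of the miss and false-alarm probabilities.

    The cost parameters are illustrative defaults (an assumption),
    not values taken from the article.
    """
    # Miss: a target trial scored below the threshold.
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    # False alarm: a non-target trial scored at or above the threshold.
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa


def bootstrap_cost_uncertainty(target_scores, nontarget_scores, threshold,
                               n_boot=2000, alpha=0.05, seed=0):
    """Two-sample bootstrap: resample each score set separately, with
    replacement, and recompute the cost on every pair of resamples."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        t_star = rng.choices(target_scores, k=len(target_scores))
        n_star = rng.choices(nontarget_scores, k=len(nontarget_scores))
        replicates.append(detection_cost(t_star, n_star, threshold))
    replicates.sort()
    # SE estimate: sample standard deviation of the bootstrap replicates.
    se = statistics.stdev(replicates)
    # Simple percentile interval read off the sorted replicates.
    lo = replicates[int((alpha / 2) * n_boot)]
    hi = replicates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return se, (lo, hi)


if __name__ == "__main__":
    # Synthetic scores, purely for demonstration.
    gen = random.Random(42)
    targets = [gen.gauss(2.0, 1.0) for _ in range(500)]
    nontargets = [gen.gauss(0.0, 1.0) for _ in range(5000)]
    cost = detection_cost(targets, nontargets, threshold=1.5)
    se, (lo, hi) = bootstrap_cost_uncertainty(targets, nontargets, threshold=1.5)
    print(f"cost={cost:.4f}  SE={se:.4f}  95% CI=({lo:.4f}, {hi:.4f})")
```

Resampling the target and non-target scores independently of each other reflects the independence assumption stated in the abstract. If the trials were dependent, a different resampling scheme would be needed.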
