The Impact of Data Dependence on Speaker Recognition Evaluation

Data dependence caused by multiple use of the same subjects affects the standard error (SE) of the detection cost function (DCF) in speaker recognition evaluation. The DCF is defined as a weighted sum of the probabilities of type I and type II errors at a given threshold. A two-layer data structure is constructed: target scores are grouped into target sets according to their dependence, and non-target scores are grouped into non-target sets in the same way. Because every score must have an equal probability of being selected during resampling, all target sets must contain the same number of target scores, and all non-target sets must contain the same number of non-target scores. In addition to the bootstrap method under the i.i.d. assumption, nonparametric two-sample one-layer and two-layer bootstrap methods are carried out; they differ in whether resampling takes place only on the sets, or on the sets and subsequently on the scores within the selected sets. Because the bootstrap is stochastic, the distributions of the SEs of the DCF estimated by the three bootstrap methods are generated and compared. Hypothesis testing shows that data dependence increases not only the SE but also the variation of the SE, and that the two-layer bootstrap is more conservative than the one-layer bootstrap. The reasons why the three bootstrap methods affect the estimated SEs differently are investigated.
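The three resampling schemes described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the equal weights `w_miss` and `w_fa`, the convention that a score at or above the threshold is accepted as a target, the number of bootstrap replications, and the synthetic toy data are all assumptions made for illustration only.

```python
import random
import statistics

def dcf(target_scores, nontarget_scores, threshold, w_miss=0.5, w_fa=0.5):
    """Weighted sum of the two error rates at a threshold (weights assumed)."""
    # Assumed convention: a score >= threshold is accepted as a target.
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return w_miss * p_miss + w_fa * p_fa

def resample_iid(sets, rng):
    """Ignore the set structure: draw individual scores with replacement."""
    scores = [s for group in sets for s in group]
    return rng.choices(scores, k=len(scores))

def resample_one_layer(sets, rng):
    """Draw whole sets with replacement; keep the scores in each set as-is."""
    chosen = rng.choices(sets, k=len(sets))
    return [s for group in chosen for s in group]

def resample_two_layer(sets, rng):
    """Draw sets with replacement, then draw scores within each chosen set."""
    chosen = rng.choices(sets, k=len(sets))
    return [s for group in chosen for s in rng.choices(group, k=len(group))]

def bootstrap_se(target_sets, nontarget_sets, threshold, resample,
                 n_boot=200, seed=0):
    """SE of the DCF: standard deviation over two-sample bootstrap replications."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Two-sample: targets and non-targets are resampled independently.
        t = resample(target_sets, rng)
        n = resample(nontarget_sets, rng)
        estimates.append(dcf(t, n, threshold))
    return statistics.stdev(estimates)

def make_sets(n_sets, set_size, mean, rng):
    """Hypothetical toy data: a shared per-set offset induces dependence."""
    sets = []
    for _ in range(n_sets):
        offset = rng.gauss(0.0, 1.0)
        sets.append([mean + offset + rng.gauss(0.0, 1.0) for _ in range(set_size)])
    return sets

data_rng = random.Random(42)
target_sets = make_sets(20, 5, 2.0, data_rng)      # equal-sized target sets
nontarget_sets = make_sets(20, 5, -2.0, data_rng)  # equal-sized non-target sets

for name, fn in [("i.i.d.", resample_iid),
                 ("one-layer", resample_one_layer),
                 ("two-layer", resample_two_layer)]:
    se = bootstrap_se(target_sets, nontarget_sets, 0.0, fn)
    print(f"{name:10s} SE of DCF: {se:.4f}")
```

Each call to `bootstrap_se` yields a single SE estimate. Repeating the call with different seeds gives a distribution of SEs for each method, which is the kind of quantity the abstract compares across the three methods. All sets have the same size in this sketch, matching the equal-probability requirement stated above.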
