Entropy analysis of i-vector feature spaces in duration-sensitive speaker recognition

The vast majority of speaker recognition cross-entropy evaluations focus on the score domain. By examining the generalized relative distance between genuine and impostor sub-spaces, biometric characteristics become comparable to other authentication approaches. In this paper, we demonstrate that the biometric information of the i-vector feature space, measured by relative entropy, is comparable to that of, e.g., knowledge-based mechanisms or face recognition. On the NIST SRE 2004-2010 corpora, short samples of, e.g., 5 seconds duration already comprise 127 bits in a text-independent scenario. Further, the vast majority of short samples do not fall below 50% of the biometric information of samples with a duration of more than 40 seconds. The generalized i-vector feature space entropy of long samples corresponds to 182.1 bits, and the highest lower entropy bound of a subject was observed at 471.6 bits.
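
To illustrate how a relative entropy between a genuine and an impostor distribution turns into a bit count, here is a minimal sketch. It rests on assumptions that the abstract does not state: both distributions are modeled as Gaussians with diagonal covariance, and all numbers are toy values rather than i-vector statistics from the NIST SRE corpora. The function name `relative_entropy_bits` is hypothetical.

```python
import math


def relative_entropy_bits(mu_p, var_p, mu_q, var_q):
    """KL divergence D(p || q) in bits between two diagonal Gaussians.

    Assumption for illustration only: p models one subject's (genuine)
    features, q models the population (impostor) features, and both have
    diagonal covariance given as per-dimension variances.
    """
    nats = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Closed-form KL divergence of two univariate Gaussians, summed
        # over dimensions because the covariances are diagonal.
        nats += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    # Convert from nats to bits.
    return nats / math.log(2.0)


# Toy 3-dimensional example (made-up values, not taken from the paper).
subject_mean = [0.5, -0.2, 0.1]
subject_var = [0.05, 0.04, 0.06]
population_mean = [0.0, 0.0, 0.0]
population_var = [1.0, 1.0, 1.0]

bits = relative_entropy_bits(subject_mean, subject_var,
                             population_mean, population_var)
print(f"relative entropy: {bits:.2f} bits")
```

Under these assumptions the bit count grows when a subject's distribution is narrow relative to the population or lies far from the population mean, which is the intuition behind expressing biometric information as relative entropy.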
