Quality measures based calibration with duration and noise dependency for speaker recognition

Abstract: This paper studies the effect of short utterances and noise on the performance of automatic speaker recognition. We focus on calibration and propose a strategy that models the calibration parameters as functions of quality measures. The proposed calibration uses simple Quality Measure Functions (QMFs) of duration and of the signal-to-noise ratio measured from the speech segments. We test the effectiveness of the approach on two databases: the development set of the I4U collaboration for the NIST Speaker Recognition Evaluation (SRE) 2012, and the evaluation test material of NIST SRE 2012 itself. Compared with conventional linear calibration, the proposed QMF approach improves system performance in terms of both discrimination and calibration.
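To make the contrast with conventional linear calibration concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not give the functional form of the QMFs, so the log-duration and SNR terms, the function names, and all weights below are assumptions chosen only for illustration.

```python
import math


def linear_calibration(score, w0, w1):
    """Conventional linear calibration: an offset plus a scaled raw score."""
    return w0 + w1 * score


def qmf_calibration(score, duration_s, snr_db, w0, w1, w_dur, w_snr):
    """Linear calibration plus a hypothetical quality measure function (QMF).

    Assumption: the QMF is a weighted sum of log-duration and measured SNR.
    The abstract only states that QMFs depend on duration and SNR.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    qmf = w_dur * math.log(duration_s) + w_snr * snr_db
    return linear_calibration(score, w0, w1) + qmf


# Hypothetical weights; in practice they would be trained on development trials.
raw_score = 2.0
print(linear_calibration(raw_score, w0=0.5, w1=1.5))
print(qmf_calibration(raw_score, duration_s=5.0, snr_db=6.0,
                      w0=0.5, w1=1.5, w_dur=0.3, w_snr=0.1))
```

With the quality weights set to zero the sketch reduces to linear calibration, which is why the two approaches can be compared on the same trials.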
