Evaluation and calibration of short-term aging effects in speaker verification

A speaker verification evaluation is presented on the Multi-session Audio Research Project (MARP) corpus, in which speakers were recorded at regular intervals, under consistent conditions, over a period of three years. The performance of an i-vector system with probabilistic linear discriminant analysis (PLDA) modelling is observed to degrade progressively, in terms of both discrimination and calibration, as the time interval between training and test sessions increases. For male speakers, the equal error rate (EER) rises from 2.4% to 4.4% when the interval between sessions grows from several months to three years. An extension to conventional linear score calibration is proposed, in which short-term aging information is incorporated as an additional factor in the score transformation. Compared with score-only calibration, this approach improves both discrimination and calibration performance as the time interval between training and test sessions increases.
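
To illustrate the idea of adding aging information to linear score calibration, the following is a minimal sketch and not the authors' implementation. It assumes the calibrated log-likelihood ratio takes the form w0 + w1*s + w2*dt, where s is the raw score and dt is the interval between training and test sessions in years; the abstract does not state the exact form of the aging term, so this linear form is an assumption. The function names, the synthetic score model, and the Newton-based logistic regression fit are all hypothetical choices made for illustration, and the data are simulated rather than taken from MARP.

```python
import numpy as np

LN2 = np.log(2.0)

def design(features):
    """Prepend a bias column to an (n, d) feature matrix."""
    features = np.asarray(features, dtype=float).reshape(len(features), -1)
    return np.column_stack([np.ones(len(features)), features])

def logistic_loss(X, w, labels):
    """Mean cross-entropy (nats) of logits X @ w against 0/1 labels."""
    signed = (2.0 * labels - 1.0) * (X @ w)
    return np.mean(np.logaddexp(0.0, -signed))

def fit_calibration(features, labels, n_iter=50):
    """Fit affine calibration weights by logistic regression (damped Newton).

    With equal numbers of target and non-target trials (assumed here),
    the fitted logit can be read as a calibrated log-likelihood ratio.
    """
    X = design(features)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 0.5 * (1.0 + np.tanh(0.5 * (X @ w)))       # stable sigmoid
        grad = X.T @ (p - labels)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        hess += 1e-8 * np.eye(X.shape[1])               # tiny ridge for stability
        step = np.linalg.solve(hess, grad)
        t, current = 1.0, logistic_loss(X, w, labels)
        while logistic_loss(X, w - t * step, labels) > current and t > 1e-4:
            t *= 0.5                                    # backtrack if loss rises
        w = w - t * step
    return w

def cllr(llr, labels):
    """Log-likelihood-ratio cost in bits (lower is better, 1.0 is uninformative)."""
    tar, non = llr[labels == 1], llr[labels == 0]
    return 0.5 * (np.mean(np.logaddexp(0.0, -tar)) +
                  np.mean(np.logaddexp(0.0, non))) / LN2

# Synthetic trials (an assumption): target scores drift downward with interval.
rng = np.random.default_rng(0)
n = 2000
dt = rng.uniform(0.0, 3.0, size=2 * n)                  # years between sessions
labels = np.concatenate([np.ones(n), np.zeros(n)])
scores = np.where(labels == 1,
                  rng.normal(2.0 - 0.5 * dt, 1.0),      # targets degrade with dt
                  rng.normal(-2.0, 1.0, size=2 * n))    # non-targets unaffected

# Score-only calibration: llr = w0 + w1 * s
w_score = fit_calibration(scores, labels)
cllr_score_only = cllr(design(scores) @ w_score, labels)

# Calibration with the interval as an extra factor: llr = w0 + w1 * s + w2 * dt
both = np.column_stack([scores, dt])
w_interval = fit_calibration(both, labels)
cllr_with_interval = cllr(design(both) @ w_interval, labels)

print(f"Cllr, score only:    {cllr_score_only:.4f}")
print(f"Cllr, with interval: {cllr_with_interval:.4f}")
```

In this sketch both calibrations are fitted and scored on the same simulated trials, so the comparison only shows that the extra term can absorb an interval-dependent shift in target scores. A real evaluation would fit the weights on held-out data and report results per interval condition.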
