Analysis of Score Normalization in Multilingual Speaker Recognition

The NIST Speaker Recognition Evaluation 2016 revealed the importance of score normalization for mismatched data conditions. This paper analyzes several score normalization techniques for test conditions with multiple languages. The best-performing technique for a PLDA classifier is an adaptive s-norm, which yields a 30% relative improvement over the system without any score normalization. The analysis shows that adaptive score normalization (using the top-scoring files per trial) selects cohorts in which 68% of the recordings are in the same language and 92% are of the same gender as the enrollment and test recordings. Our results suggest that the data used to select score normalization cohorts should be a pool of several languages and channels and that, if possible, a subset of it should contain data from the target domain.
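
The abstract does not spell out the normalization formula, so the following is only a minimal sketch of the commonly used form of adaptive s-norm, not necessarily the exact variant evaluated in the paper. It assumes that the enrollment and test recordings have each already been scored against every cohort recording (for example with PLDA), keeps only the top-scoring cohort scores on each side, and averages the two resulting z-normalized scores. The function name `adaptive_s_norm`, its parameter names, and the default `top_n=200` are illustrative assumptions.

```python
import numpy as np


def adaptive_s_norm(raw_score, enroll_cohort_scores, test_cohort_scores, top_n=200):
    """Sketch of adaptive symmetric score normalization for one trial.

    raw_score            -- unnormalized score of the (enrollment, test) trial
    enroll_cohort_scores -- scores of the enrollment recording against the cohort
    test_cohort_scores   -- scores of the test recording against the cohort
    top_n                -- hypothetical cohort size kept per side (an assumption)
    """

    def top_stats(scores):
        scores = np.asarray(scores, dtype=float)
        n = min(top_n, scores.size)
        # Adaptive part: keep only the n highest-scoring cohort recordings.
        top = np.sort(scores)[-n:]
        # Guard against a zero standard deviation in degenerate cohorts.
        return top.mean(), max(top.std(), 1e-8)

    mu_e, sd_e = top_stats(enroll_cohort_scores)
    mu_t, sd_t = top_stats(test_cohort_scores)

    # Symmetric part: average the enrollment-side and test-side normalizations.
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic cohort scores, for illustration only.
    enroll_scores = rng.normal(loc=-5.0, scale=2.0, size=1000)
    test_scores = rng.normal(loc=-4.0, scale=2.5, size=1000)
    print(adaptive_s_norm(3.0, enroll_scores, test_scores, top_n=200))
```

Because the cohort is re-selected for every trial from its highest-scoring recordings, the selected files tend to resemble the trial recordings, which is consistent with the language and gender match rates reported above.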
