A Generative Model for Score Normalization in Speaker Recognition

We propose a theoretical framework for score normalization. It confirms that normalization is not needed under ideal (admittedly fragile) conditions. If these conditions are not met, however, e.g. under dataset shift between training and runtime, the theory reveals dependencies between scores that strategies such as score normalization could exploit. Indeed, experiments have repeatedly demonstrated that various ad hoc score normalization recipes do work. We present a first attempt at using probability theory to design a generative score-space normalization model, which gives improvements similar to those of ZT-norm on the text-dependent RSR2015 database.
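For readers unfamiliar with the ad hoc baseline named above, the following is a minimal sketch of the conventional ZT-norm recipe (Z-norm followed by T-norm). It is not the generative model proposed here. The function names, argument names, and matrix layouts are assumptions made for illustration: rows index enrollment models, columns index test segments, and the cohorts are assumed to be sets of impostor segments (Z-cohort) and impostor models (T-cohort).

```python
import numpy as np

def znorm(scores, enroll_vs_zcohort):
    """Z-norm: standardize each enrollment model's scores (rows) using
    that model's scores against a cohort of impostor test segments.
    scores: (n_enroll, n_test); enroll_vs_zcohort: (n_enroll, n_z)."""
    mu = enroll_vs_zcohort.mean(axis=1, keepdims=True)
    sd = enroll_vs_zcohort.std(axis=1, keepdims=True)
    return (scores - mu) / sd

def tnorm(scores, tcohort_vs_test):
    """T-norm: standardize each test segment's scores (columns) using
    the scores of a cohort of impostor models against that segment.
    scores: (n_enroll, n_test); tcohort_vs_test: (n_t, n_test)."""
    mu = tcohort_vs_test.mean(axis=0, keepdims=True)
    sd = tcohort_vs_test.std(axis=0, keepdims=True)
    return (scores - mu) / sd

def ztnorm(scores, enroll_vs_zcohort, tcohort_vs_test, tcohort_vs_zcohort):
    """ZT-norm: Z-normalize both the trial scores and the T-cohort
    scores, then T-normalize the former with the latter.
    tcohort_vs_zcohort: (n_t, n_z)."""
    z_scores = znorm(scores, enroll_vs_zcohort)
    z_tcohort = znorm(tcohort_vs_test, tcohort_vs_zcohort)
    return tnorm(z_scores, z_tcohort)
```

By construction, the Z-norm step cancels any additive offset specific to an enrollment model, since the same offset appears in that model's cohort scores. Score dependencies of this kind are what such normalization recipes exploit.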