Sequential UBM adaptation for speaker verification

GMM-UBM-based speaker verification relies heavily on a well-trained universal background model (UBM). In practice, it is often difficult to obtain a UBM that fully matches the acoustic channels encountered in operation. To address this problem, we propose a novel sequential MAP adaptation approach: the UBM is updated sequentially with data from new enrollments, so it gradually learns and converges to the working channel. Our experiments are conducted on a time-varying speech database, using two channel-mismatched UBMs as initial models. The results confirm that sequential UBM adaptation provides a significant performance improvement, yielding relative EER reductions of 6.3% and 14.8% for the two mismatched UBMs, respectively.

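To illustrate the kind of update involved, the Python sketch below shows relevance-MAP adaptation of the means of a diagonal-covariance GMM-UBM, applied once per new enrollment so the model drifts toward the working channel. This is a minimal sketch under stated assumptions: the function names, the relevance factor of 16, and the synthetic frames are illustrative, not the authors' implementation.

# Minimal sketch of sequential MAP mean adaptation for a diagonal-covariance
# GMM-UBM (relevance-MAP, Reynolds-style). Names and data are illustrative
# assumptions, not the paper's actual code.
import numpy as np

def log_gaussian_diag(X, means, variances):
    """Per-component log N(x | mu_i, diag(var_i)) for all frames. X: (T, D)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]              # (T, M, D)
    return -0.5 * (D * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def map_adapt_means(weights, means, variances, X, relevance=16.0):
    """One MAP update of the mixture means from enrollment frames X (T, D)."""
    log_lik = np.log(weights)[None, :] + log_gaussian_diag(X, means, variances)
    log_norm = np.logaddexp.reduce(log_lik, axis=1, keepdims=True)
    gamma = np.exp(log_lik - log_norm)                     # (T, M) posteriors
    n = gamma.sum(axis=0) + 1e-10                          # soft counts per mixture
    Ex = gamma.T @ X / n[:, None]                          # first-order statistics
    alpha = (n / (n + relevance))[:, None]                 # data-dependent weight
    return alpha * Ex + (1.0 - alpha) * means              # interpolated new means

# Sequential use: the UBM means move toward the working channel as each new
# enrollment arrives, instead of staying fixed at the mismatched initial model.
rng = np.random.default_rng(0)
M, D = 8, 13                                               # mixtures, feature dim
weights = np.full(M, 1.0 / M)
means = rng.normal(size=(M, D))
variances = np.ones((M, D))
for _ in range(5):                                         # hypothetical enrollments
    X_enroll = rng.normal(loc=0.5, size=(200, D))          # stand-in MFCC frames
    means = map_adapt_means(weights, means, variances, X_enroll)

In this sketch only the means are adapted, which is the common choice for GMM-UBM systems; the weights and variances of the initial UBM are kept fixed across enrollments.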