Enhancing acoustic models for robust speaker verification

Acoustic model enhancement (AME) adapts the acoustic models to compensate for the distortion introduced by a speech enhancement technique. This work extends the recently presented AME technique for speaker verification by also adapting the model variances, and by exploring how the trade-off between noise over-estimation and flooring distortion affects the verification error. With spectral subtraction (SS) as the speech enhancement technique, the extended AME substantially outperformed SS alone, particularly at moderately low SNRs (0 dB to 15 dB), where adapting the variances was found to considerably improve the equal error rate (EER).
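To make the over-estimation and flooring trade-off concrete, the following is a minimal sketch of power-domain spectral subtraction, not the configuration used in this work. The function name, the parameters `alpha` (over-estimation factor) and `beta` (floor factor), their default values, and the choice to floor relative to the noisy power are all assumptions made for illustration.

```python
import numpy as np

def spectral_subtraction(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Illustrative power spectral subtraction (hypothetical helper).

    noisy_power: array of shape (frames, bins), power spectrum of noisy speech.
    noise_power: array of shape (bins,), estimated noise power spectrum.
    alpha: over-estimation factor (assumed); larger values remove more noise
           but push more bins below the floor.
    beta: floor factor (assumed); bins are never reduced below
          beta * noisy_power.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)

    # Subtract the over-estimated noise power from every frame.
    subtracted = noisy_power - alpha * noise_power

    # Flooring keeps the estimate positive; bins that hit the floor no longer
    # follow the speech spectrum, which is the flooring distortion.
    floor = beta * noisy_power
    return np.maximum(subtracted, floor)
```

In this sketch, raising `alpha` suppresses more residual noise but sends more bins to the floor, while raising `beta` limits that distortion at the cost of leaving more noise in the signal.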
