Ensemble approach in speaker verification

The speech signal combines attributes that carry information about the speaker, the channel, and noise. Conventional speaker verification systems train a single generic model for all cases and handle the variation arising from these attributes either through factor analysis or by not modeling it explicitly. We propose a new methodology that partitions the data space according to these factors and trains a separate model for each partition. The partitions may be formed according to any attribute. We train the partition models discriminatively to maximize the separation between them. For classification, we suggest several ways of combining the scores from the partitions. Experiments on the NIST 2008 database show that our method improves performance over conventional methods when partitions are formed according to speakers. On noisy speech, partitioning by noise gives the best performance.
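As a rough illustration of the score-combination step, the sketch below fuses per-partition verification scores into a single decision score. The abstract does not say which combination rules are used, so the three rules here (maximum, mean, weighted mean), the function names, and the threshold are all assumptions made for illustration, not the authors' method.

```python
def combine_partition_scores(scores, method="mean", weights=None):
    """Fuse per-partition scores into one score (hypothetical rules)."""
    scores = [float(s) for s in scores]
    if not scores:
        raise ValueError("at least one partition score is required")
    if method == "max":
        # Trust the single best-matching partition.
        return max(scores)
    if method == "mean":
        # Treat all partitions as equally reliable.
        return sum(scores) / len(scores)
    if method == "weighted":
        # Weights are assumed to reflect how well each partition
        # matches the test utterance (e.g. a posterior over partitions).
        if weights is None or len(weights) != len(scores):
            raise ValueError("weights must match scores in length")
        total = float(sum(weights))
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        return sum(w * s for w, s in zip(weights, scores)) / total
    raise ValueError(f"unknown method: {method}")


def accept(scores, threshold=0.0, **kwargs):
    """Accept the claimed identity if the fused score reaches the threshold."""
    return combine_partition_scores(scores, **kwargs) >= threshold


if __name__ == "__main__":
    partition_scores = [1.0, 2.0, 3.0]  # made-up scores from three partitions
    print(combine_partition_scores(partition_scores, "mean"))
    print(combine_partition_scores(partition_scores, "max"))
    print(combine_partition_scores(partition_scores, "weighted", [1, 1, 2]))
    print(accept(partition_scores, threshold=2.5, method="max"))
```

Which rule works best would depend on how the partitions are formed; a weighted rule only helps if the weights are a reasonable estimate of which partition the test utterance belongs to.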
