Psychoacoustic Model Compensation for Robust Speaker Verification in Environmental Noise

This paper investigates speaker verification in noisy conditions, motivated by the fact that environmental noise severely degrades the performance of speaker verification systems. We present a model compensation scheme based on psychoacoustic principles that adapts the model parameters to reduce the mismatch between training and verification conditions. For scenarios where accurate noise estimation is difficult, we propose a modified multiconditioning scheme. The proposed algorithm is tested on two speech databases. The first is the TIMIT database corrupted with white and pink noise, for which noise estimation is fairly easy. The second is the MIT Mobile Device Speaker Verification Corpus (MITMDSVC), which contains realistic noisy speech data that makes noise estimation difficult. In both cases, the proposed scheme achieves a significant performance gain over the baseline system.
