Robust Speaker Recognition Using Denoised Vocal Source and Vocal Tract Features

Speaker recognition performance degrades severely in noisy environments because the speaker-discriminative information extracted from noisy speech is inadequate and inaccurate. To alleviate this problem, a robust feature estimation method is proposed that captures both vocal source- and vocal tract-related characteristics from noisy speech utterances. Spectral subtraction, a simple yet useful speech enhancement technique, is employed to remove the noise-specific components prior to feature extraction. Analytical derivation and simulation results show that the proposed feature estimation method leads to robust recognition performance, especially at low signal-to-noise ratios (SNRs). For Gaussian mixture model-based speaker recognition in the presence of additive white Gaussian noise, the new approach consistently reduces both the identification error rate and the equal error rate at SNRs ranging from 0 to 15 dB.
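As an illustration of the denoising step, the following is a minimal sketch of basic magnitude spectral subtraction, not the exact variant used in this work. The function name, frame length, number of noise-only frames, and spectral floor are all hypothetical choices, and the sketch assumes the first few frames of the recording contain noise only.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=256, noise_frames=5, floor=0.01):
    """Illustrative magnitude spectral subtraction with overlap-add resynthesis."""
    hop = frame_len // 2
    # Square-root periodic Hann: analysis * synthesis windows sum to 1 at 50% overlap.
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])
    n_frames = 1 + (len(noisy) - frame_len) // hop

    # Slice the signal into overlapping, windowed frames and transform them.
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    magnitude, phase = np.abs(spectra), np.angle(spectra)

    # Assumption: the first `noise_frames` frames contain noise only.
    noise_mag = magnitude[:noise_frames].mean(axis=0)

    # Subtract the noise estimate; keep a small spectral floor instead of negatives.
    cleaned = np.maximum(magnitude - noise_mag, floor * magnitude)

    # Resynthesize with the noisy phase and overlap-add the frames.
    out_frames = np.fft.irfft(cleaned * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(len(noisy))
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += out_frames[i] * window
    # Note: the first and last half-frames are tapered, and trailing samples
    # that do not fill a whole frame are left at zero.
    return out

# Usage: a 440 Hz tone preceded by silence, corrupted by white Gaussian noise.
rng = np.random.default_rng(0)
fs = 8000
clean = np.concatenate([np.zeros(2000),
                        np.sin(2 * np.pi * 440 * np.arange(6000) / fs)])
noisy = clean + 0.3 * rng.standard_normal(fs)
enhanced = spectral_subtraction(noisy)
```

In a pipeline like the one described above, the enhanced signal would then be passed to the vocal source and vocal tract feature extractors. Reusing the noisy phase is a common simplification in spectral subtraction and is assumed here.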
