Robust speaker identification using auditory features and computational auditory scene analysis

The performance of speaker recognition systems drops significantly under noisy conditions. To improve robustness, we have recently proposed novel auditory features and a robust speaker recognition system that uses a front-end based on computational auditory scene analysis (CASA). In this paper, we further study the auditory features by exploring different feature dimensions and incorporating dynamic features. In addition, we evaluate the features and the robust recognition system on a speaker identification task under a range of noisy conditions. We find that one of the auditory features performs substantially better than a conventional speaker feature. Furthermore, our recognition system achieves significant performance improvements over an advanced front-end across a wide range of signal-to-noise ratio (SNR) conditions.
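
The abstract mentions incorporating dynamic features but does not give the formula or window used. A common choice, assumed here rather than taken from the paper, is the regression-based delta d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), computed over N neighbouring frames on each side and appended to the static feature vector. The sketch below uses plain Python lists; the function names `delta` and `append_dynamic` and the default window N = 2 are illustrative assumptions.

```python
from typing import List

Frame = List[float]  # one frame of static features (e.g. auditory cepstral coefficients)


def delta(features: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based delta coefficients over +/- `window` frames.

    Assumed formula: d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2).
    Frames outside the utterance are replaced by the first/last frame.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    num_frames = len(features)
    if num_frames == 0:
        return []
    dim = len(features[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out: List[Frame] = []
    for t in range(num_frames):
        acc = [0.0] * dim
        for n in range(1, window + 1):
            ahead = features[min(t + n, num_frames - 1)]   # edge padding at the end
            behind = features[max(t - n, 0)]               # edge padding at the start
            for k in range(dim):
                acc[k] += n * (ahead[k] - behind[k])
        out.append([v / denom for v in acc])
    return out


def append_dynamic(features: List[Frame], window: int = 2) -> List[Frame]:
    """Concatenate each static frame with its delta, doubling the feature dimension."""
    deltas = delta(features, window)
    return [static + dyn for static, dyn in zip(features, deltas)]


if __name__ == "__main__":
    ramp = [[float(i)] for i in range(5)]  # a feature rising by 1.0 per frame
    print(delta(ramp))
    print(append_dynamic(ramp))
```

With window = 2, a frame-wise linear ramp yields a delta of 1.0 in the interior frames and smaller values at the edges, where the padding flattens the local slope. Appending deltas doubles the feature dimension, which is one reason the choice of static dimension and dynamic features are studied together.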
