Effect of Nonlinear Compression Function on the Performance of the Speaker Identification System under Noisy Conditions

Accurate speaker identification is difficult for a number of reasons, one of the most prominent being environmental noise. This paper examines how two nonlinear compression functions used in feature extraction, the logarithm and the cubic root, affect the performance of a closed-set, text-independent speaker identification system in clean and noisy speaking environments. Performance is analyzed with Mel frequency cepstral coefficients (MFCC) and Gammatone frequency cepstral coefficients (GFCC). Speakers are modeled with Gaussian mixture models. Two databases, one Marathi and one Hindi, were used in the experiments. The cubic-root-based features were observed to outperform the log-based features under noisy conditions with SNR < 20 dB.
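As a rough illustration of where the compression function enters feature extraction, the sketch below applies either a log or a cubic-root nonlinearity to filterbank energies before a DCT. It is not the authors' implementation: the filterbank energies are assumed to be precomputed (from a Mel or Gammatone filterbank), and the function names, the energy floor, and the choice of 13 coefficients are hypothetical choices made for this example.

```python
import numpy as np

def dct_ii(x, num_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first num_coeffs terms."""
    n_bands = x.shape[-1]
    k = np.arange(num_coeffs)[:, None]   # cepstral index
    n = np.arange(n_bands)[None, :]      # filterbank channel index
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_bands))
    return x @ basis.T

def compress(energies, kind="log", floor=1e-10):
    """Apply the nonlinear compression to non-negative filterbank energies."""
    energies = np.maximum(energies, floor)  # assumed floor to avoid log(0)
    if kind == "log":
        return np.log(energies)
    if kind == "cubic_root":
        return np.cbrt(energies)
    raise ValueError(f"unknown compression: {kind}")

def cepstral_coefficients(energies, kind="log", num_coeffs=13):
    """Compress the energies of each frame, then decorrelate them with a DCT."""
    return dct_ii(compress(energies, kind), num_coeffs)

# Example: 5 frames of 26 synthetic filterbank energies (placeholder data).
rng = np.random.default_rng(0)
energies = rng.uniform(0.1, 10.0, size=(5, 26))
log_feats = cepstral_coefficients(energies, kind="log")
root_feats = cepstral_coefficients(energies, kind="cubic_root")
```

The only difference between the two feature sets here is the `kind` argument. The logarithm expands small energy values strongly, whereas the cubic root stays bounded near zero, which is one commonly cited reason root compression may be less sensitive to low-energy, noise-dominated channels.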
