Noise Robust Speaker Identification: Using Nonlinear Modeling Techniques

Session variability is one of the major challenges in forensic speaker identification. When it takes the form of mismatched recording environments, this variability seriously degrades identification performance. To address environment mismatch caused by noise, this chapter discusses several types of robust features. State-of-the-art features model the speech production system as a linear source-filter model. This modeling neglects some nonlinear aspects of speech production that carry speaker-specific information. Furthermore, state-of-the-art features are based on either the speech production mechanism or the speech perception mechanism. To overcome these limitations, this chapter proposes features derived using nonlinear modeling techniques. The proposed features, Teager energy operator based cepstral coefficients (TEOCC) and amplitude-frequency modulation (AM-FM) based ‘Q’ features, significantly improve the speaker identification rate in mismatched environments. Their performance is evaluated for different types of noise signals from the NOISEX-92 database, with training on clean speech and testing on noisy speech. When the signal is corrupted by car engine noise at 0 dB SNR, the speaker identification rate is 57% with TEOCC features and 97% with AM-FM based ‘Q’ features, compared with 25.5% with MFCC features. The results show that the proposed features increase speaker identification accuracy in the presence of noise without any additional pre-processing to remove noise from the signal.
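Both proposed feature sets rest on nonlinear signal modeling: the Teager energy operator, whose discrete form is Ψ[x(n)] = x²(n) − x(n−1)x(n+1), and energy-separation demodulation, which splits a signal into an instantaneous amplitude and an instantaneous frequency. The sketch below is a minimal illustration, assuming the DESA-2 energy separation algorithm of Maragos, Kaiser and Quatieri. It is not the chapter's TEOCC or ‘Q’ feature extraction, whose filterbank and cepstral steps are not described in this abstract, and the function names `teager_energy` and `desa2` are hypothetical.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The two endpoint samples are dropped, so the output is two samples
    shorter than the input; output index i corresponds to input sample i + 1.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x):
    """DESA-2 energy separation (sketch, after Maragos, Kaiser and Quatieri).

    Returns (amplitudes, frequencies), one estimate per sample n = 2..N-3,
    with frequency in radians per sample. Assumes a narrowband signal whose
    frequency lies below pi/2 rad/sample.
    """
    # Symmetric difference y[n] = x[n+1] - x[n-1]; y index j is sample n = j + 1.
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager_energy(x)  # index i  -> sample n = i + 1
    psi_y = teager_energy(y)  # index k  -> sample n = k + 2
    amps, freqs = [], []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # Teager energy of x at the same sample n = k + 2
        if px <= 0.0 or py <= 0.0:
            # Degenerate sample (e.g. silence): no meaningful estimate.
            amps.append(0.0)
            freqs.append(0.0)
            continue
        # Omega(n) = 0.5 * arccos(1 - psi[y] / (2 * psi[x])), clipped for safety.
        arg = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freqs.append(0.5 * math.acos(arg))
        # |a(n)| = 2 * psi[x] / sqrt(psi[y])
        amps.append(2.0 * px / math.sqrt(py))
    return amps, freqs


if __name__ == "__main__":
    # A pure cosine: DESA-2 should recover its amplitude and frequency.
    A, omega = 0.8, 0.3
    x = [A * math.cos(omega * n + 0.1) for n in range(200)]
    amps, freqs = desa2(x)
    print(round(amps[0], 6), round(freqs[0], 6))
```

For a pure cosine A cos(Ωn + φ), the operator returns the constant A² sin²(Ω), and DESA-2 recovers A and Ω exactly (for Ω below π/2 rad/sample). In practice, such demodulation is usually applied to the outputs of a bandpass filterbank, because a single AM-FM component fits a narrowband signal far better than full-band speech.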

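The evaluation trains on clean speech and tests on speech corrupted with NOISEX-92 noise at a given SNR. A common way to build such test data, assumed here because the abstract does not state the exact mixing procedure, is to scale the noise so that the ratio of speech power to noise power over the whole utterance matches the target. At 0 dB, the two powers are equal. The helpers `power` and `mix_at_snr` are hypothetical names, and random Gaussian noise stands in for an actual NOISEX-92 recording in the usage example.

```python
import math
import random


def power(sig):
    """Mean power (average squared amplitude) of a signal."""
    return sum(s * s for s in sig) / len(sig)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the global SNR equals snr_db (in dB).

    Assumption: SNR is measured over the whole utterance (not segmental SNR),
    and the noise has at least as many samples as the speech; extra noise
    samples are ignored.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_s, p_n = power(speech), power(noise)
    if p_n == 0.0:
        raise ValueError("noise has zero power")
    # Choose gain g so that p_s / (g^2 * p_n) = 10^(snr_db / 10).
    gain = math.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    speech = [math.sin(0.05 * n) for n in range(8000)]
    noise = [rng.gauss(0.0, 0.3) for _ in range(8000)]  # stand-in for NOISEX-92 noise
    noisy = mix_at_snr(speech, noise, 0.0)  # 0 dB: equal speech and noise power
    residual = [y - s for y, s in zip(noisy, speech)]
    print(10 * math.log10(power(speech) / power(residual)))
```

A global SNR keeps the sketch short; a segmental SNR, computed frame by frame over speech-active regions, is an alternative that weights quiet passages differently.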