Adaptive wavelet shrinkage for noise robust speaker recognition

Abstract Speaker recognition faces many practical difficulties, among which signal inconsistency due to environmental and acquisition-channel factors is the most challenging. The noise imposed on the voice signal varies greatly, and an a priori noise model is usually unavailable. In this article, we propose a robust speaker recognition method that employs a novel adaptive wavelet shrinkage method for noise suppression. In our method, the wavelet subband coefficient thresholds are computed automatically and are proportional to the level of noise contamination. When wavelet shrinkage is applied for noise removal, a dual-threshold strategy is used to suppress noise, preserve signal coefficients, and minimize the introduction of artifacts. Recognition is performed using a modification of the Mel-frequency cepstral coefficients of overlapped voice signal segments. The efficacy of our method is evaluated with voice signals from two publicly available speech databases and compared with state-of-the-art methods. The results demonstrate that the proposed method is robust under various noise conditions. The improvement is especially significant when noise dominates the underlying speech.
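The general idea of subband-adaptive shrinkage with two thresholds can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the threshold formulas, so the sketch assumes a Haar wavelet, a median-absolute-deviation noise estimate per subband, and a hypothetical rule that zeroes coefficients below a lower threshold, keeps those above an upper threshold, and scales those in between linearly. The function names and the multipliers `k_low` and `k_high` are illustrative assumptions.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar transform (input length must be even)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed even and odd samples."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def subband_noise_sigma(detail):
    """Noise level of one subband via the median absolute deviation.

    Assumption: the paper's own subband noise estimator is not given in the
    abstract, so a standard MAD estimate stands in for it here.
    """
    return np.median(np.abs(detail)) / 0.6745


def dual_threshold_shrink(coeffs, low, high):
    """Hypothetical dual-threshold rule (an assumption, not the paper's rule).

    |c| <= low  -> set to zero (treated as noise)
    |c| >= high -> kept unchanged (treated as signal)
    otherwise   -> scaled by a gain that rises linearly from 0 to 1
    """
    coeffs = np.asarray(coeffs, dtype=float)
    mag = np.abs(coeffs)
    if high <= low:
        # Degenerate case (e.g. zero noise estimate): plain hard threshold.
        return np.where(mag > low, coeffs, 0.0)
    gain = np.clip((mag - low) / (high - low), 0.0, 1.0)
    return coeffs * gain


def denoise(signal, levels=3, k_low=1.0, k_high=3.0):
    """Shrink each detail subband with thresholds proportional to its noise level.

    The signal length must be divisible by 2**levels.
    """
    approx = np.asarray(signal, dtype=float)
    shrunk = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        sigma = subband_noise_sigma(detail)
        shrunk.append(dual_threshold_shrink(detail, k_low * sigma, k_high * sigma))
    # Reconstruct from the coarsest level back to the finest.
    for detail in reversed(shrunk):
        approx = haar_idwt(approx, detail)
    return approx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(1024)
    clean = np.sin(2 * np.pi * 4 * t / 1024)
    noisy = clean + 0.5 * rng.standard_normal(t.size)
    restored = denoise(noisy)
    print("MSE before:", np.mean((noisy - clean) ** 2))
    print("MSE after: ", np.mean((restored - clean) ** 2))
```

Because both thresholds are multiples of the per-subband noise estimate, a noisier subband is shrunk more aggressively, which mirrors the abstract's statement that thresholds are proportional to the noise contamination. The linear ramp between the two thresholds is one possible way to avoid the abrupt cutoff of hard thresholding, which tends to introduce artifacts.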
