Wavelet Speech Enhancement Based on Nonnegative Matrix Factorization

State-of-the-art speech enhancement (SE) techniques usually prefer a spectrogram over the corresponding raw time-domain data, since it offers a more compact representation and exposes conspicuous temporal information over a long time span. However, two problems can cause distortion in conventional nonnegative matrix factorization (NMF)-based SE algorithms. The first is the overlap-and-add operation used in short-time Fourier transform (STFT)-based signal reconstruction; the second is the direct use of the noisy-speech phase as the phase of the enhanced speech during reconstruction. Both can introduce information loss or discontinuities in the reconstructed signal relative to the clean signal. To solve these two problems, we propose a novel SE method that combines the discrete wavelet packet transform (DWPT) with NMF. In brief, the DWPT first splits a time-domain speech signal into a series of subband signals. NMF is then applied to each subband to highlight its speech component. The enhanced subband signals are joined via the inverse DWPT to reconstruct a noise-reduced signal in the time domain. We evaluate the proposed DWPT-NMF-based SE method on the Mandarin Hearing in Noise Test (MHINT) task. Experimental results show that the new method effectively enhances speech quality and intelligibility and outperforms the conventional STFT-NMF-based SE system.
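The three-stage pipeline described above (DWPT analysis, per-subband NMF, inverse DWPT) can be illustrated with a minimal pure-Python sketch. It is not the implementation evaluated in this work. Several choices are assumptions made only for illustration: the Haar wavelet, the non-overlapping framing of each subband, the use of absolute values with retained signs to obtain a nonnegative matrix, the unsupervised rank-limited NMF (no trained speech or noise dictionaries), and all function names and parameters (`dwpt`, `idwpt`, `nmf`, `enhance_band`, `dwpt_nmf_enhance`, `levels`, `frame_len`, `rank`).

```python
import math
import random

def haar_split(x):
    """One Haar analysis step on an even-length signal -> (approx, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_merge(approx, detail):
    """Exact inverse of haar_split."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def dwpt(x, levels):
    """Wavelet packet analysis: split every band at each level (2**levels bands).
    len(x) must be divisible by 2**levels."""
    bands = [list(x)]
    for _ in range(levels):
        bands = [half for b in bands for half in haar_split(b)]
    return bands

def idwpt(bands):
    """Wavelet packet synthesis: merge neighbouring bands until one remains."""
    bands = [list(b) for b in bands]
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]

def transpose(a):
    return [list(r) for r in zip(*a)]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

def nmf(v, rank, iters=50, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for V ~= W H (Euclidean cost), V >= 0."""
    rng = random.Random(seed)
    w = [[rng.random() + eps for _ in range(rank)] for _ in v]
    h = [[rng.random() + eps for _ in v[0]] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(len(w))]
    return w, h

def enhance_band(band, frame_len, rank):
    """Hypothetical per-subband step: rank-limited NMF of |frames|, signs kept."""
    n_frames = len(band) // frame_len
    if n_frames == 0:
        return list(band)
    frames = [band[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]
    v = transpose([[abs(s) for s in f] for f in frames])  # frame_len x n_frames
    w, h = nmf(v, rank)
    approx = matmul(w, h)
    out = []
    for k, f in enumerate(frames):
        out.extend(math.copysign(approx[i][k], f[i]) for i in range(frame_len))
    return out + list(band[n_frames * frame_len:])  # leftover samples untouched

def dwpt_nmf_enhance(x, levels=2, frame_len=8, rank=2):
    """DWPT analysis -> per-subband NMF approximation -> inverse DWPT."""
    bands = dwpt(x, levels)
    return idwpt([enhance_band(b, frame_len, rank) for b in bands])

# Toy input: a sinusoid plus Gaussian noise (synthetic, for illustration only).
rng = random.Random(1)
noisy = [math.sin(0.1 * n) + 0.1 * rng.gauss(0.0, 1.0) for n in range(256)]
enhanced = dwpt_nmf_enhance(noisy)
```

Because the Haar wavelet packet used here is an orthonormal, critically sampled transform, `idwpt(dwpt(x, levels))` returns `x` up to floating-point error. The sketch therefore needs neither overlap-and-add nor a separate phase estimate at the synthesis stage, which is the property the proposed method relies on.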
