Supervised speech enhancement using online Group-Sparse Convolutive NMF

In supervised speech enhancement methods based on non-negative matrix factorization (NMF), signals are described as linear combinations of dictionary atoms. To learn dictionary atoms capable of revealing the hidden structure in speech, a long temporal context of the speech signal must be considered. In contrast to standard NMF, the convolutive model has the advantage of finding the repeated patterns present in many realistic signals. Learning spectro-temporal atoms that span several consecutive frames requires training on large data sets, which places unrealistic demands on computation power and memory. In this paper, a new algorithm based on convolutive NMF is proposed to automatically identify temporal patterns in speech without these two obstacles. An online approach is adopted to save memory when processing large data sets, and a group sparsity constraint is employed to reduce the computational load. The results show that the online Group-Sparse Convolutive NMF algorithm can significantly increase the PESQ score of the enhanced speech.
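The abstract does not spell out the model, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions. It assumes the commonly used convolutive NMF form, in which a magnitude spectrogram is approximated by a sum of atom slices `W[t]` applied to time-shifted activations `H`, and a generic group penalty that sums the l2 norms of groups of activation rows. The function names, array shapes, and grouping are hypothetical and are not taken from the paper.

```python
import numpy as np

def shift_right(H, t):
    """Shift the columns of H right by t frames, zero-filling on the left."""
    if t == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, t:] = H[:, :-t]
    return out

def cnmf_reconstruct(W, H):
    """Assumed convolutive NMF model: V_hat = sum_t W[t] @ shift_right(H, t).

    W has shape (T, F, K): T frames per atom, F frequency bins, K atoms.
    H has shape (K, N): activations of the K atoms over N frames.
    """
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))

def group_sparsity_penalty(H, groups):
    """Sum over groups of the l2 (Frobenius) norm of that group's rows of H.

    `groups` is a list of row-index lists; this grouping is an assumption
    made for illustration only.
    """
    return sum(np.linalg.norm(H[idx, :]) for idx in groups)

# One atom spanning two frames: energy in bin 0, then in bin 1 a frame later.
W = np.zeros((2, 3, 1))
W[0, :, 0] = [1.0, 0.0, 0.0]
W[1, :, 0] = [0.0, 2.0, 0.0]
H = np.array([[1.0, 0.0, 0.0, 1.0]])  # the atom starts at frames 0 and 3
V_hat = cnmf_reconstruct(W, H)
penalty = group_sparsity_penalty(H, groups=[[0]])
```

In this sketch each activation in `H` places a whole two-frame pattern into `V_hat`, which is what lets the convolutive model capture temporal context that a single-frame NMF atom cannot. The group penalty is small when entire groups of atoms are inactive, so fewer atoms need to be updated.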
