Hybrid Method for Speech Enhancement Using α-Divergence

A hybrid method for speech enhancement is proposed that combines Non-Negative Matrix Factorization (NMF) with statistical modeling and uses speech and noise bases that are updated online. In the presence of nonstationary noise, template-based approaches have shown better performance than statistical modeling, but they depend on a priori information. To overcome the drawbacks of these approaches, a hybrid method is developed. The performance of the proposed method is further improved by considering speech bases as well as noise bases. In terms of Source-to-Distortion Ratio (SDR) and Perceptual Evaluation of Speech Quality (PESQ), the proposed method outperforms traditional algorithms in nonstationary noise environments.
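The abstract does not give the update rules, so the following is only a minimal sketch of the NMF building block under an α-divergence cost, using the standard multiplicative updates for that cost. It is not the authors' hybrid algorithm: the function names (`alpha_divergence`, `nmf_alpha`), the choice α = 0.5, the random initialization, and the toy input are all assumptions made for illustration. In a speech enhancement setting, `V` would be a magnitude spectrogram and the columns of `W` would play the role of the speech and noise bases.

```python
import numpy as np


def alpha_divergence(V, Q, alpha):
    """Amari alpha-divergence D_alpha(V || Q); this form requires alpha not in {0, 1}."""
    terms = V**alpha * Q**(1.0 - alpha) - alpha * V + (alpha - 1.0) * Q
    return np.sum(terms) / (alpha * (alpha - 1.0))


def nmf_alpha(V, rank, alpha=0.5, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix V (freq x frames) as W @ H.

    Uses multiplicative updates for the alpha-divergence cost.
    Hypothetical helper for illustration, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # basis vectors (columns)
    H = rng.random((rank, n_frames)) + eps  # activations per frame
    history = []
    for _ in range(n_iter):
        # Update activations H with the bases W held fixed.
        R = (V / (W @ H + eps)) ** alpha
        H *= ((W.T @ R) / (W.sum(axis=0)[:, None] + eps)) ** (1.0 / alpha)
        # Update bases W with the activations H held fixed.
        R = (V / (W @ H + eps)) ** alpha
        W *= ((R @ H.T) / (H.sum(axis=1)[None, :] + eps)) ** (1.0 / alpha)
        history.append(alpha_divergence(V, W @ H + eps, alpha))
    return W, H, history


if __name__ == "__main__":
    # Toy nonnegative "spectrogram" standing in for real data.
    V = np.random.default_rng(1).random((20, 30)) + 0.1
    W, H, history = nmf_alpha(V, rank=5)
    print(f"divergence: first iteration {history[0]:.4f}, last {history[-1]:.4f}")
```

Setting α = 1 in the update rules recovers the familiar Kullback-Leibler multiplicative updates, although the divergence function above would then need its limiting form.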
