Iterative Closed-Loop Phase-Aware Single-Channel Speech Enhancement

Many short-time Fourier transform (STFT) based single-channel speech enhancement algorithms are focused on estimating the clean speech spectral amplitude from the noisy observed signal in order to suppress the additive noise. To this end, they utilize the noisy amplitude information and the corresponding a priori and a posteriori SNRs while they employ the observed noisy phase when reconstructing enhanced speech signal. This paper presents two contributions: i) reconsidering the relation between the phase group delay deviation and phase deviation, and ii) proposing a closed-loop single-channel speech enhancement approach to estimate both amplitude and phase spectra of the speech signal. To this end, we combine a group-delay based phase estimator with a phase-aware amplitude estimator in a closed loop design. Our experimental results on various noise scenarios show considerable improvement in the objective perceived signal quality obtained by the proposed iterative phase-aware approach compared to conventional Wiener filtering which uses the noisy phase in signal reconstruction.

[1]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[2]  Peter Vary,et al.  Noise suppression by spectral magnitude estimation —mechanism and theoretical limits— , 1985 .

[3]  Robert Boorstyn,et al.  Single tone parameter estimation from discrete-time observations , 1974, IEEE Trans. Inf. Theory.

[4]  A. Oppenheim,et al.  Signal reconstruction from phase or magnitude , 1980 .

[5]  Kuldip K. Paliwal,et al.  Usefulness of phase spectrum in human speech perception , 2003, INTERSPEECH.

[6]  Jae S. Lim,et al.  Signal estimation from modified short-time Fourier transform , 1983, ICASSP.

[7]  Hirokazu Kameoka,et al.  Consistent Wiener Filtering: Generalized Time-Frequency Masking Respecting Spectrogram Consistency , 2010, LVA/ICA.

[8]  John R. Hershey,et al.  Monaural speech separation and recognition challenge , 2010, Comput. Speech Lang..

[9]  Yi Hu,et al.  Evaluation of Objective Quality Measures for Speech Enhancement , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[10]  Rainer Martin,et al.  Phase estimation for signal reconstruction in single-channel source separation , 2012, INTERSPEECH.

[11]  Kuldip K. Paliwal,et al.  Group-delay-deviation based spectral analysis of speech , 2009, INTERSPEECH.

[12]  Philipos C. Loizou,et al.  Speech Enhancement: Theory and Practice , 2007 .

[13]  Rahim Saeidi,et al.  On phase importance in parameter estimation in single-channel speech enhancement , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[14]  Herman J. M. Steeneken,et al.  Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems , 1993, Speech Commun..

[15]  Pejman Mowlaee,et al.  Show & Tell: Phase-Aware Single-channel Speech Enhancement , 2013 .

[16]  Bhiksha Raj,et al.  Speech denoising using nonnegative matrix factorization with priors , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[17]  John H. L. Hansen,et al.  An effective quality evaluation protocol for speech enhancement algorithms , 1998, ICSLP.

[18]  Jesper Jensen,et al.  Minimum Mean-Square Error Estimation of Discrete Fourier Coefficients With Generalized Gamma Priors , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[19]  Mario Kaoru Watanabe,et al.  Iterative sinusoidal-based partial phase reconstruction in single-channel source separation , 2013, INTERSPEECH.

[20]  Nicolas Sturmel,et al.  Iterative phase reconstruction of wiener filtered signals , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[21]  David Malah,et al.  Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..

[22]  Timo Gerkmann,et al.  MMSE-Optimal Spectral Amplitude Estimation Given the STFT-Phase , 2013, IEEE Signal Processing Letters.

[23]  Andries P. Hekstra,et al.  Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[24]  Deep Sen,et al.  Iterative Phase Estimation for the Synthesis of Separated Sources From Single-Channel Mixtures , 2010, IEEE Signal Processing Letters.

[25]  Israel Cohen,et al.  Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging , 2003, IEEE Trans. Speech Audio Process..

[26]  A.V. Oppenheim,et al.  Enhancement and bandwidth compression of noisy speech , 1979, Proceedings of the IEEE.