Phase Estimation in Single Channel Speech Enhancement Using Phase Decomposition

Conventional speech enhancement methods typically use the noisy phase spectrum for signal reconstruction. This letter presents a novel method for estimating the clean speech phase spectrum from the noisy observation in single-channel speech enhancement. The proposed method applies phase decomposition to the instantaneous noisy phase spectrum, followed by temporal smoothing to reduce the large variance of the noisy phase. It then reconstructs an enhanced instantaneous phase spectrum, which is used for signal reconstruction. The effectiveness of the proposed method is evaluated in two ways: in a phase-enhancement-only setting, and by quantifying the additional improvement it provides on top of conventional amplitude enhancement schemes, which commonly reuse the noisy phase for signal reconstruction. Instrumental metrics predict a consistent improvement in perceived speech quality and speech intelligibility when the noisy phase is enhanced with the proposed phase estimation method.
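The decomposition-and-smoothing idea can be illustrated with a minimal sketch. It assumes a harmonic model in which, for harmonic h, the noisy instantaneous phase is split into a linear phase part, accumulated from frame to frame as psi_lin(h, l) = psi_lin(h, l-1) + h * 2*pi * f0(l) * S / fs (S is the frame shift in samples, fs the sampling rate), and a residual part. Only the residual is smoothed over neighboring frames with a circular mean before the two parts are recombined. The availability of an f0 track, the window length `half_width`, and all function names are illustrative assumptions, not the letter's exact formulation.

```python
import cmath
import math


def principal(x):
    """Wrap an angle to the principal interval [-pi, pi]."""
    return math.atan2(math.sin(x), math.cos(x))


def circular_distance(a, b):
    """Absolute wrapped difference between two angles, in [0, pi]."""
    return abs(principal(a - b))


def linear_phase(f0_hz, h, hop, fs, psi0=0.0):
    """Linear phase of harmonic h, accumulated frame to frame (assumed model):
    psi_lin(l) = psi_lin(l-1) + h * 2*pi*f0(l)/fs * hop."""
    out, acc = [], psi0
    for frame, f0 in enumerate(f0_hz):
        if frame > 0:
            acc += h * 2.0 * math.pi * f0 / fs * hop
        out.append(principal(acc))
    return out


def circular_mean(angles):
    """Mean direction of a list of angles (0.0 if the resultant vanishes)."""
    return cmath.phase(sum(cmath.exp(1j * a) for a in angles))


def enhance_harmonic_phase(noisy_phase, f0_hz, h, hop, fs, half_width=2):
    """Hypothetical sketch: decompose, smooth the residual, recombine."""
    lin = linear_phase(f0_hz, h, hop, fs)
    # Residual phase: what remains after removing the linear part (wrapped).
    resid = [principal(p - q) for p, q in zip(noisy_phase, lin)]
    n = len(resid)
    smoothed = []
    for frame in range(n):
        # Centered window of frames, truncated at the signal edges.
        lo, hi = max(0, frame - half_width), min(n, frame + half_width + 1)
        smoothed.append(circular_mean(resid[lo:hi]))
    # Recombine the linear part and the smoothed residual.
    return [principal(a + b) for a, b in zip(lin, smoothed)]
```

Averaging on the unit circle rather than arithmetically avoids the bias that wrapping near +/-pi would otherwise introduce. In this sketch the enhanced phase could be paired either with the noisy amplitude (phase enhancement only) or with the output of a conventional amplitude estimator, mirroring the two evaluation settings described above.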
