Harmonic Phase Estimation in Single-Channel Speech Enhancement Using Phase Decomposition and SNR Information

In conventional single-channel speech enhancement, the noisy spectral amplitude is typically modified while the noisy phase is reused to reconstruct the enhanced signal. Several recent studies have shown that an improved spectral phase makes phase-aware speech enhancement effective and has a positive impact on perceived speech quality. In this paper, we present a harmonic phase estimation method that relies on the fundamental frequency and signal-to-noise ratio (SNR) estimated from noisy speech. The proposed method applies SNR-based time-frequency smoothing to the unwrapped phase obtained from the decomposition of the noisy phase. To account for the uncertainty in the estimated phase caused by unreliable voicing decisions and SNR estimates, we propose a binary hypothesis test with speech-present and speech-absent classes representing high and low SNRs. The proposed phase estimation method is evaluated both for phase-only enhancement of noisy speech and in combination with an amplitude-only enhancement scheme. We show that enhancing the noisy phase improves both perceived speech quality and speech intelligibility, as predicted by instrumental metrics and confirmed by subjective listening tests.
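
The abstract gives no equations, so the following is only a minimal sketch of the general idea of decomposing a harmonic phase and smoothing the remaining component with SNR-dependent weights. It is not the paper's estimator. The function names, the hop size, the Wiener-like weighting, and the sliding-window circular mean are all illustrative assumptions, smoothing is done along time only, and the binary hypothesis test is not modeled.

```python
import numpy as np

def linear_phase(f0_hz, harmonic, hop_s):
    # Assumed decomposition: the linear phase of one harmonic is the
    # accumulated 2*pi*h*f0 advance from frame to frame.
    return 2.0 * np.pi * harmonic * np.cumsum(f0_hz) * hop_s

def circular_mean_smooth(phase, weights, half_width):
    # Weighted circular mean over a sliding window of frames.
    # Averaging unit phasors avoids problems at the +/- pi wrap point.
    n = len(phase)
    phasors = weights * np.exp(1j * phase)
    out = np.empty(n)
    for t in range(n):
        lo, hi = max(0, t - half_width), min(n, t + half_width + 1)
        out[t] = np.angle(np.sum(phasors[lo:hi]))
    return out

def enhance_harmonic_phase(noisy_phase, f0_hz, snr_db,
                           harmonic=1, hop_s=0.01, half_width=2):
    # 1) Remove the f0-driven linear phase to expose the slowly
    #    varying remainder (called "unwrapped phase" in the abstract).
    lin = linear_phase(f0_hz, harmonic, hop_s)
    residual = noisy_phase - lin
    # 2) Assumed SNR weighting: frames with higher SNR count more.
    weights = 1.0 / (1.0 + 10.0 ** (-np.asarray(snr_db) / 10.0))
    # 3) Smooth the remainder, then add the linear phase back and
    #    wrap the result to (-pi, pi].
    smoothed = circular_mean_smooth(residual, weights, half_width)
    return np.angle(np.exp(1j * (smoothed + lin)))

def wrapped_error(a, b):
    # Circular distance between two phase tracks.
    return np.abs(np.angle(np.exp(1j * (a - b))))
```

With a constant f0 track and a constant true remainder, the sketch returns the clean phase unchanged when no noise is present, and it reduces the average wrapped phase error when zero-mean phase noise is added. A larger `half_width` smooths more but tracks real phase changes more slowly.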
