STFT Phase Reconstruction in Voiced Speech for an Improved Single-Channel Speech Enhancement

The enhancement of speech corrupted by noise is commonly performed in the short-time discrete Fourier transform domain. When only a single microphone signal is available, typically only the spectral amplitude is modified. However, it has recently been shown that an improved spectral phase can also be utilized for speech enhancement, e.g., for phase-sensitive amplitude estimation. In this paper, we therefore present a method to reconstruct the spectral phase of voiced speech from only the fundamental frequency and the noisy observation. We highlight the importance of the spectral phase and explain why modifying it can achieve noise reduction. We show that, when the noisy phase is enhanced using the proposed phase reconstruction, instrumental measures predict an increase in speech quality over a range of signal-to-noise ratios, even without explicit amplitude enhancement.
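
The abstract does not spell out the reconstruction rule, so the following is only a minimal sketch of one plausible ingredient, assuming a harmonic model of voiced speech: if harmonic h sits at frequency h·f0, its phase advances by 2π·h·f0·hop/fs from one STFT frame to the next. The function name `propagate_harmonic_phase` and all of its parameters are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def propagate_harmonic_phase(f0_hz, fs_hz, hop, n_harmonics, init_phase=None):
    """Advance the phase of each harmonic from frame to frame.

    f0_hz       : per-frame fundamental frequency estimates in Hz, shape (n_frames,)
    fs_hz       : sampling rate in Hz
    hop         : frame shift in samples
    n_harmonics : number of harmonics to track (assumed known here)
    init_phase  : phase of each harmonic in the first frame (defaults to zero)

    Returns phases wrapped to (-pi, pi], shape (n_frames, n_harmonics).
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    harmonics = np.arange(1, n_harmonics + 1)
    if init_phase is None:
        init_phase = np.zeros(n_harmonics)
    # Phase advance over one hop for harmonic h in frame l: 2*pi*h*f0[l]*hop/fs
    advance = 2.0 * np.pi * np.outer(f0_hz, harmonics) * hop / fs_hz
    advance[0] = 0.0  # the first frame keeps the initial phase
    phase = np.asarray(init_phase, dtype=float) + np.cumsum(advance, axis=0)
    return np.angle(np.exp(1j * phase))  # wrap to (-pi, pi]

# Illustrative values (assumptions): 16 kHz sampling, 128-sample hop, constant 200 Hz pitch.
fs, hop = 16000, 128
f0 = np.full(5, 200.0)
phases = propagate_harmonic_phase(f0, fs, hop, n_harmonics=3)
```

In a full system the fundamental frequency would come from a pitch estimator run on the noisy signal, and the initial phase would be taken from the noisy observation at the onset of a voiced segment; both are left out of this sketch.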
