A probabilistic approach for phase estimation in single-channel speech enhancement using von Mises phase priors

In many artificial intelligence systems, the human voice is the medium for information transmission. Human-machine communication by voice becomes difficult when speech is mixed with background noise. Single-channel speech enhancement is therefore indispensable for reducing background noise in noisy speech so that it is suitable for automatic speech recognition and telephony. While conventional single-channel speech enhancement techniques use the noisy phase in both the amplitude estimation and signal reconstruction stages, in this paper we propose a probabilistic method to estimate the clean speech phase from the noisy observation. The proposed method consists of phase unwrapping followed by threshold-based temporal smoothing using von Mises phase priors. It leads to improved speech quality and intelligibility, as predicted by instrumental measures, without explicit incorporation of amplitude enhancement.
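
The idea of threshold-based temporal smoothing of phase can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sliding window, the use of circular variance as the decision statistic, and the values `half_window=2` and `var_threshold=0.2` are all assumptions made for illustration. The sketch takes a phase track over time frames (assumed to be already unwrapped), computes the circular mean and mean resultant length in a window around each frame, and replaces the phase with the circular mean only where the local circular variance is below the threshold. The concentration parameter of a von Mises distribution is estimated from the mean resultant length with a standard closed-form approximation.

```python
import numpy as np

def circular_stats(phases):
    """Return (circular mean, mean resultant length) of a set of angles in radians."""
    z = np.exp(1j * np.asarray(phases, dtype=float))
    m = z.mean()
    return np.angle(m), np.abs(m)

def kappa_from_resultant(r):
    """Approximate von Mises concentration from the mean resultant length r.

    Uses the closed-form approximation kappa ~= r * (2 - r^2) / (1 - r^2);
    r is clipped just below 1 to avoid division by zero.
    """
    r = min(float(r), 1.0 - 1e-9)
    return r * (2.0 - r ** 2) / (1.0 - r ** 2)

def smooth_phase(phases, half_window=2, var_threshold=0.2):
    """Threshold-based temporal smoothing of a phase track (illustrative only).

    For each frame, the phase is replaced by the local circular mean when the
    local circular variance (1 - R) is below var_threshold; otherwise the
    input phase is kept. The circular mean is returned in (-pi, pi].
    """
    phases = np.asarray(phases, dtype=float)
    out = phases.copy()
    for t in range(len(phases)):
        lo = max(0, t - half_window)
        hi = min(len(phases), t + half_window + 1)
        mean, r = circular_stats(phases[lo:hi])
        if 1.0 - r < var_threshold:  # low variance: phase is locally reliable
            out[t] = mean
    return out

# Hypothetical phase track: small fluctuations around 0.5 rad.
noisy = [0.5, 0.6, 0.4, 0.5, 0.5]
smoothed = smooth_phase(noisy)
```

In this sketch a low circular variance corresponds to a high von Mises concentration, so frames with a sharply peaked phase distribution are smoothed while frames with widely scattered phase are left unchanged.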
