Harmonic phase estimation in single-channel speech enhancement using von Mises distribution and prior SNR

In single-channel speech enhancement, the spectral amplitude of the noisy signal is often modified while the noisy spectral phase is used unaltered for signal reconstruction. Recently, further improvements in speech enhancement performance have been reported when the noisy phase is also modified. In this work, we propose a Bayesian estimator for the phase of the harmonics given the noisy speech. The proposed estimator relies on the fundamental frequency and the signal-to-noise ratio (SNR) at the harmonics. In our experiments, we compare the proposed phase enhancement against the noisy phase, a benchmark method, and the clean phase, which serves as an upper bound. The proposed method jointly improves quality and intelligibility across different SNRs and noise types.
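The abstract names the ingredients of the estimator (harmonic phase, fundamental frequency, SNR at each harmonic, von Mises distribution) but not its exact form. The Python sketch below shows one plausible way to combine them and should not be read as the authors' method. It assumes: (i) a prior mean for harmonic h obtained by advancing the previous frame's phase by 2π·h·f0·hop/fs, the linear phase progression of a stationary sinusoid; (ii) a von Mises likelihood centred on the noisy phase whose concentration is a hypothetical function of the SNR, here κ ≈ 2·SNR, a high-SNR approximation for a sinusoid in complex Gaussian noise; (iii) a von Mises prior with a user-chosen concentration `kappa_prior`. Because the product of two von Mises densities is again von Mises, with κ·e^{iμ} = κ₁e^{iμ₁} + κ₂e^{iμ₂}, the posterior mean direction (which is also its mode) has a closed form. All function and parameter names are illustrative.

```python
import cmath
import math


def wrap_phase(phi: float) -> float:
    """Wrap an angle in radians to [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def predict_harmonic_phase(prev_phase: float, harmonic: int, f0_hz: float,
                           hop: int, fs: float) -> float:
    """Assumed prior mean: linear phase advance of harmonic h over one hop."""
    return wrap_phase(prev_phase + 2.0 * math.pi * harmonic * f0_hz * hop / fs)


def snr_to_kappa(snr_db: float, scale: float = 2.0) -> float:
    """Hypothetical SNR-to-concentration mapping (kappa ~ 2*SNR at high SNR)."""
    return scale * 10.0 ** (snr_db / 10.0)


def fuse_von_mises(mu1: float, kappa1: float,
                   mu2: float, kappa2: float) -> tuple[float, float]:
    """Multiply two von Mises densities. The product is von Mises with
    kappa * exp(i*mu) = kappa1 * exp(i*mu1) + kappa2 * exp(i*mu2).
    Returns (mu, kappa); if kappa == 0 the result is uniform and mu is arbitrary."""
    z = kappa1 * cmath.exp(1j * mu1) + kappa2 * cmath.exp(1j * mu2)
    return cmath.phase(z), abs(z)


def estimate_harmonic_phase(noisy_phase: float, prev_phase: float, harmonic: int,
                            f0_hz: float, hop: int, fs: float,
                            snr_db: float, kappa_prior: float) -> tuple[float, float]:
    """Posterior mean direction (equal to the MAP) of one harmonic's clean phase."""
    mu_prior = predict_harmonic_phase(prev_phase, harmonic, f0_hz, hop, fs)
    kappa_obs = snr_to_kappa(snr_db)
    return fuse_von_mises(mu_prior, kappa_prior, noisy_phase, kappa_obs)


if __name__ == "__main__":
    # Same noisy observation, three hypothetical SNR values at the harmonic.
    for snr in (30.0, 0.0, -30.0):
        mu, kappa = estimate_harmonic_phase(noisy_phase=1.0, prev_phase=0.0,
                                            harmonic=1, f0_hz=200.0, hop=128,
                                            fs=16000.0, snr_db=snr, kappa_prior=2.0)
        print(f"SNR {snr:+.0f} dB -> phase {mu:+.3f} rad, kappa {kappa:.3f}")
```

For example, with f0 = 200 Hz, fs = 16 kHz and a hop of 128 samples, the prior for the first harmonic advances by 3.2π rad, i.e. -0.8π after wrapping. At a high SNR the noisy phase dominates the estimate, while at a very low SNR the estimate stays close to the f0-based prior, which is the qualitative behaviour an SNR-dependent phase estimator is meant to have.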
