Phase Estimation in Single-Channel Speech Enhancement: Limits-Potential

In this paper, we present an overview of earlier and recent methods for estimating the clean spectral phase from a noisy observation in single-channel speech enhancement. Interest in phase estimation is motivated by recent reports on its usefulness for phase-sensitive amplitude estimation. We present a comparative study of recent phase estimation methods and elaborate on their limits. We propose a new phase enhancement method that relies on phase decomposition and time-frequency smoothing filters, and we demonstrate that this time-frequency phase smoothing reduces the variance of the noisy phase at harmonics. Our results across different speech and noise databases and different signal-to-noise ratios show that, in contrast to the existing benchmark methods, only the proposed method achieves a joint improvement in perceived quality (0.2 in PESQ score) and speech intelligibility (2%) through phase-only enhancement.
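As a rough illustration of why smoothing on the unit circle can reduce the variance of a noisy phase track, consider the sketch below. It is not the method proposed in the paper: the phase decomposition step is replaced by the assumption that the linear phase has already been removed, so that one harmonic is left with a constant phase across frames. The noise level, window length, and all function names are illustrative assumptions.

```python
import numpy as np

def circular_mean(phases):
    """Mean direction of a set of angles (radians), computed via unit phasors."""
    return np.angle(np.mean(np.exp(1j * phases)))

def circular_variance(phases):
    """1 minus the mean resultant length: 0 = concentrated, 1 = uniform."""
    return 1.0 - np.abs(np.mean(np.exp(1j * phases)))

def smooth_phase(phases, half_width=2):
    """Sliding-window circular mean along the frame axis (hypothetical helper)."""
    z = np.exp(1j * phases)
    kernel = np.ones(2 * half_width + 1)
    # Summing unit phasors and taking the angle yields the circular mean;
    # the missing normalisation does not affect the angle.
    return np.angle(np.convolve(z, kernel, mode="same"))

rng = np.random.default_rng(0)
true_phase = 0.8                      # assumed constant phase of one harmonic
noisy = true_phase + 0.6 * rng.standard_normal(200)
noisy = np.angle(np.exp(1j * noisy))  # wrap to (-pi, pi]

smoothed = smooth_phase(noisy)
var_noisy = circular_variance(noisy)
var_smoothed = circular_variance(smoothed)
print(f"circular variance: noisy={var_noisy:.3f}, smoothed={var_smoothed:.3f}")
```

Averaging phasors rather than raw angles avoids wrap-around errors near ±π, which is why the smoothing is done on `exp(1j * phase)` instead of on the phase values directly.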
