Estimation of Subband Speech Correlations for Noise Reduction via MVDR Processing

Recently, the minimum-variance distortionless-response (MVDR) approach has been proposed for single-channel speech enhancement in the short-time frequency domain. Optimal FIR filters applied to each subband signal reduce additive noise components with less speech distortion than conventional approaches. An important ingredient of these filters is the temporal correlation of the speech signals. We derive algorithms for blind estimation of this quantity based on maximum-likelihood and maximum a posteriori estimation. To derive proper models for the inter-frame correlation of the speech and noise signals, we investigate their statistics on a large dataset. If the speech correlation is properly estimated, the previously derived subband filters discussed in this work show significantly less speech distortion than conventional noise reduction algorithms. Therefore, the experimental part of this work focuses on the quality and intelligibility of the processed signals. To evaluate the performance of the subband filters in combination with the clean-speech inter-frame correlation estimators, we predict speech quality and intelligibility using objective measures.
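As a rough illustration of how the inter-frame correlation enters a subband MVDR filter, the sketch below assumes the textbook MVDR form h = R_y^{-1} gamma / (gamma^H R_y^{-1} gamma), where R_y is the covariance matrix of the last N noisy frames in one subband and gamma is the normalized speech inter-frame correlation vector (first element 1). The names `solve`, `mvdr_filter`, `apply_filter`, `R_y`, and `gamma`, and all numerical values, are hypothetical and chosen for illustration only; this is not the estimator derived in this work, and gamma is simply assumed to be known here.

```python
def solve(A, b):
    """Solve A x = b for a small complex system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    # Augmented matrix [A | b] with complex entries.
    M = [[complex(v) for v in A[i]] + [complex(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is (numerically) singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def mvdr_filter(R_y, gamma):
    """Assumed MVDR form: h = R_y^{-1} gamma / (gamma^H R_y^{-1} gamma)."""
    gamma = [complex(g) for g in gamma]
    w = solve(R_y, gamma)                                   # w = R_y^{-1} gamma
    denom = sum(g.conjugate() * wi for g, wi in zip(gamma, w))
    return [wi / denom for wi in w]


def apply_filter(h, y):
    """Filter output h^H y for one subband, y = [Y(m), Y(m-1), ..., Y(m-N+1)]."""
    return sum(hi.conjugate() * complex(yi) for hi, yi in zip(h, y))


# Illustrative (made-up) values for a filter of length N = 3.
R_y = [
    [2.0, 0.5 + 0.2j, 0.1 - 0.1j],
    [0.5 - 0.2j, 2.0, 0.5 + 0.2j],
    [0.1 + 0.1j, 0.5 - 0.2j, 2.0],
]
gamma = [1.0, 0.6 + 0.1j, 0.3 - 0.2j]

h = mvdr_filter(R_y, gamma)
# Distortionless constraint: the response to the correlated speech component is 1.
constraint = apply_filter(h, gamma)
```

The distortionless constraint h^H gamma = 1 holds by construction, so the part of the speech that is correlated with the current frame passes unchanged. In practice gamma is unknown and has to be estimated blindly from the noisy signal, which is the quantity this work addresses.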
