Combination of MVDR beamforming and single-channel spectral processing for enhancing noisy and reverberant speech

This paper presents a system for joint dereverberation and noise reduction that combines a beamformer with a single-channel spectral enhancement scheme. First, a minimum variance distortionless response (MVDR) beamformer with an online-estimated noise coherence matrix suppresses noise and reverberation. The beamformer output is then processed by a single-channel spectral enhancement scheme, based on statistical room acoustics, minimum statistics, and temporal cepstrum smoothing, to suppress residual noise and reverberation. The system is evaluated on the REVERB challenge corpus, which was designed to evaluate speech enhancement algorithms in the presence of both reverberation and noise. The evaluation uses instrumental speech quality measures, the performance of an automatic speech recognition system, and a subjective assessment of speech quality based on a MUSHRA test. The performance of beamforming, of single-channel spectral enhancement, and of their combination is compared, and experimental results show that the proposed system is effective in suppressing both reverberation and noise while improving the speech quality. The improvements are particularly significant in conditions with high reverberation times.
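To make the first stage concrete, the following is a minimal sketch of the standard MVDR weight computation, w = Γ⁻¹d / (dᴴΓ⁻¹d), for one frequency bin of a two-microphone array. It is an illustration, not the paper's implementation: the paper estimates the noise coherence matrix online, whereas this sketch assumes a fixed diffuse-field (sinc) coherence model. The function names, the microphone spacing, the frequency, the delay, and the diagonal loading value are all hypothetical.

```python
import cmath
import math


def mvdr_weights_2mic(gamma, d):
    """MVDR weights for two microphones at a single frequency bin.

    gamma: 2x2 noise coherence (or covariance) matrix as nested lists.
    d:     length-2 steering vector toward the target source.
    Returns w = gamma^{-1} d / (d^H gamma^{-1} d).
    """
    (a, b), (c, e) = [[complex(v) for v in row] for row in gamma]
    d = [complex(v) for v in d]
    det = a * e - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is near-singular; add diagonal loading")
    # Closed-form 2x2 inverse applied to the steering vector.
    gi_d = [(e * d[0] - b * d[1]) / det, (-c * d[0] + a * d[1]) / det]
    # Normalisation that enforces the distortionless constraint w^H d = 1.
    denom = d[0].conjugate() * gi_d[0] + d[1].conjugate() * gi_d[1]
    return [v / denom for v in gi_d]


def apply_beamformer(w, x):
    """Beamformer output y = w^H x for one time-frequency bin."""
    return sum(wi.conjugate() * complex(xi) for wi, xi in zip(w, x))


def diffuse_coherence_2mic(freq_hz, spacing_m, c=343.0, loading=1e-2):
    """Assumed diffuse-field coherence matrix with diagonal loading."""
    arg = 2.0 * math.pi * freq_hz * spacing_m / c
    coh = 1.0 if arg == 0.0 else math.sin(arg) / arg
    return [[1.0 + loading, coh], [coh, 1.0 + loading]]


# Hypothetical setup: 1 kHz bin, 8 cm spacing, 0.1 ms inter-microphone delay.
freq = 1000.0
gamma = diffuse_coherence_2mic(freq, 0.08)
steer = [1.0, cmath.exp(-2j * math.pi * freq * 1e-4)]
weights = mvdr_weights_2mic(gamma, steer)
```

With an identity coherence matrix (spatially white noise) the same function reduces to a delay-and-sum beamformer, which is a quick sanity check. In the system described above, the beamformer output would then be passed to the single-channel spectral enhancement stage.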
