Coherent-to-Diffuse Power Ratio Estimation for Dereverberation

The estimation of the time- and frequency-dependent coherent-to-diffuse power ratio (CDR) from the measured spatial coherence between two omnidirectional microphones is investigated. Known CDR estimators are formulated in a common framework, illustrated using a geometric interpretation in the complex plane, and investigated with respect to bias and robustness towards model errors. Several novel unbiased CDR estimators are proposed, and it is shown that knowledge of either the direction of arrival (DOA) of the target source or the coherence of the noise field is sufficient for unbiased CDR estimation. The validity of the model for the application of CDR estimates to dereverberation is investigated using measured and simulated impulse responses. A CDR-based dereverberation system is presented and evaluated using signal-based quality measures as well as automatic speech recognition accuracy. The results show that the proposed unbiased estimators have a practical advantage over existing estimators, and that the proposed DOA-independent estimator can be used for effective blind dereverberation.
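As an illustration of how a CDR value relates to the measured spatial coherence, the following minimal sketch assumes the usual two-component mixing model, in which the observed coherence is Γ_x = (CDR · Γ_s + Γ_n) / (CDR + 1), with Γ_s the coherence of the direct (plane-wave) component and Γ_n the coherence of the diffuse component. Solving for the CDR gives (Γ_n − Γ_x) / (Γ_x − Γ_s). The function names, the ideal spherically isotropic noise coherence, the DOA sign convention, and the speed of sound are assumptions made for this example. This plain inversion needs both the DOA and the noise coherence, and it is not one of the unbiased estimators proposed in the paper.

```python
import cmath
import math


def diffuse_coherence(freq_hz, mic_distance_m, c=343.0):
    """Coherence of an ideal spherically isotropic field between two omni mics (assumed model)."""
    x = 2.0 * math.pi * freq_hz * mic_distance_m / c
    return 1.0 if x == 0.0 else math.sin(x) / x


def direct_coherence(freq_hz, mic_distance_m, doa_rad, c=343.0):
    """Coherence of a plane wave; doa_rad = 0 is taken as broadside (assumed convention)."""
    tdoa = mic_distance_m * math.sin(doa_rad) / c
    return cmath.exp(1j * 2.0 * math.pi * freq_hz * tdoa)


def mixed_coherence(cdr, gamma_s, gamma_n):
    """Forward model: coherence of a direct-plus-diffuse mixture with the given CDR."""
    return (cdr * gamma_s + gamma_n) / (cdr + 1.0)


def cdr_from_coherence(gamma_x, gamma_s, gamma_n):
    """Invert the mixing model; keep the real part and clip at zero,
    since an estimated coherence generally yields a complex ratio."""
    denom = gamma_x - gamma_s
    if abs(denom) < 1e-12:
        return math.inf  # observed coherence equals the direct-path coherence
    ratio = (gamma_n - gamma_x) / denom
    return max(0.0, ratio.real)


if __name__ == "__main__":
    f, d, doa = 1000.0, 0.08, 0.5  # Hz, metres, radians (illustrative values)
    g_s = direct_coherence(f, d, doa)
    g_n = diffuse_coherence(f, d)
    g_x = mixed_coherence(2.0, g_s, g_n)
    print(cdr_from_coherence(g_x, g_s, g_n))
```

With an error-free coherence the inversion recovers the CDR used in the forward model. With a coherence estimated from short-time averages the ratio is generally not real-valued, which is why the sketch takes the real part and clips negative values.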
