Joint Estimation of Reverberation Time and Direct-to-Reverberation Ratio from Speech using Auditory-Inspired Features

Blind estimation of acoustic room parameters such as the reverberation time $T_\mathrm{60}$ and the direct-to-reverberation ratio ($\mathrm{DRR}$) remains challenging, especially when the only available input is a reverberant speech signal. This work proposes a novel approach for jointly estimating $T_\mathrm{60}$ and $\mathrm{DRR}$ from wideband speech in noisy conditions. Features are extracted with a filterbank of 2D Gabor filters and used as input to a multi-layer perceptron (MLP). Each MLP output neuron corresponds to a specific $(T_\mathrm{60}, \mathrm{DRR})$ pair; the output is integrated over time, and a simple decision rule yields the final estimate. The approach is applied to single-microphone fullband speech signals provided by the Acoustic Characterization of Environments (ACE) Challenge. It outperforms the baseline systems, with median errors close to zero for $T_\mathrm{60}$ and -1.5 dB for $\mathrm{DRR}$, and computes the estimates 5.8 times faster than the baseline.
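
The time integration and decision step can be illustrated with a minimal sketch. The abstract does not spell out the decision rule, so the version below assumes the simplest plausible one: average the per-frame MLP output activations over the utterance and pick the class with the highest mean. The class grid `CLASS_PAIRS`, the function name `estimate_pair`, and the example activations are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical class grid: each MLP output neuron stands for one
# (T60 in seconds, DRR in dB) pair. The values are illustrative only.
CLASS_PAIRS: List[Tuple[float, float]] = [
    (0.3, 10.0),
    (0.3, 0.0),
    (0.8, 10.0),
    (0.8, 0.0),
]


def estimate_pair(
    frame_outputs: Sequence[Sequence[float]],
    class_pairs: Sequence[Tuple[float, float]] = CLASS_PAIRS,
) -> Tuple[float, float]:
    """Average per-frame activations over time and return the winning pair.

    Assumed decision rule (not confirmed by the source): arg max of the
    time-averaged output activations.
    """
    if not frame_outputs:
        raise ValueError("need at least one frame of MLP outputs")

    n_classes = len(class_pairs)
    totals = [0.0] * n_classes
    for frame in frame_outputs:
        if len(frame) != n_classes:
            raise ValueError("frame size must match the number of classes")
        for k, activation in enumerate(frame):
            totals[k] += activation

    # Temporal integration: mean activation per output neuron.
    n_frames = len(frame_outputs)
    means = [total / n_frames for total in totals]

    # Decision: index of the neuron with the largest mean activation.
    best = max(range(n_classes), key=means.__getitem__)
    return class_pairs[best]


# Made-up activations for three frames and four classes.
frames = [
    [0.1, 0.2, 0.6, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]
t60, drr = estimate_pair(frames)
print(f"T60 = {t60} s, DRR = {drr} dB")
```

Treating each $(T_\mathrm{60}, \mathrm{DRR})$ pair as one class turns the joint estimation into a single classification problem, so one network output layer covers both parameters; averaging over frames makes the decision less sensitive to individual frames with ambiguous activations.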
