Joint Estimation of Reverberation Time and Early-To-Late Reverberation Ratio From Single-Channel Speech Signals

The reverberation time (RT) and the early-to-late reverberation ratio (ELR) are two key parameters commonly used to characterize acoustic room environments. In contrast to conventional blind estimation methods, which treat the two parameters separately, we propose a joint room parameter estimator (jROPE) that predicts the RT and the ELR simultaneously from single-channel speech signals, using either full-band or sub-band frequency data. An artificial neural network is employed to learn the mapping from acoustic observations to RT and ELR classes. Its input consists of auditory-inspired acoustic features obtained by temporal modulation filtering of time-frequency representations of the speech signal. Based on an in-depth analysis of the dependency between the RT and the ELR, a two-dimensional (RT, ELR) distribution with constrained boundaries is derived, which is then used to evaluate four different configurations of jROPE. Experimental results show that, compared with the single-task ROPE system, which estimates the RT or the ELR individually, jROPE provides improved results for both tasks in various reverberant and (diffuse) noisy environments. Among the four proposed joint types, the one incorporating multi-task learning with shared input and hidden layers yields the best estimation accuracy on average. Under extreme reverberant conditions, with RTs and ELRs lying beyond the derived (RT, ELR) distribution, the type that treats RT and ELR as a single joint parameter proves particularly robust. Compared with state-of-the-art algorithms tested in the Acoustic Characterization of Environments (ACE) challenge, jROPE achieves results comparable to the best for all individual tasks (RT and ELR estimation from full-band and sub-band signals).
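The multi-task configuration with shared input and hidden layers can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, class counts, ReLU activation, random initialization, and equally weighted sum of the two cross-entropy terms are all assumptions chosen for illustration, and the function names are hypothetical. It shows only the structural idea that one shared hidden representation feeds two separate classification outputs, one for RT classes and one for ELR classes.

```python
import math
import random


def relu(values):
    """Element-wise rectified linear unit (assumed activation)."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax that turns scores into class probabilities."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_model(n_features, n_hidden, n_rt_classes, n_elr_classes, seed=0):
    """One shared hidden layer plus two task-specific output layers."""
    rng = random.Random(seed)
    return {
        "shared": init_layer(n_features, n_hidden, rng),
        "rt_head": init_layer(n_hidden, n_rt_classes, rng),
        "elr_head": init_layer(n_hidden, n_elr_classes, rng),
    }


def forward(model, features):
    """Return (RT class probabilities, ELR class probabilities)."""
    hidden = relu(dense(features, *model["shared"]))  # shared by both tasks
    rt_probs = softmax(dense(hidden, *model["rt_head"]))
    elr_probs = softmax(dense(hidden, *model["elr_head"]))
    return rt_probs, elr_probs


def joint_loss(rt_probs, elr_probs, rt_label, elr_label):
    """Sum of both cross-entropy terms (equal weighting is an assumption)."""
    return -math.log(rt_probs[rt_label]) - math.log(elr_probs[elr_label])


# Hypothetical sizes: 8 input features, 16 hidden units, 5 RT and 4 ELR classes.
model = build_model(n_features=8, n_hidden=16, n_rt_classes=5, n_elr_classes=4)
features = [0.1 * i for i in range(8)]
rt_probs, elr_probs = forward(model, features)
loss = joint_loss(rt_probs, elr_probs, rt_label=2, elr_label=1)
```

Because both output layers read the same hidden activations, a gradient step on the summed loss would update the shared layer using information from both tasks, which is the general motivation for multi-task learning.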
