Speech Dereverberation Based on Integrated Deep and Ensemble Learning Algorithm

Reverberation, which is generally caused by sound reflections from walls, ceilings, and floors, can severely degrade the performance of acoustic applications. Because reverberation arises from a complicated combination of attenuation and time-delay effects, it is difficult to characterize, and effectively retrieving anechoic speech signals from reverberant ones remains a challenging task. In the present study, we propose a novel integrated deep and ensemble learning algorithm (IDEA) for speech dereverberation. The IDEA consists of offline and online phases. In the offline phase, we train multiple dereverberation models, each aiming to precisely dereverberate speech signals in a particular acoustic environment; a unified fusion function is then estimated to integrate the information from the multiple dereverberation models. In the online phase, an input utterance is first processed by each of the dereverberation models. The outputs of all models are then integrated by the fusion function to generate the final anechoic signal. We evaluated the IDEA on designed acoustic environments, including both matched and mismatched conditions between the training and testing data. Experimental results confirm that the proposed IDEA outperforms a single deep-neural-network-based dereverberation model with the same model architecture and training data.
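The two-phase structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: as a simplifying assumption, each environment-specific dereverberation model and the fusion function are replaced by least-squares linear maps rather than deep neural networks, and the feature arrays are synthetic. The names `fit_linear`, `apply_linear`, `train_idea`, and `dereverberate` are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_linear(X, Y):
    """Least-squares map W (with bias) so that [X, 1] @ W approximates Y."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def apply_linear(W, X):
    """Apply a map fitted by fit_linear to feature frames X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

def train_idea(env_data):
    """Offline phase (assumed linear stand-in for the DNN models).

    env_data: list of (reverberant, anechoic) feature arrays,
    one pair per acoustic environment.
    """
    # One dereverberation model per acoustic environment.
    models = [fit_linear(X, Y) for X, Y in env_data]
    # Fusion function estimated on the pooled data from the
    # concatenated outputs of all environment-specific models.
    X_all = np.vstack([X for X, _ in env_data])
    Y_all = np.vstack([Y for _, Y in env_data])
    stacked = np.hstack([apply_linear(W, X_all) for W in models])
    fusion = fit_linear(stacked, Y_all)
    return models, fusion

def dereverberate(models, fusion, X):
    """Online phase: run every model, then fuse their outputs."""
    stacked = np.hstack([apply_linear(W, X) for W in models])
    return apply_linear(fusion, stacked)

# Synthetic demo: three "environments", each a different linear distortion.
rng = np.random.default_rng(0)
env_data = []
for _ in range(3):
    Y = rng.standard_normal((200, 4))                    # "anechoic" features
    A = np.eye(4) + 0.3 * rng.standard_normal((4, 4))    # environment effect
    X = Y @ A + 0.01 * rng.standard_normal((200, 4))     # "reverberant" features
    env_data.append((X, Y))

models, fusion = train_idea(env_data)
estimate = dereverberate(models, fusion, env_data[0][0])
print(estimate.shape)  # (200, 4)
```

In this toy setting the fusion step is itself linear, so it only demonstrates the data flow between the offline and online phases; it says nothing about the performance gains reported for the deep-learning-based IDEA.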
