Ensemble Hierarchical Extreme Learning Machine for Speech Dereverberation

Data-driven deep learning solutions based on gradient-trained neural architectures have proven useful in overcoming some limitations of traditional signal processing techniques. However, achieving robust dereverberation performance in unseen testing conditions requires a large number of reverberant–anechoic training utterance pairs covering as many environmental conditions as possible. In this article, we propose to address this data requirement while preserving the advantages of deep neural structures by leveraging hierarchical extreme learning machines (HELMs), which are not gradient-based neural architectures. In particular, an ensemble HELM learning framework is established to recover anechoic speech from its reverberant counterpart through spectral mapping. In addition to the ensemble learning framework, we derive two novel HELM models, namely, highway HELM [HELM(Hwy)] and residual HELM [HELM(Res)], both of which incorporate low-level features to enrich the information available for spectral mapping. We evaluated the proposed ensemble learning framework with simulated and measured impulse responses, using the Texas Instruments and Massachusetts Institute of Technology (TIMIT), Mandarin hearing in noise test (MHINT), and reverberant voice enhancement and recognition benchmark (REVERB) corpora. The experimental results show that the proposed framework outperforms both traditional methods and a recently proposed integrated deep and ensemble learning algorithm in terms of standardized objective and subjective evaluations under matched and mismatched testing conditions for simulated and measured impulse responses.
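To illustrate what "not gradient-based" means here, the following is a minimal sketch of a basic single-hidden-layer extreme learning machine, not the HELM, HELM(Hwy), or HELM(Res) models of this work. The hidden weights are drawn at random and left fixed, and only the output weights are computed, in closed form, by regularized least squares. The function names (`train_elm`, `predict_elm`), the tanh activation, the ridge term, and the toy data standing in for reverberant and anechoic spectral features are all assumptions made for the example.

```python
import numpy as np

def train_elm(X, Y, n_hidden=64, ridge=1e-3, seed=0):
    """Fit a basic ELM: random fixed hidden layer, closed-form output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never updated
    b = rng.standard_normal(n_hidden)                # random biases, never updated
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    # Ridge-regularized least squares for the output weights (no backpropagation).
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def predict_elm(model, X):
    """Map input features to output features with a trained ELM."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy stand-in for (reverberant, anechoic) feature pairs; purely illustrative.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((200, 8))
Y = np.sin(X[:, :2])

model = train_elm(X, Y)
pred = predict_elm(model, X)
mse = float(np.mean((pred - Y) ** 2))
baseline = float(np.mean((Y - Y.mean(axis=0)) ** 2))  # error of predicting the mean
```

Because training reduces to a single linear solve, the cost of fitting one such model is small, which is one reason an ensemble of several models can remain practical.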
