Mapping Mel sub-band energies using Deep belief network for robust speech recognition

Sub-band speech processing is a well-known approach in robust speech recognition. In recent years, deep neural networks (DNNs) have also been widely used in speech recognition, both for acoustic modeling and for feature extraction and transformation. In this paper, we propose using a deep belief network (DBN) as a post-processing de-noising stage at the Mel sub-band level, where we enhance the logarithm of the Mel filter-bank energies (LMFBs). In addition, we use a DBN to extract tandem features (posterior probabilities of phone occurrence) from the de-noised LMFBs obtained in the previous stage, yielding more robust and discriminative features. The final robust feature vector consists of the de-noised LMFBs concatenated with these tandem features. Evaluation on the Aurora2 database shows that the proposed feature vector outperforms similar and conventional techniques, increasing recognition accuracy by 55.12% on average compared with LMFBs.
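
As an illustration only, the Python sketch below outlines the structure of the described pipeline: a DBN-initialized network maps noisy LMFBs to de-noised LMFBs, a second DBN produces phone posteriors (tandem features) from the de-noised LMFBs, and the two are concatenated into the final feature vector. The layer sizes, weight initialization, and helper functions are our own assumptions for the sketch, not the paper's implementation.

```python
# Minimal sketch of the proposed feature pipeline (hypothetical, not the authors' code).
# Pre-trained DBN weights are stood in for by random matrices; only the data flow
# (de-noising -> tandem posteriors -> concatenation) mirrors the description above.
import numpy as np

N_MEL = 23       # Mel filter-bank channels (assumed)
N_PHONES = 40    # phone classes for the tandem posteriors (assumed)

def feed_forward(x, weights, biases, out_activation):
    """Run a fully connected network with sigmoid hidden layers."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))   # sigmoid hidden units
    z = h @ weights[-1] + biases[-1]
    return out_activation(z)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

def random_net(sizes):
    """Random stand-in for a pre-trained DBN with the given layer sizes."""
    Ws = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(b) for b in sizes[1:]]
    return Ws, bs

denoise_W, denoise_b = random_net([N_MEL, 256, 256, N_MEL])      # noisy LMFB -> de-noised LMFB
tandem_W, tandem_b = random_net([N_MEL, 256, 256, N_PHONES])     # de-noised LMFB -> phone posteriors

def robust_features(noisy_lmfb):
    """noisy_lmfb: array of shape (frames, N_MEL) holding log Mel filter-bank energies."""
    clean_est = feed_forward(noisy_lmfb, denoise_W, denoise_b, lambda z: z)  # linear output (regression)
    posteriors = feed_forward(clean_est, tandem_W, tandem_b, softmax)        # tandem features
    return np.concatenate([clean_est, posteriors], axis=-1)                  # final robust feature vector

frames = rng.normal(size=(100, N_MEL))     # placeholder noisy LMFB frames
features = robust_features(frames)          # shape: (100, N_MEL + N_PHONES)
```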
