Improving Speaker Verification for Reverberant Conditions with Deep Neural Network Dereverberation Processing

We present an improved method for training Deep Neural Networks for dereverberation and show that it can improve performance on two speech processing tasks: speaker verification and speech enhancement. We replicate recently proposed Deep Neural Network dereverberation methods and present our improved method, highlighting the aspects that most influence performance. We then experimentally evaluate the capabilities and limitations of the method with respect to speech quality and speaker verification, and show that it achieves better performance than the other proposed methods.
