A DNN parameter mask for the binaural reverberant speech segregation

Reverberant speech segregation is a fundamental problem in speech enhancement and automatic speech recognition. This paper proposes a novel binaural speech segregation method based on deep neural networks (DNNs). Binaural features are extracted and used as cues to train a DNN with an ideal parameter mask. The trained DNN distinguishes target speech from noise and outputs an estimated parameter mask. The proposed method is systematically evaluated and outperforms the binary mask method. It also performs well at untrained locations and under reverberant conditions.
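To make the ingredients above concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not say which binaural features are used or how the parameter mask is defined, so everything here is an assumption: the cues are the interaural time difference (ITD) and interaural level difference (ILD) common in this literature, and a soft ratio mask stands in for the parameter mask so it can be contrasted with the ideal binary mask. The function names `binaural_cues` and `ideal_masks` are hypothetical.

```python
import numpy as np

def binaural_cues(left, right, fs, max_lag):
    """Estimate (ITD in seconds, ILD in dB) for one frame of a two-channel signal.

    Assumed cues: the abstract only says "binaural feature".
    A positive ITD means the right channel is delayed relative to the left.
    """
    full = np.correlate(left, right, mode="full")
    center = len(right) - 1                      # index of zero lag
    window = full[center - max_lag:center + max_lag + 1]
    lag = int(np.argmax(window)) - max_lag       # negative when right is delayed
    itd = -lag / fs
    eps = 1e-12                                  # avoids log of zero on silence
    rms_left = np.sqrt(np.mean(left ** 2) + eps)
    rms_right = np.sqrt(np.mean(right ** 2) + eps)
    ild = 20.0 * np.log10(rms_left / rms_right)
    return itd, ild

def ideal_masks(target_energy, noise_energy):
    """Return (ideal binary mask, soft ratio mask) per time-frequency unit.

    The ratio mask is a stand-in for a soft mask, NOT the paper's
    parameter mask, whose definition the abstract does not give.
    """
    eps = 1e-12
    binary = (target_energy > noise_energy).astype(float)   # 0 dB local SNR criterion
    ratio = target_energy / (target_energy + noise_energy + eps)
    return binary, ratio
```

In a pipeline of this kind, the cues would be computed for each time-frequency unit of a filterbank output and fed to the network, with one of the masks as the training target. A soft mask keeps partial target energy in units where a binary mask would discard it entirely.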
