Learning time-frequency mask for noisy speech enhancement using gaussian-bernoulli pre-trained deep neural networks

Speech enhancement is a very important problem in various speech processing applications. Recently, supervised speech enhancement using deep learning approaches to estimate a time-frequency mask have proved remarkable performance gain. In this paper, we have proposed time-frequency masking-based supervised speech enhancement method for improving intelligibility and quality of the noisy speech. We believe that a large performance gain can be achieved if deep neural networks (DNNs) are layer-wise pre-trained by stacking Gaussian-Bernoulli Restricted Boltzmann Machine (GB-RBM). The proposed DNN is called as Gaussian-Bernoulli Deep Belief Network (GB-DBN) and are optimized by minimizing errors between the estimated and pre-defined masks. Non-linear Mel-Scale weighted mean square error (LMW-MSE) loss function is used as training criterion. We have examined the performance of the proposed pre-training scheme using different DNNs which are established on three time-frequency masks comprised of the ideal amplitude mask (IAM), ideal ratio mask (IRM), and phase sensitive mask (PSM). The results in different noisy conditions demonstrated that when DNNs are pre-trained by the proposed scheme provided a persistent performance gain in terms of the perceived speech intelligibility and quality. Also, the proposed pre-training scheme is effective and robust in noisy training data.

[1]  Yu Tsao,et al.  Bone-conducted speech enhancement using deep denoising autoencoder , 2018, Speech Commun..

[2]  Hanwook Chung,et al.  Training and compensation of class-conditioned NMF bases for speech enhancement , 2018, Neurocomputing.

[3]  IEEE Recommended Practice for Speech Quality Measurements , 1969, IEEE Transactions on Audio and Electroacoustics.

[4]  Meng Sun,et al.  Mask estimate through Itakura-Saito nonnegative RPCA for speech enhancement , 2016, 2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC).