An optimization method for speech enhancement based on deep neural network

This paper proposes a deep neural network (DNN) model for speech enhancement with a more reliable data set and a more robust structure. First, we apply two regularization techniques, dropout and a sparsity constraint, to strengthen the model's generalization ability. This not only keeps the pre-training model and the fine-tuning model consistent but also reduces resource consumption. Next, network compression through weight sharing and quantization reduces storage cost. Finally, we evaluate the quality of the reconstructed speech against several criteria. The results show that the improved framework performs well on speech enhancement and meets the requirements of speech processing.
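As an illustration of compression by weight sharing and quantization, the sketch below clusters the weights of one layer into a small codebook and stores only a per-weight index. The abstract does not state how the shared values are chosen, so the 1-D k-means procedure, the linear centroid initialization, the 4-bit setting, and the names `quantize_weights` and `dequantize` are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def quantize_weights(weights, bits=4, iters=20):
    """Cluster weights into 2**bits shared values with 1-D k-means (bits <= 8)."""
    flat = weights.ravel()
    k = 2 ** bits
    # Assumption of this sketch: centroids start evenly spaced over the weight range.
    codebook = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assign every weight to its nearest centroid.
        idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        # Move each centroid to the mean of the weights assigned to it.
        for c in range(k):
            members = flat[idx == c]
            if members.size:
                codebook[c] = members.mean()
    # Final assignment against the updated codebook.
    idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook, idx.astype(np.uint8).reshape(weights.shape)

def dequantize(codebook, idx):
    """Rebuild a dense weight matrix from the codebook and the index matrix."""
    return codebook[idx]

# Hypothetical layer: 64 x 32 weights drawn from a narrow Gaussian.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(64, 32))
codebook, idx = quantize_weights(W, bits=4)
W_hat = dequantize(codebook, idx)
```

With this layout, each weight is represented by a 4-bit index into a 16-entry table rather than a full floating-point value. The indices are held in `uint8` here for simplicity, so reaching the 4-bit storage cost would additionally require packing two indices per byte.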
