Distilled Binary Neural Network for Monaural Speech Separation

Monaural speech separation, which aims to solve the cocktail party problem, has many important application scenarios, most of which demand real-time response, high energy efficiency, and compact storage. However, state-of-the-art Deep Neural Network (DNN) based separation models usually require large amounts of memory and computation for their 32-bit floating-point multiply-accumulate operations, so most of them cannot meet these requirements. Many methods have recently been proposed to address this problem; among them, binary neural networks have drawn considerable attention because they compress and speed up their full-precision counterparts at the cost of some performance. In this paper, we binarize DNN-based separation models so that they can be deployed on embedded devices for real-time applications. Furthermore, we improve separation performance by integrating knowledge distillation into the training of the binarized models, an approach we refer to as the Distilled Binary Neural Network (DBNN). To the best of our knowledge, DBNN is the first attempt to combine these two types of model compression. Our experiments demonstrate the effectiveness of the proposed method, which successfully binarizes DNN-based separation models while retaining comparable performance.
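Since no implementation details are given here, the following is a minimal PyTorch sketch of the two ingredients the abstract combines: weight binarization trained with a straight-through estimator (in the style of Courbariaux et al.'s Binarized Neural Networks) and a distillation loss that pulls the binary student's output toward a full-precision teacher's. The names BinarizeSTE, BinaryLinear, and distilled_loss, as well as the weighting alpha, are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator:
    forward uses sign(x); backward passes the gradient through
    unchanged wherever |x| <= 1 and blocks it elsewhere."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)


class BinaryLinear(nn.Module):
    """Fully connected layer whose weights are binarized on every
    forward pass; real-valued latent weights are kept for the
    optimizer update."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, x):
        return F.linear(x, BinarizeSTE.apply(self.weight))


def distilled_loss(student_out, teacher_out, target, alpha=0.5):
    """Separation loss against the ground-truth target (e.g. an ideal
    ratio mask) plus a distillation term matching the full-precision
    teacher's output; alpha is a hypothetical weighting, not from
    the paper."""
    return ((1.0 - alpha) * F.mse_loss(student_out, target)
            + alpha * F.mse_loss(student_out, teacher_out.detach()))
```

In training, the teacher would be a pretrained full-precision separation model and the target the ideal mask or clean spectrogram; the student's forward pass always uses binarized weight copies, while gradient updates are applied to the real-valued latent weights.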
