Paper information - Precision Scaling of Neural Networks for Efficient Audio Processing

While deep neural networks have shown powerful performance in many audio applications, their large computation and memory demands have been a challenge for real-time processing. In this paper, we study the impact of scaling the precision of neural networks on the performance of two common audio processing tasks, namely, voice-activity detection and single-channel speech enhancement. We determine the optimal pair of weight/neuron bit precisions by exploring its impact on both performance and processing time. Through experiments conducted with real user data, we demonstrate that deep neural networks that use lower bit precision significantly reduce the processing time (up to 30x). However, their performance impact is low (< 3.14%) only in the case of classification tasks such as those present in voice-activity detection.
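To make the idea of scaling bit precision concrete, the following is a minimal sketch of symmetric uniform quantization in plain Python. It is an illustrative assumption and not the quantization scheme used in the paper: the helper names `quantize_uniform` and `max_abs_error`, the sample weights, and the choice of a symmetric signed grid are all hypothetical.

```python
def quantize_uniform(values, bits):
    """Snap floats to a signed `bits`-bit uniform grid and map them back to floats.

    Hypothetical helper for illustration only; not the paper's method.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits (sign + magnitude)")
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    levels = 2 ** (bits - 1) - 1   # largest positive integer code
    scale = max_abs / levels       # real-valued step between adjacent codes
    return [round(v / scale) * scale for v in values]


def max_abs_error(values, bits):
    """Largest absolute difference between the inputs and their quantized versions."""
    quantized = quantize_uniform(values, bits)
    return max(abs(v - q) for v, q in zip(values, quantized))


# Made-up weights, used only to show how the error grows as precision drops.
weights = [0.82, -0.41, 0.07, -0.95, 0.33]
for bits in (8, 4, 2):
    print(bits, max_abs_error(weights, bits))
```

In this sketch the rounding error of each value is bounded by half a quantization step, so fewer bits give a coarser grid and a larger worst-case error. How that error translates into task performance is what the paper measures experimentally.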

Matthai Philipose | Ivan Tashev | Jong Hwan Ko | Shuayb Zarar | Josh Fromm
