Deep Hidden Analysis: A Statistical Framework to Prune Feature Maps

In this paper, we propose a statistical framework for pruning feature maps in 1-D deep convolutional networks. SoundNet is a pre-trained deep convolutional network that accepts raw audio samples as input. The feature maps generated at various layers of SoundNet exhibit redundancy, which can be identified through statistical analysis. These redundant feature maps can be pruned from the network with only a minor reduction in its capability. The advantage of pruning feature maps is that it reduces computational complexity when an ensemble of classifiers is applied to the layers of SoundNet. Our experiments on acoustic scene classification demonstrate that ignoring 89% of the feature maps degrades performance by less than 3% while reducing computational complexity by 18%.
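To illustrate the kind of statistical redundancy analysis the abstract describes, the sketch below prunes feature maps whose activations are strongly correlated with maps already retained. This is a minimal, hypothetical example (correlation-based greedy selection over per-clip mean activations), not the paper's exact criterion; the function name, threshold, and data layout are assumptions.

```python
import numpy as np

def prune_redundant_maps(activations, threshold=0.95):
    """Greedily keep feature maps whose absolute correlation with every
    already-kept map is below `threshold`; the rest are marked as pruned.

    activations: array of shape (n_samples, n_maps), e.g. the mean
    activation of each feature map per input audio clip.

    Illustrative sketch only -- the paper's actual statistical test may
    differ (e.g. analysis of variance or information-theoretic measures).
    """
    corr = np.abs(np.corrcoef(activations, rowvar=False))
    n_maps = activations.shape[1]
    kept = []
    for j in range(n_maps):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    pruned = [j for j in range(n_maps) if j not in kept]
    return kept, pruned
```

A classifier ensemble built on the retained maps then operates on a much smaller feature set, which is where the computational savings come from.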
