Local Critic Training of Deep Neural Networks

This paper proposes a novel approach to training deep neural networks that unlocks the layer-wise dependency of backpropagation. The approach employs additional modules, called local critic networks, alongside the main network to be trained; these modules provide error gradients without requiring complete feedforward and backward propagation through the main network. We propose a cascaded learning strategy for training the local critic networks. The approach is also useful from a multi-model perspective, enabling structural optimization of neural networks, computationally efficient progressive inference, and ensemble classification for improved performance. Experimental results demonstrate the effectiveness of the proposed approach and suggest guidelines for choosing appropriate algorithm parameters.
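
The core idea can be illustrated with a short sketch. The following PyTorch code is illustrative only: the layer sizes, the critic architecture, and the squared-error objective used to train the critic are assumptions for this sketch, not the paper's exact formulation. It shows how a local critic attached to an intermediate layer can supply an error gradient for the lower layers without a complete forward and backward pass through the whole network.

    # Hedged sketch of local-critic-style training; module names and sizes
    # are hypothetical, and the critic's objective is a simplified stand-in
    # for the paper's cascaded learning strategy.
    import torch
    import torch.nn as nn

    block1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())  # lower part of main net
    block2 = nn.Linear(256, 10)                             # upper part of main net
    critic = nn.Linear(256, 10)                             # local critic for block1

    opt1 = torch.optim.SGD(block1.parameters(), lr=0.01)
    opt2 = torch.optim.SGD(block2.parameters(), lr=0.01)
    optc = torch.optim.SGD(critic.parameters(), lr=0.01)
    ce = nn.CrossEntropyLoss()

    def train_step(x, y):
        h = block1(x)

        # Update block1 with the critic's estimated loss, without
        # backpropagating through block2 at all.
        opt1.zero_grad()
        loss_est = ce(critic(h), y)
        loss_est.backward()
        opt1.step()

        # Update block2 with the true loss; detaching h stops gradients
        # from flowing back into block1 a second time.
        h_det = h.detach()
        opt2.zero_grad()
        loss_true = ce(block2(h_det), y)
        loss_true.backward()
        opt2.step()

        # Train the critic so its estimated loss tracks the true loss.
        optc.zero_grad()
        loss_c = (ce(critic(h_det), y) - loss_true.detach()) ** 2
        loss_c.backward()
        optc.step()
        return loss_true.item()

For example, train_step(torch.randn(32, 784), torch.randint(0, 10, (32,))) performs one update of all three modules. Because block1's update depends only on the critic, it could in principle proceed without waiting for the upper layers, which is the dependency the paper aims to unlock.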
