Paper information - Convolutional Networks with Adaptive Computation Graphs


Do convolutional networks really need a fixed feed-forward structure? Often, a neural network is already confident about the high-level concept shown in an image after only a few layers. However, because the network structure is fixed, all remaining layers must still be evaluated. What if the network could jump directly to a layer that specializes in fine-grained differences in the image's content? In this work, we propose Adanets, a family of convolutional networks with adaptive computation graphs. Adanets follow a high-level structure similar to residual networks (Resnets). The key difference is that, for each layer, a gating function determines whether to execute the layer or move on to the next one. In experiments on CIFAR-10 and ImageNet, we demonstrate that Adanets efficiently allocate their computational budget among layers and learn distinct layers that specialize in similar categories. Adanet 50 achieves a top-5 error rate of 7.94% on ImageNet while using 30% fewer computations than Resnet 34, whose top-5 error rate is 8.58%. Lastly, we study how adaptive computation graphs affect susceptibility to adversarial examples. We observe that Adanets are more robust to adversarial attacks, complementing other defenses such as JPEG compression.
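The gating idea in the abstract can be illustrated with a minimal pure-Python sketch. It assumes a per-layer update x_{l+1} = x_l + z_l(x_l) * F_l(x_l) with a hard gate z_l in {0, 1}. This is our reading of "execute the layer or move on to the next one", not a confirmed formula from the paper.

All of the following are hypothetical and used only for illustration:
- the names `gated_residual_forward` and `gumbel_max_gate`;
- the list-of-floats feature representation;
- the toy layers and gates.

The abstract does not say how the discrete gates are trained. The Gumbel-Max sampler is therefore an assumed choice, not necessarily the authors' method.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]  # residual branch F_l (hypothetical toy layer)
Gate = Callable[[Vector], bool]     # gating function z_l (hypothetical)


def gated_residual_forward(x: Vector, layers: Sequence[Layer],
                           gates: Sequence[Gate]) -> Tuple[Vector, int]:
    """Apply x_{l+1} = x_l + z_l(x_l) * F_l(x_l) with hard gates z_l in {0, 1}.

    When a gate returns False, F_l is never called, so its computation is saved.
    Returns the final features and the number of layers actually executed.
    """
    executed = 0
    for layer, gate in zip(layers, gates):
        if gate(x):  # the gate decides from the current features
            residual = layer(x)
            x = [xi + ri for xi, ri in zip(x, residual)]
            executed += 1
        # gate off: identity mapping, x is passed on unchanged
    return x, executed


def gumbel_max_gate(logit_on: float, logit_off: float, rng: random.Random) -> bool:
    """Sample a hard on/off decision with the Gumbel-Max trick (assumed here).

    argmax_k(logit_k + g_k) with g_k = -log(-log(u_k)), u_k ~ Uniform(0, 1),
    is an exact sample from softmax([logit_on, logit_off]).
    """
    def gumbel() -> float:
        u = rng.random()
        while u == 0.0:  # random() can return 0.0; avoid log(0)
            u = rng.random()
        return -math.log(-math.log(u))

    return logit_on + gumbel() > logit_off + gumbel()


# Toy usage: three hypothetical layers whose residual is a constant vector.
layers = [lambda v, c=c: [c] * len(v) for c in (1.0, 2.0, 3.0)]
gates = [lambda v: True, lambda v: False, lambda v: sum(v) > 0]
out, n_executed = gated_residual_forward([0.0, 0.0], layers, gates)
```

In the toy run, the second layer is skipped and never evaluated. As a result, `out` is `[4.0, 4.0]` and only 2 of the 3 layers execute.

Because a closed gate skips the call to `layer` entirely, the executed-layer count is a rough proxy for the computation saved per input. A hard sample is not differentiable. Training such gates end to end would therefore need a relaxation such as the Gumbel-softmax (Concrete) distribution, which this sketch omits.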

Serge J. Belongie | Andreas Veit
