Anytime Neural Network: A Versatile Trade-off Between Computation and Accuracy

Anytime predictors first produce crude results quickly and then continuously refine them until the test-time computational budget is exhausted. Such predictors are used in real-time vision systems and streaming-data processing to make efficient use of varying test-time budgets and to reduce average prediction cost via early exits. However, anytime prediction algorithms have difficulty exploiting the accurate predictions of deep neural networks (DNNs), because DNNs are often computationally expensive and lack competitive intermediate results. In this work, we propose adding auxiliary predictors to DNNs to generate anytime predictions, and we optimize these predictions jointly by minimizing a carefully constructed weighted sum of losses whose weights oscillate during training. The proposed anytime neural networks (ANNs) produce reasonable anytime predictions without sacrificing final performance or incurring noticeable extra computation. This enables us to assemble a sequence of exponentially deepening ANNs that achieves, both theoretically and practically, near-optimal anytime predictions at every budget after spending only a constant fraction of extra cost. The proposed methods produce anytime predictions at the state-of-the-art level on visual recognition datasets, including ILSVRC2012.
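The core training scheme described above can be illustrated with a short sketch. This is a minimal illustration under assumptions, not the authors' implementation: the block and head architectures, the `AnytimeNet` class, and the `anytime_loss` helper are hypothetical names introduced here for concreteness, and the oscillating weight schedule from the paper is left abstract (any iterable of per-head weights can be passed in).

```python
import torch
import torch.nn as nn

class AnytimeNet(nn.Module):
    """A minimal anytime network: a stack of blocks, each followed by an
    auxiliary classifier, so a prediction is available after every block."""
    def __init__(self, num_blocks=4, width=64, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, width, 3, padding=1)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(width, width, 3, padding=1),
                          nn.BatchNorm2d(width), nn.ReLU())
            for _ in range(num_blocks))
        self.heads = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(width, num_classes))
            for _ in range(num_blocks))

    def forward(self, x):
        x = self.stem(x)
        outputs = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            outputs.append(head(x))  # one anytime prediction per block
        return outputs

def anytime_loss(outputs, target, weights):
    # Weighted sum of per-head losses; the paper oscillates these weights
    # over training iterations, while this helper simply applies whatever
    # weights the caller supplies for the current iteration.
    criterion = nn.CrossEntropyLoss()
    return sum(w * criterion(out, target)
               for w, out in zip(weights, outputs))
```

At test time, one would run blocks sequentially and report the prediction of the deepest head that completed within the budget, which is what makes the network "anytime".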
