Anytime Neural Networks via Joint Optimization of Auxiliary Losses

We address the problem of anytime prediction in neural networks. An anytime predictor automatically adapts to the available test-time budget: it produces a crude initial result quickly and refines it continuously thereafter. Traditional feed-forward networks achieve state-of-the-art performance on many machine learning tasks, but cannot produce anytime predictions during their typically expensive computation. In this work, we propose adding auxiliary predictions to a residual network to generate anytime predictions, and we optimize these predictions jointly. We solve this multi-objective optimization by minimizing a carefully constructed weighted sum of losses. We also oscillate the weightings of the losses at each iteration to avoid spurious solutions that are optimal for the sum but not for each individual loss. The proposed approach produces competitive results if computation is interrupted early, and matches the performance of the original network once computation finishes. Observing that the relative performance gap between the optimal network and our anytime network shrinks as the network nears completion, we propose a method for combining anytime networks to achieve more accurate anytime predictions at a constant fraction of additional cost. We evaluate the proposed methods on real-world visual recognition datasets to demonstrate their anytime performance.
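The joint objective described above, a weighted sum of per-depth auxiliary losses whose weights oscillate across iterations, can be sketched in a few lines. This is a minimal illustration only: the paper's exact oscillation schedule and base weighting may differ, and the helper names (`oscillating_weights`, `joint_loss`) are hypothetical.

```python
import numpy as np

def oscillating_weights(num_losses, step, base=None):
    """Return normalized loss weights for this iteration.

    Hypothetical scheme: start from a base weighting over the auxiliary
    losses and boost one loss per iteration, cycling through depths, so
    that no individual loss is persistently neglected by the sum.
    """
    base = np.ones(num_losses) if base is None else np.asarray(base, dtype=float)
    w = base.copy()
    w[step % num_losses] *= num_losses  # emphasize one auxiliary loss this step
    return w / w.sum()

def joint_loss(losses, weights):
    """Weighted sum of the per-depth auxiliary losses."""
    return float(np.dot(weights, losses))

# At training step t, each auxiliary predictor i contributes losses[i];
# the optimizer would minimize joint_loss(losses, oscillating_weights(k, t)).
```

Because the boosted index cycles with the iteration counter, every auxiliary predictor periodically dominates the objective, which is one simple way to discourage solutions that are good for the sum but poor for some individual depth.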
