Orthogonalized SGD and Nested Architectures for Anytime Neural Networks

We propose a novel variant of SGD customized for training network architectures that support anytime behavior: such networks produce a series of increasingly accurate outputs over time. Efficient architectural designs for these networks focus on re-using internal state; subnetworks must produce representations relevant both for immediate prediction and for refinement by subsequent network stages. We consider traditional branched networks as well as a new class of recursively nested networks. Our new optimizer, Orthogonalized SGD, dynamically re-balances task-specific gradients when training a multitask network. In the context of anytime architectures, this optimizer projects gradients from later outputs onto a parameter subspace that does not interfere with those from earlier outputs. Experiments demonstrate that training with Orthogonalized SGD significantly improves the generalization accuracy of anytime networks.
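As a rough illustration of the projection idea described above, the sketch below combines per-output gradients with a Gram-Schmidt-style projection: each later output's gradient is stripped of its components along earlier outputs' gradients before the updates are summed. This is a minimal sketch under our own assumptions, not the paper's implementation; the function name `orthogonalized_update`, the flattened-gradient representation, and the fixed learning rate are all illustrative choices.

```python
import numpy as np

def orthogonalized_update(grads, lr=0.1):
    """Combine per-output gradients for one SGD step.

    grads: list of flattened gradient vectors, ordered from the
           earliest anytime output to the latest. Each later gradient
           is projected onto the subspace orthogonal to all earlier
           gradients, so it cannot interfere with earlier outputs.
    Returns the parameter update (negative, learning-rate-scaled).
    """
    basis = []                             # orthonormal basis of earlier gradients
    combined = np.zeros_like(grads[0])
    for g in grads:
        g_proj = g.astype(float).copy()
        for b in basis:
            g_proj -= np.dot(g_proj, b) * b   # remove interfering component
        combined += g_proj
        norm = np.linalg.norm(g_proj)
        if norm > 1e-12:
            basis.append(g_proj / norm)       # extend the protected subspace
    return -lr * combined

# Toy usage: a later gradient that partially conflicts with an earlier one.
g_early = np.array([1.0, 0.0])
g_late = np.array([1.0, 1.0])
step = orthogonalized_update([g_early, g_late])
# g_late's component along g_early is removed before the gradients are summed.
```

In practice, each gradient would come from backpropagating one anytime head's loss through the shared parameters, and the earliest-to-latest ordering encodes the priority given to earlier outputs.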
