EBJR: Energy-Based Joint Reasoning for Adaptive Inference

State-of-the-art deep learning models achieve strong performance on a wide range of benchmarks, but this accuracy comes at the price of high computational cost. Lightweight architectures, on the other hand, reach only moderate accuracy, but at a much more desirable latency. This paper presents a new method for jointly using large, accurate models together with small, fast ones. To this end, we propose an Energy-Based Joint Reasoning (EBJR) framework that adaptively distributes samples between shallow and deep models to achieve an accuracy close to that of the deep model at a latency close to that of the shallow one. Our method is applicable to out-of-the-box pre-trained models, as it requires neither an architecture change nor retraining. Moreover, it is easy to use and deploy, especially for cloud services. Through a comprehensive set of experiments on different downstream tasks, we show that our method outperforms strong state-of-the-art approaches by a considerable margin. In addition, we propose Specialized EBJR, an extension of our method in which we create a smaller, specialized side model that performs the target task only partially, but yields even higher accuracy and faster inference. We verify the strengths of our methods with both theoretical and experimental evaluations.
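
The routing idea can be made concrete with a short sketch. The snippet below is a minimal illustration, not the paper's exact implementation: it scores each sample with the standard logit-based energy E(x) = -T · logsumexp(f(x)/T), accepts the shallow model's prediction when its energy is low (high confidence), and forwards the remaining samples to the deep model. The names `small_model`, `big_model`, and the threshold `tau` are hypothetical placeholders; in practice the threshold would be tuned on a validation set to trade off accuracy against latency.

```python
# Minimal sketch of energy-based routing between a small and a large model.
# Assumptions: both models map the same input batch to logits over the same
# classes; `tau` is a validation-tuned energy threshold (hypothetical name).
import torch

def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Free-energy score of a logit vector; lower energy ~ higher confidence."""
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)

@torch.no_grad()
def joint_inference(x: torch.Tensor,
                    small_model: torch.nn.Module,
                    big_model: torch.nn.Module,
                    tau: float) -> torch.Tensor:
    """Run the small model on every sample; re-run only the uncertain
    samples (energy above tau) through the big model."""
    small_logits = small_model(x)
    energies = energy_score(small_logits)   # shape: (batch,)
    uncertain = energies > tau              # samples the small model is unsure about
    logits = small_logits.clone()
    if uncertain.any():
        logits[uncertain] = big_model(x[uncertain])
    return logits.argmax(dim=-1)
```

Lowering `tau` in this sketch routes more samples to the big model (higher accuracy, higher latency), while raising it keeps more samples on the small model, which is the accuracy-latency knob the framework exposes.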
