Chapter Eight - Energy-efficient deep learning inference on edge devices
Massimo Poncino | Daniele Jahier Pagliari | Francesco Daghero