Chapter Eight - Energy-efficient deep learning inference on edge devices
Massimo Poncino | Daniele Jahier Pagliari | Francesco Daghero