Chapter Eight - Energy-efficient deep learning inference on edge devices