Energy-Efficient Machine Learning on the Edges

Machine learning-based software is vital for future Internet of Things (IoT) applications and Connected and Autonomous Vehicles (CAVs), as it provides the core value of these services by leveraging the enormous amount of data collected on the edge. These services rely on a variety of machine learning models, which makes them computationally intensive on edge devices. Much work has gone into making the hardware efficient. No matter how efficient the hardware is, however, an inefficient machine learning model can still cause high energy consumption and overheating problems. Yet very few tools are available to help software developers or researchers make machine learning models energy efficient. The main contributions of this paper are twofold. First, we summarize the state-of-the-art techniques for energy-efficient machine learning on the edges. Second, targeting the Java programming language specifically, we present an Eclipse plugin named Java Energy Profiler & Optimizer (JEPO), which helps profile and optimize machine learning source code written in Java. JEPO can automatically measure the energy consumption of source code at method granularity. It provides energy efficiency suggestions for data types, operators, control statements, String, exception, objects, and Arrays in Java. In our evaluation, using JEPO to optimize the machine learning software WEKA improved energy consumption by up to 14.46% with only a 0.48% drop in accuracy.
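
To make the idea of measuring energy at method granularity concrete, the following is a minimal sketch and not JEPO's actual implementation. It assumes a cumulative energy counter in microjoules is available through a `LongSupplier`; the class name `MethodEnergyProfiler`, the `profile` and `totalFor` methods, and the fake counter used in `main` are all hypothetical names introduced for illustration. The sketch reads the counter before and after a method body and attributes the difference to that method.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical method-level energy profiler (illustration only, not JEPO's code). */
class MethodEnergyProfiler {
    // Assumption: a monotonically increasing energy counter in microjoules.
    private final LongSupplier energyCounterMicrojoules;
    // Accumulated energy per method name, in insertion order.
    private final Map<String, Long> totals = new LinkedHashMap<>();

    MethodEnergyProfiler(LongSupplier energyCounterMicrojoules) {
        this.energyCounterMicrojoules = energyCounterMicrojoules;
    }

    /** Runs the body and charges the counter delta to methodName. */
    void profile(String methodName, Runnable body) {
        long before = energyCounterMicrojoules.getAsLong();
        try {
            body.run();
        } finally {
            // Record the delta even if the body throws.
            long after = energyCounterMicrojoules.getAsLong();
            totals.merge(methodName, after - before, Long::sum);
        }
    }

    /** Returns the accumulated energy for methodName, or 0 if never profiled. */
    long totalFor(String methodName) {
        return totals.getOrDefault(methodName, 0L);
    }

    public static void main(String[] args) {
        // Fake counter standing in for a real hardware energy reading.
        long[] fakeCounter = {0};
        MethodEnergyProfiler profiler = new MethodEnergyProfiler(() -> fakeCounter[0]);

        // Each "method" simulates its energy cost by advancing the fake counter.
        profiler.profile("train", () -> fakeCounter[0] += 500);
        profiler.profile("predict", () -> fakeCounter[0] += 100);
        profiler.profile("train", () -> fakeCounter[0] += 250);

        System.out.println("train:   " + profiler.totalFor("train") + " uJ");
        System.out.println("predict: " + profiler.totalFor("predict") + " uJ");
    }
}
```

In a real setting the supplier would be backed by a platform energy counter, and the wrapping would typically be injected automatically rather than written by hand; both are left out here because the abstract does not specify them.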
