SyNERGY: An energy measurement and prediction framework for Convolutional Neural Networks on Jetson TX1

There is a huge demand for on-device execution of deep learning algorithms on mobile and embedded platforms. These devices constrain the application through limited hardware resources and power budgets. However, current evaluation studies of existing deep learning frameworks (for example, Caffe, TensorFlow, Torch and others) are limited to performance measurements on high-end CPUs and GPUs. In this work, we propose SyNERGY, a fine-grained (that is, per-layer) energy measurement and prediction framework for deep neural networks on embedded platforms. We integrate ARM's Streamline Performance Analyser with standard deep learning frameworks such as Caffe and cuDNN v5 to quantify the energy use of deep convolutional neural networks on the NVIDIA Jetson TX1 (Tegra X1). Our measurement framework provides an accurate breakdown of actual energy consumption and performance across all layers of the neural network, while our prediction framework models energy use in terms of target-specific performance counters, such as SIMD instruction and bus-access counts, and application-specific parameters, such as multiply-accumulate (MAC) counts. Our experimental results on 9 representative deep convolutional neural networks show that a multi-variable linear regression model based on hardware performance counters alone achieves an average prediction test error of 8.04 ± 5.96% compared to actual energy measurements. Surprisingly, we find that it is possible to refine the model to predict the number of SIMD instructions and main-memory accesses solely from the application's MAC counts, with average prediction test errors of 0.81 ± 0.77% and 17.97 ± 15.29% respectively. This alleviates the need for actual measurements, giving a final average prediction test error of 7.08 ± 5.05% using solely the application's MAC counts as input.
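
To make the modeling approach concrete, the sketch below illustrates the two ingredients the abstract describes: a MAC count for a convolutional layer (the standard kernel-size × channels × output-size product) and a multi-variable linear regression that maps performance-counter readings to energy. This is a minimal illustration, not the paper's implementation: the counter readings and energy values are placeholder numbers, and the AlexNet conv1 dimensions are used only as a worked MAC-count example.

```python
import numpy as np

def conv_mac_count(c_in, c_out, k_h, k_w, h_out, w_out):
    """MAC count of one convolutional layer: one multiply-accumulate
    per kernel weight per output position."""
    return c_in * c_out * k_h * k_w * h_out * w_out

# Worked example: AlexNet conv1 (3 -> 96 channels, 11x11 kernel,
# 55x55 output) -> 105,415,200 MACs, i.e. roughly 105 M.
print(conv_mac_count(3, 96, 11, 11, 55, 55))

# Hypothetical per-network counter readings (SIMD instruction counts,
# bus accesses, e.g. as collected via ARM Streamline) and measured
# energy in millijoules. Placeholder values, not data from the paper.
simd = np.array([2.1e8, 5.6e8, 1.3e9, 3.4e9, 7.9e8])
bus = np.array([1.4e7, 3.2e7, 9.8e7, 2.4e8, 5.1e7])
energy_mj = np.array([18.0, 47.0, 110.0, 290.0, 66.0])

# Multi-variable linear regression: energy ~ b0 + b1*SIMD + b2*bus,
# fitted by ordinary least squares.
X = np.column_stack([np.ones_like(simd), simd, bus])
coef, *_ = np.linalg.lstsq(X, energy_mj, rcond=None)

pred = X @ coef
ape = np.abs(pred - energy_mj) / energy_mj * 100.0
print(f"mean absolute percentage error: {ape.mean():.2f}%")
```

The refinement step described in the abstract replaces the measured `simd` and `bus` inputs with values themselves predicted from MAC counts, so that, once fitted, the model needs only the network's architecture as input.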
