pCAMP: Performance Comparison of Machine Learning Packages on the Edges

Machine learning has changed the computing paradigm. Products today are built with machine intelligence as a central attribute, and consumers are beginning to expect near-human interaction with the appliances they use. However, much of the deep learning revolution has been limited to the cloud. Recently, several machine learning packages targeting edge devices have been announced, aiming to offload computing from the cloud to the edge. However, little work has evaluated these packages on edge devices, making it difficult for end users to select an appropriate combination of software and hardware. In this paper, we present a performance comparison of several state-of-the-art machine learning packages on edge devices, including TensorFlow, Caffe2, MXNet, PyTorch, and TensorFlow Lite. We focus on evaluating the latency, memory footprint, and energy consumption of these tools with two popular types of neural networks on different edge devices. The evaluation not only provides end users with a reference for selecting appropriate combinations of hardware and software packages, but also points developers toward possible directions for optimizing these packages.
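As a rough illustration of the kind of measurement the paper describes, the sketch below times repeated TensorFlow Lite inferences and reports peak resident memory. It is a minimal sketch, not the paper's benchmarking harness: the model file name ("squeezenet.tflite"), the float32 input assumption, and the run count are placeholders chosen for illustration.

```python
# Minimal latency / memory sketch (assumed setup, not the paper's harness):
# times repeated TensorFlow Lite inferences and reports peak resident memory.
# Assumptions: a converted model file "squeezenet.tflite" exists locally and
# the interpreter's first input is a single float32 tensor (not quantized).
import time
import resource  # Unix-only; suitable for Linux-based edge boards

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="squeezenet.tflite")
interpreter.allocate_tensors()

input_info = interpreter.get_input_details()[0]
output_info = interpreter.get_output_details()[0]
dummy_input = np.random.random_sample(tuple(input_info["shape"])).astype(np.float32)

# Warm-up run so one-time allocation cost is not counted as steady-state latency.
interpreter.set_tensor(input_info["index"], dummy_input)
interpreter.invoke()

latencies = []
for _ in range(50):
    interpreter.set_tensor(input_info["index"], dummy_input)
    start = time.perf_counter()
    interpreter.invoke()
    latencies.append((time.perf_counter() - start) * 1000.0)  # milliseconds

# On Linux, ru_maxrss is reported in KiB.
peak_rss_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
print(f"mean latency: {np.mean(latencies):.2f} ms over {len(latencies)} runs")
print(f"peak resident memory: {peak_rss_mb:.1f} MiB")
print(f"output shape: {interpreter.get_tensor(output_info['index']).shape}")
```

Energy consumption, the third metric the paper reports, is not captured here; it would require an external power meter or a board-specific power sensor rather than a software-only probe.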
