Benchmarking Deep Learning Frameworks: Design Considerations, Metrics and Beyond

With the increasing number of open-source deep learning (DL) software tools being made available, benchmarking DL software frameworks and systems is in high demand. This paper presents design considerations, metrics, and challenges toward developing an effective benchmark for DL software frameworks, and illustrates our observations through a comparative study of three popular DL frameworks: TensorFlow, Caffe, and Torch. First, we show that these deep learning frameworks are optimized with their default configuration settings. However, a default configuration optimized on one specific dataset may not work effectively for other datasets with respect to runtime performance and learning accuracy. Second, a default configuration optimized on a dataset by one DL framework does not work well for another DL framework on the same dataset. Third, our experiments show that different DL frameworks exhibit different levels of robustness against adversarial examples. Through this study, we envision that, unlike traditional performance-driven benchmarks, benchmarking deep learning software frameworks should take into account both runtime performance and accuracy, as well as their latent interaction with the hyper-parameters and data-dependent configurations of DL frameworks.
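The kind of joint runtime/accuracy measurement the abstract argues for can be illustrated with a minimal sketch. The snippet below is not the paper's benchmark harness; it is an assumed example using TensorFlow/Keras on MNIST, sweeping the batch size as one representative hyper-parameter and recording both training time and test accuracy for each configuration. The model architecture, optimizer, and candidate batch sizes are illustrative choices, not settings taken from the study.

```python
# Illustrative sketch (assumed, not the paper's harness): record runtime and
# accuracy together while varying a single hyper-parameter (batch size).
import time
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def build_model():
    # Small fully-connected classifier; architecture is an arbitrary choice.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

for batch_size in (32, 128, 512):  # candidate "configurations" to compare
    model = build_model()
    model.compile(optimizer="sgd",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    start = time.time()
    model.fit(x_train, y_train, epochs=1, batch_size=batch_size, verbose=0)
    runtime = time.time() - start
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"batch_size={batch_size}: runtime={runtime:.1f}s, accuracy={acc:.4f}")
```

In this spirit, a framework benchmark would report the runtime/accuracy pairs per configuration (and per dataset) rather than a single throughput number, exposing how default settings trade the two off.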
