DLBench: An Experimental Evaluation of Deep Learning Frameworks

Recently, deep learning has become one of the most disruptive trends in the technology world. Deep learning techniques are increasingly achieving significant results in domains such as speech recognition, image recognition, and natural language processing. Several factors lie behind this growing popularity, including the increasing availability of data, of powerful hardware and computing resources, and of deep learning frameworks. In practice, the proliferation of deep learning frameworks calls for benchmarking studies that can effectively evaluate their performance characteristics. In this paper, we present an extensive experimental study of six popular deep learning frameworks: TensorFlow, MXNet, PyTorch, Theano, Chainer, and Keras. Our evaluation covers several aspects of the comparison, including accuracy, speed, and resource consumption. Our experiments were conducted in both CPU and GPU environments and on several datasets. We report and analyze the performance characteristics of the studied frameworks, and we distill a set of insights and important lessons learned from conducting our experiments.
