Detailed Characterization of Deep Neural Networks on GPUs and FPGAs

Deep neural networks (DNNs) have proven effective in a wide range of computing fields. To build more efficient computing platforms for DNN applications, it is essential to have evaluation environments that include assorted benchmark workloads. Although a few DNN benchmark suites have been released recently, most of them require installing proprietary DNN libraries or resource-intensive DNN frameworks, which run only on certain architectures. In addition, some of the suites support only a few per-layer functions, so the interactions between layers cannot be measured. To provide a more scalable evaluation environment, we present a new DNN benchmark suite, Tango, that can run on any platform that supports CUDA or OpenCL. Tango includes five of the most widely used convolutional neural networks and two recurrent neural networks. We provide in-depth architectural statistics of these networks while running them on an architecture simulator, a server GPU, a mobile GPU, and a mobile FPGA.
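To illustrate what a framework-free benchmark kernel of this kind can look like, below is a minimal CUDA sketch of a single 2D convolution layer. This is an assumption-laden illustration, not Tango's actual source: the kernel name, tensor sizes, and launch configuration are hypothetical, chosen only to show that such a workload can be written and timed with nothing beyond the CUDA runtime, i.e., without proprietary DNN libraries or heavyweight frameworks.

// Hypothetical sketch of a framework-free convolution benchmark kernel.
// Names and sizes are illustrative assumptions, not taken from Tango.
#include <cstdio>
#include <cuda_runtime.h>

#define H  32              // input height (illustrative)
#define W  32              // input width
#define K  3               // filter size
#define OH (H - K + 1)     // output height (valid convolution)
#define OW (W - K + 1)     // output width

// One thread per output pixel; only the CUDA runtime is required.
__global__ void conv2d(const float* in, const float* filt, float* out) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // output column
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // output row
    if (x >= OW || y >= OH) return;
    float acc = 0.0f;
    for (int i = 0; i < K; ++i)
        for (int j = 0; j < K; ++j)
            acc += in[(y + i) * W + (x + j)] * filt[i * K + j];
    out[y * OW + x] = acc;
}

int main() {
    float *in, *filt, *out;                          // unified memory for brevity
    cudaMallocManaged(&in,   H * W * sizeof(float));
    cudaMallocManaged(&filt, K * K * sizeof(float));
    cudaMallocManaged(&out,  OH * OW * sizeof(float));
    for (int i = 0; i < H * W; ++i) in[i] = 1.0f;    // constant input
    for (int i = 0; i < K * K; ++i) filt[i] = 1.0f / (K * K);  // box filter

    dim3 block(16, 16), grid((OW + 15) / 16, (OH + 15) / 16);
    conv2d<<<grid, block>>>(in, filt, out);          // the region a benchmark would time
    cudaDeviceSynchronize();

    printf("out[0] = %f\n", out[0]);                 // expect 1.0 for this input
    cudaFree(in); cudaFree(filt); cudaFree(out);
    return 0;
}

Because such a kernel depends only on the CUDA (or, analogously, OpenCL) runtime, it can be profiled on an architecture simulator or deployed to embedded GPUs and FPGAs where installing a full DNN framework is impractical.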
