DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs

The past few years have seen a surge in the application of Deep Learning (DL) models to a wide array of tasks such as image classification, object detection, and machine translation. While DL models provide an opportunity to solve otherwise intractable tasks, their adoption relies on their being optimized to meet target latency and resource requirements. Benchmarking is a key step in this process, but it has been hampered in part by the lack of representative and up-to-date benchmarking suites. This paper proposes DLBricks, a composable benchmark generation design that reduces the effort of developing, maintaining, and running DL benchmarks. DLBricks decomposes DL models into a set of unique runnable networks and reconstructs the original model's performance from the performance of the generated benchmarks. Since benchmarks are generated automatically and the benchmarking time is minimized, DLBricks can keep up to date with the latest proposed models, relieving the pressure of selecting representative DL models. We evaluate DLBricks using 50 MXNet models spanning 5 DL tasks on 4 representative CPU systems. We show that DLBricks provides an accurate performance estimate for the DL models and reduces the benchmarking time across systems (e.g., estimates within 95% accuracy and up to a 4.4× benchmarking time speedup on Amazon EC2 c5.xlarge).
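The decompose, benchmark, and reconstruct idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the DLBricks implementation: it assumes each brick is a single layer identified by a hypothetical `(operator, shape)` signature, that layers run sequentially on the CPU so a model's latency is approximated by the sum of its bricks' latencies, and that the `run` callback stands in for a real measurement on the target system. The names `decompose`, `benchmark`, and `estimate`, as well as the toy costs, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layer signature: operator name plus the shape/attributes
# that determine its cost. Identical signatures are treated as the same brick.
LayerSig = Tuple[str, Tuple[int, ...]]


def decompose(models: Dict[str, List[LayerSig]]) -> List[LayerSig]:
    """Collect the unique layer signatures across all models (deduplicated bricks)."""
    unique: Dict[LayerSig, None] = {}  # dict preserves first-seen order
    for layers in models.values():
        for sig in layers:
            unique.setdefault(sig, None)
    return list(unique)


def benchmark(bricks: List[LayerSig], run: Callable[[LayerSig], float]) -> Dict[LayerSig, float]:
    """Measure each unique brick once; `run` is a stand-in for a real timing harness."""
    return {sig: run(sig) for sig in bricks}


def estimate(layers: List[LayerSig], latency: Dict[LayerSig, float]) -> float:
    """Assume sequential execution: model latency ~ sum of its bricks' latencies."""
    return sum(latency[sig] for sig in layers)


# Toy usage with made-up costs (not measured data).
models = {
    "net_a": [("conv", (64, 3, 3)), ("relu", (64,)), ("conv", (64, 3, 3))],
    "net_b": [("conv", (64, 3, 3)), ("fc", (1000,))],
}
fake_costs = {("conv", (64, 3, 3)): 2.0, ("relu", (64,)): 0.5, ("fc", (1000,)): 1.0}

bricks = decompose(models)                         # 3 unique bricks instead of 5 layers
latency = benchmark(bricks, fake_costs.__getitem__)
print(estimate(models["net_a"], latency))          # 4.5
```

Because `net_a` and `net_b` share the same convolution, only three bricks are measured instead of five layers; sharing of this kind across many models is where the benchmarking-time savings claimed in the abstract would come from in this simplified model.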
