Timeloop: A Systematic Approach to DNN Accelerator Evaluation

This paper presents Timeloop, an infrastructure for evaluating and exploring the architecture design space of deep neural network (DNN) accelerators. Timeloop uses a concise and unified representation of the key architecture and implementation attributes of DNN accelerators to describe a broad space of hardware topologies. It can then emulate those topologies to generate an accurate projection of performance and energy efficiency for a DNN workload, using a mapper that finds the best way to schedule operations and stage data on the specified architecture. This enables fair comparisons across different architectures and makes DNN accelerator design more systematic. This paper describes Timeloop's underlying models and algorithms in detail and presents results from case studies enabled by Timeloop, which provide interesting insights into the current state of DNN architecture design. In particular, they reveal that co-design of the dataflow and the memory hierarchy plays a critical role in optimizing energy efficiency. They also show that no single architecture currently achieves the best performance and energy efficiency across a diverse set of workloads, owing to trade-offs between flexibility and efficiency. These results suggest promising directions for future DNN accelerator research.
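To make the role of the mapper concrete, the following is a minimal, hypothetical sketch of a mapper-style search loop, not Timeloop's actual implementation. It assumes a toy 1-D convolution workload, a single on-chip buffer backed by DRAM, and made-up per-access energy costs; it exhaustively enumerates output-tile sizes, discards mappings whose working set does not fit in the buffer, and keeps the lowest-energy valid mapping.

```python
# Illustrative sketch only (not Timeloop's implementation): exhaustively search
# output-tile sizes for a toy 1-D convolution on a two-level memory hierarchy,
# scoring each mapping with a simple access-count energy model.

P, R = 256, 3                     # hypothetical workload: 256 outputs, 3-tap filter
BUFFER_WORDS = 64                 # assumed on-chip buffer capacity (words)
ENERGY = {"dram": 200.0, "buffer": 6.0, "mac": 1.0}   # assumed relative energy costs


def divisors(n):
    """All tile sizes that evenly divide the output dimension."""
    return [d for d in range(1, n + 1) if n % d == 0]


def evaluate(tile_p):
    """Return estimated energy for one mapping, or None if it does not fit.

    The mapping keeps one tile of outputs, the weights, and the matching
    input halo resident in the buffer, re-fetching inputs and weights from
    DRAM once per tile.
    """
    inputs_per_tile = tile_p + R - 1                   # input halo for this tile
    footprint = tile_p + R + inputs_per_tile           # outputs + weights + inputs
    if footprint > BUFFER_WORDS:
        return None                                    # invalid mapping: buffer overflow
    tiles = P // tile_p
    macs = P * R
    dram_accesses = tiles * (inputs_per_tile + R) + P  # per-tile fills + output writeback
    buffer_accesses = 2 * macs                         # two operand reads per MAC (toy model)
    return (dram_accesses * ENERGY["dram"]
            + buffer_accesses * ENERGY["buffer"]
            + macs * ENERGY["mac"])


# Mapper-style search: enumerate the (tiny) mapspace, keep the best valid mapping.
valid = [(evaluate(t), t) for t in divisors(P) if evaluate(t) is not None]
best_energy, best_tile = min(valid)
print(f"best output tile: {best_tile}, estimated energy: {best_energy:.0f}")
```

Timeloop's actual mapspace is far richer than this toy example, covering loop permutations, spatial partitioning across processing elements, and multi-level tiling, but the structure is the same: enumerate candidate mappings, prune the invalid ones, and evaluate the rest with an analytical performance and energy model.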
