Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators

Reconfigurable accelerators for deep neural networks (DNNs) promise to improve performance metrics such as inference latency. STONNE is the first cycle-accurate simulator for reconfigurable DNN inference accelerators, enabling exploration of their design and configuration space. However, preparing models for evaluation and exploring the configuration space in STONNE is a manual, developer-time-intensive process, which is a barrier to research. This paper introduces Bifrost, an end-to-end framework for the evaluation and optimization of reconfigurable DNN inference accelerators. Bifrost operates as a frontend for STONNE and leverages the TVM deep learning compiler stack to parse models and automate the offloading of accelerated computations. We discuss Bifrost’s advantages over STONNE and other tools, and evaluate the MAERI and SIGMA architectures using Bifrost. Additionally, Bifrost introduces a module leveraging AutoTVM to efficiently explore accelerator designs and the dataflow mapping space in order to optimize performance. We demonstrate this by tuning the MAERI architecture and generating efficient dataflow mappings for AlexNet, obtaining an average speedup of $50\times$ for the convolutional layers and $11\times$ for the fully connected layers. Our code is available at www.github.com/gicLAB/bifrost.
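To make the described workflow concrete, the sketch below shows how a Bifrost-style flow might look on top of TVM's Python API. The TVM and AutoTVM calls (relay.frontend.from_pytorch, relay.build, autotvm.task.extract_from_program, XGBTuner) are standard, but the `bifrost` import, its `stonne.architecture` hook, and the MAERI config file name are illustrative assumptions rather than the documented Bifrost API; the tuning loop is likewise shown against a generic target, whereas Bifrost's tuner explores STONNE hardware parameters and dataflow mappings.

```python
# Hedged sketch of a Bifrost-style flow on top of TVM. The TVM/AutoTVM calls
# are standard; everything touching `bifrost` below is hypothetical and
# illustrative, not necessarily what the real package exposes.
import torch
import torchvision
import tvm
from tvm import autotvm, relay

import bifrost  # hypothetical: importing would register STONNE offloading with TVM

# 1. Parse a DNN (here AlexNet) into Relay via TVM's PyTorch frontend.
model = torchvision.models.alexnet().eval()
example = torch.randn(1, 3, 224, 224)
scripted = torch.jit.trace(model, example)
mod, params = relay.frontend.from_pytorch(scripted, [("data", (1, 3, 224, 224))])

# 2. Select the simulated accelerator (config name is an assumption).
bifrost.stonne.architecture.load("maeri_128pes.cfg")  # hypothetical hook

# 3. Build as usual; offloadable conv2d/dense ops would be routed to the
#    cycle-accurate STONNE simulator instead of the host backend.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# 4. Optionally, tune per-layer configurations with AutoTVM. Shown here against
#    a generic target; in Bifrost the search space would instead cover STONNE
#    hardware parameters and dataflow mappings.
tasks = autotvm.task.extract_from_program(mod["main"], params=params, target="llvm")
measure = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=1, repeat=1),
)
for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=64,
        measure_option=measure,
        callbacks=[autotvm.callback.log_to_file("alexnet_maeri.log")],
    )
```

Following standard AutoTVM practice, the resulting log can then be applied with `autotvm.apply_history_best("alexnet_maeri.log")` around a subsequent `relay.build` so the best configurations found during tuning are picked up.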

[1] José L. Abellán, et al. A Novel Network Fabric for Efficient Spatio-Temporal Reduction in Flexible DNN Accelerators, 2021, 2021 15th IEEE/ACM International Symposium on Networks-on-Chip (NOCS).

[2] Perry Gibson, et al. SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN Accelerators for Edge Inference, 2021, 2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).

[3] José L. Abellán, et al. STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators, 2021, IEEE Computer Architecture Letters.

[4] Niraj K. Jha, et al. Software-Defined Design Space Exploration for an Efficient DNN Accelerator Architecture, 2019, IEEE Transactions on Computers.

[5] Tushar Krishna, et al. Data Orchestration in Deep Learning Accelerators, 2020, Synthesis Lectures on Computer Architecture.

[6] Matthew Mattina, et al. A Systematic Methodology for Characterizing Scalability of DNN Accelerators using SCALE-Sim, 2020, 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[7] Cody Hao Yu, et al. Ansor: Generating High-Performance Tensor Programs for Deep Learning, 2020, OSDI.

[8] Tom B. Brown, et al. Measuring the Algorithmic Efficiency of Neural Networks, 2020, ArXiv.

[9] Jose Javier Gonzalez Ortiz, et al. What is the State of Neural Network Pruning?, 2020, MLSys.

[10] Dipankar Das, et al. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training, 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[11] Gu-Yeon Wei, et al. SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads, 2019, ACM Trans. Archit. Code Optim.

[12] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[13] Stanimire Tomov, et al. MagmaDNN: Accelerated Deep Learning Using MAGMA, 2019, PEARC.

[14] Zhigang Mao, et al. mRNA: Enabling Efficient Mapping Space Exploration for a Reconfiguration Neural Accelerator, 2019, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[15] Vivienne Sze, et al. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices, 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[16] Matthew Mattina, et al. SCALE-Sim: Systolic CNN Accelerator Simulator, 2018, ArXiv.

[17] Amos J. Storkey, et al. Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks, 2018, 2018 IEEE International Symposium on Workload Characterization (IISWC).

[18] Thierry Moreau, et al. VTA: An Open Hardware-Software Stack for Deep Learning, 2018, ArXiv.

[19] Tianqi Chen, et al. Relay: a new IR for machine learning frameworks, 2018, MAPL@PLDI.

[20] Tianqi Chen, et al. Optimizing Deep Learning Workloads on ARM GPU with TVM, 2018, ReQuEST@ASPLOS.

[21] Thierry Moreau, et al. Learning to Optimize Tensor Programs, 2018, NeurIPS.

[22] Hyoukjun Kwon, et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects, 2018, ASPLOS.

[23] Haichen Shen, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018, OSDI.

[24] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[25] Vivienne Sze, et al. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks, 2016, IEEE Journal of Solid-State Circuits.

[26] Gu-Yeon Wei, et al. Co-designing accelerators and SoC interfaces using gem5-Aladdin, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[27] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.

[28] Tianqi Chen, et al. XGBoost: A Scalable Tree Boosting System, 2016, KDD.

[29] Yangqing Jia, et al. Learning Semantic Image Representations at a Large Scale, 2014.

[30] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[31] Lei Zhao, et al. GATuner: Tuning Schema Matching Systems Using Genetic Algorithms, 2010, 2010 2nd International Workshop on Database Technology and Applications.