Mapping and virtual neuron assignment algorithms for the MAERI accelerator

To date, various deep learning accelerators (DLAs) have been proposed to address the challenges caused by the growing number of layers in deep neural networks (DNNs). GPU-based systems suffer from poor energy efficiency: their highly parallel computation increases memory accesses, which in turn raises memory capacity, bandwidth, and delay requirements. DLA-based systems mitigate these challenges and improve these metrics, but their flexibility remains limited. Case studies of proposed DLAs have demonstrated that the choice of mapping method has a significant effect on energy consumption and delay. We analyze MAERI's role in addressing this issue and the impact of mapping methods on the challenges of implementing different trained DNN models on accelerators. This work proposes an algorithm for mapping and assigning virtual neurons (VNs) on the MAERI accelerator to improve its performance and cost. Simulation results show that the proposed approach reduces energy consumption and delay by approximately 21–92% for AlexNet and 14–21% for VGG-16 on MAERI, respectively. The mapping method significantly improves the performance of the proposed DLAs and reduces their cost without redesigning their structures, and the proposed VN-assignment approach supports different trained DNN models, increasing the flexibility of DLA-based systems.
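To give a rough sense of the VN-assignment idea, the sketch below follows MAERI's published model, in which the multiplier switches of the reduction tree are grouped into virtual neurons, each sized to hold the weights of one filter. This is a minimal illustration, not the paper's algorithm: the function names (`vn_size`, `assign_virtual_neurons`), the greedy packing policy, and the example layer dimensions are all hypothetical.

```python
# Hypothetical sketch of virtual-neuron (VN) assignment on a MAERI-style
# accelerator. In MAERI, multiplier switches are grouped into VNs; each VN
# computes one output neuron, so its size equals the number of weights in
# one filter. The greedy packing below is illustrative only.

def vn_size(kernel_h: int, kernel_w: int, in_channels: int) -> int:
    """Multiplier switches one VN needs for a convolutional layer:
    one multiplier per weight in a single filter."""
    return kernel_h * kernel_w * in_channels


def assign_virtual_neurons(num_multipliers: int, layer: dict):
    """Greedily pack as many same-sized VNs as fit into the multiplier array.

    Returns a list of (start, end) multiplier-index ranges, one per VN.
    Multipliers left over after packing sit idle this pass; that
    underutilization is what a good mapping method tries to minimize.
    """
    size = vn_size(layer["kernel_h"], layer["kernel_w"], layer["in_channels"])
    if size > num_multipliers:
        raise ValueError("layer filter does not fit in the multiplier array")
    vns, start = [], 0
    while start + size <= num_multipliers:
        vns.append((start, start + size))
        start += size
    return vns


# Example: map a small 3x3 conv layer with 16 input channels (VN size 144)
# onto a hypothetical 1024-multiplier array: 7 VNs fit, 16 multipliers idle.
layer = {"kernel_h": 3, "kernel_w": 3, "in_channels": 16}
vns = assign_virtual_neurons(1024, layer)
used = len(vns) * (vns[0][1] - vns[0][0])
print(f"{len(vns)} VNs of size {vns[0][1] - vns[0][0]}, {1024 - used} multipliers idle")
```

Under this simplified model, the quality of an assignment reduces to how few multipliers are left idle per pass and how many passes a layer needs, which is one way the mapping method can influence both delay and energy without any change to the underlying hardware.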
