Brain-inspired Co-design of Algorithm/Architecture for CNN Accelerators

This paper presents the human-brain-inspired design and analysis of a Soft Tensor Processor (STP) for the massively parallel implementation of the multi-layer Convolutional Neural Networks (CNNs) that currently form the core of Deep Learning. Unlike existing CNN accelerators, the proposed STP implements the required convolution by massively parallel, fine-grained execution of four-dimensional multiply-accumulate (MAC) operations with systolic-like tensor data movement. This approach substantially reduces the number of time-steps needed to compute a convolution. Under the real-time constraints of a given application, this reduction in time-steps can be traded for a lower operating frequency in a physical implementation and, consequently, lower power consumption. The algorithm/architecture co-design of the STP is inspired by the human brain as a system composed of a practically unlimited number of tightly interconnected, low-frequency operational-and-storage elements (neurons) with very low power consumption.
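
To make the underlying computation concrete, the following minimal Python/NumPy sketch (illustrative only, not from the paper; the function name conv2d_mac and the loop ordering are assumptions) expresses direct convolution as nested MAC loops. Each output activation is a reduction over a multi-dimensional index space, one MAC per index point; a spatial accelerator such as the proposed STP would unroll several of these loop dimensions onto a grid of MAC units rather than executing them sequentially.

```python
import numpy as np

def conv2d_mac(x, w):
    """Direct (valid) convolution written as explicit nested MAC loops.

    x: input feature maps, shape (C_in, H, W)
    w: kernel weights,     shape (C_out, C_in, K, K)
    Returns output feature maps, shape (C_out, H-K+1, W-K+1).
    """
    C_in, H, W = x.shape
    C_out, _, K, _ = w.shape
    y = np.zeros((C_out, H - K + 1, W - K + 1), dtype=x.dtype)
    for co in range(C_out):              # output channel
        for oy in range(H - K + 1):      # output row
            for ox in range(W - K + 1):  # output column
                acc = 0.0
                # Each output value is a reduction over the
                # (C_in, K, K) index space: one MAC per point.
                for ci in range(C_in):
                    for ky in range(K):
                        for kx in range(K):
                            acc += x[ci, oy + ky, ox + kx] * w[co, ci, ky, kx]
                y[co, oy, ox] = acc
    return y

# Example: 3 input channels, 8 filters of size 3x3 on a 32x32 image.
x = np.random.rand(3, 32, 32).astype(np.float32)
w = np.random.rand(8, 3, 3, 3).astype(np.float32)
y = conv2d_mac(x, w)
assert y.shape == (8, 30, 30)
```

Executed sequentially, the loop nest above takes on the order of the product of all loop trip counts in time-steps. Mapping some loop dimensions onto an array of MAC units that pass operands to their neighbors in a systolic-like fashion reduces the number of sequential time-steps to roughly the trip-count product of the remaining unparallelized dimensions, which is the reduction the abstract proposes trading for a lower clock frequency under a fixed real-time budget.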
