Brain-inspired Co-design of Algorithm/Architecture for CNN Accelerators

This paper presents the human-brain-inspired design and analysis of a Soft Tensor Processor (STP) for the massively parallel implementation of the multi-layer Convolutional Neural Networks (CNNs) that currently form the core of Deep Learning. Unlike existing CNN accelerators, the proposed STP implements the required convolution by massively parallel, fine-grained execution of four-dimensional multiply-accumulate (MAC) operations with systolic-like tensor data movement. This approach substantially reduces the number of time-steps needed to compute a convolution. Under the real-time constraints of a given application, this reduction in time-steps can be traded for a lower operating frequency in a physical implementation and, consequently, lower power consumption. The algorithm/architecture co-design of the STP is inspired by the human brain as a system composed of a practically unlimited number of tightly interconnected, low-frequency operational-and-storage elements (neurons) with very low power consumption.
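
To make the underlying computation concrete, the following minimal Python/NumPy sketch (illustrative only, not from the paper; the function name conv2d_mac and the loop ordering are assumptions) expresses direct convolution as nested MAC loops. Each output activation is a reduction over a multi-dimensional index space, one MAC per index point; a spatial accelerator such as the proposed STP would unroll several of these loop dimensions onto a grid of MAC units rather than executing them sequentially.

```python
import numpy as np

def conv2d_mac(x, w):
    """Direct (valid) convolution written as explicit nested MAC loops.

    x: input feature maps, shape (C_in, H, W)
    w: kernel weights,     shape (C_out, C_in, K, K)
    Returns output feature maps, shape (C_out, H-K+1, W-K+1).
    """
    C_in, H, W = x.shape
    C_out, _, K, _ = w.shape
    y = np.zeros((C_out, H - K + 1, W - K + 1), dtype=x.dtype)
    for co in range(C_out):              # output channel
        for oy in range(H - K + 1):      # output row
            for ox in range(W - K + 1):  # output column
                acc = 0.0
                # Each output value is a reduction over the
                # (C_in, K, K) index space: one MAC per point.
                for ci in range(C_in):
                    for ky in range(K):
                        for kx in range(K):
                            acc += x[ci, oy + ky, ox + kx] * w[co, ci, ky, kx]
                y[co, oy, ox] = acc
    return y

# Example: 3 input channels, 8 filters of size 3x3 on a 32x32 image.
x = np.random.rand(3, 32, 32).astype(np.float32)
w = np.random.rand(8, 3, 3, 3).astype(np.float32)
y = conv2d_mac(x, w)
assert y.shape == (8, 30, 30)
```

Executed sequentially, the loop nest above takes on the order of the product of all loop trip counts in time-steps. Mapping some loop dimensions onto an array of MAC units that pass operands to their neighbors in a systolic-like fashion reduces the number of sequential time-steps to roughly the trip-count product of the remaining unparallelized dimensions, which is the reduction the abstract proposes trading for a lower clock frequency under a fixed real-time budget.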
