Scalable deep neural network accelerator cores with cubic integration using through chip interface

Owing to recent advances in deep neural network (DNN) technologies, recognition and inference applications are expected to run on mobile embedded systems. Developing high-performance, power-efficient DNN engines has therefore become an important challenge for embedded systems. Because DNN algorithms and structures are frequently updated, flexibility and performance scalability across various types of networks are crucial requirements of DNN accelerator design. In this paper, we describe the architecture and LSI design of a flexible and scalable CNN accelerator called SNACC (Scalable Neuro Accelerator Core with Cubic integration), which consists of several processing cores, on-chip memory modules, and a ThruChip Interface (TCI).
