Paper information - A Scalable Multi-TeraOPS Core for AI Training and Inference

This letter presents a multi-TOPS AI accelerator core for deep learning training and inference. With a programmable architecture and a custom ISA, the engine achieves >90% sustained utilization across a range of neural network topologies. It does so by employing a dataflow architecture to provide high throughput and an on-chip scratchpad hierarchy to meet the bandwidth demands of the compute units. A custom 16-bit floating-point (fp16) representation with 1 sign bit, 6 exponent bits, and 9 mantissa bits has also been developed for high model accuracy in training and inference, alongside 1-bit and 2-bit (binary and ternary) integer representations for aggressive inference performance. At 1.5 GHz, the AI core prototype, implemented in 14-nm CMOS, achieves a peak performance of 1.5 TFLOPS in fp16, 12 TOPS in ternary, or 24 TOPS in binary.
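The two quantitative claims in the abstract, the 1/6/9 bit layout and the peak rates at 1.5 GHz, can be illustrated with a minimal Python sketch. Only the field widths, the clock frequency, and the peak figures come from the abstract. The exponent bias of 31, the implicit leading one, and the treatment of the all-zero pattern as zero are assumptions borrowed from IEEE-754 conventions; the abstract does not specify them, nor how subnormals, infinities, or NaN are handled. The function names are hypothetical.

```python
# Field widths taken from the abstract: 1 sign, 6 exponent, 9 mantissa bits.
EXP_BITS = 6
MAN_BITS = 9
# ASSUMPTION: IEEE-754-style bias (2**(EXP_BITS-1) - 1 = 31); not stated in the abstract.
EXP_BIAS = (1 << (EXP_BITS - 1)) - 1


def split_fields(word: int) -> tuple[int, int, int]:
    """Split a 16-bit word into (sign, exponent, mantissa) fields."""
    if not 0 <= word < (1 << 16):
        raise ValueError("word must fit in 16 bits")
    sign = word >> (EXP_BITS + MAN_BITS)
    exponent = (word >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    mantissa = word & ((1 << MAN_BITS) - 1)
    return sign, exponent, mantissa


def decode(word: int) -> float:
    """Decode a 16-bit word under the assumptions listed above.

    ASSUMPTION: every pattern is a normal number with an implicit leading
    one, except exponent == 0 and mantissa == 0, which decodes to zero.
    """
    sign, exponent, mantissa = split_fields(word)
    if exponent == 0 and mantissa == 0:
        return -0.0 if sign else 0.0
    significand = 1.0 + mantissa / (1 << MAN_BITS)
    value = significand * 2.0 ** (exponent - EXP_BIAS)
    return -value if sign else value


def ops_per_cycle(peak_ops_per_second: int, clock_hz: int) -> int:
    """Peak operations per clock cycle implied by a peak rate and a clock."""
    return peak_ops_per_second // clock_hz


CLOCK_HZ = 1_500_000_000  # 1.5 GHz, from the abstract
FP16_PER_CYCLE = ops_per_cycle(1_500_000_000_000, CLOCK_HZ)      # 1.5 TFLOPS
TERNARY_PER_CYCLE = ops_per_cycle(12_000_000_000_000, CLOCK_HZ)  # 12 TOPS
BINARY_PER_CYCLE = ops_per_cycle(24_000_000_000_000, CLOCK_HZ)   # 24 TOPS
```

Under these assumptions, the word 0x3E00 (exponent field 31, mantissa 0) decodes to 1.0. Dividing the peak rates by the clock gives 1,000 fp16 operations, 8,000 ternary operations, and 16,000 binary operations per cycle; the abstract does not say how these are distributed across compute units.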
