论文信息 - BRein Memory: A Single-Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W

BRein Memory: A Single-Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W

A versatile reconfigurable accelerator architecture for binary/ternary deep neural networks is presented. In-memory neural network processing without any external data accesses, sustained by the symmetry and simplicity of the computation of the binary/ternaty neural network, improves the energy efficiency dramatically. The prototype chip is fabricated, and it achieves 1.4 TOPS (tera operations per second) peak performance with 0.6-W power consumption at 400-MHz clock. The application examination is also conducted.

[1] Ran El-Yaniv,et al. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations , 2016, J. Mach. Learn. Res..

[2] Tadahiro Kuroda,et al. BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS , 2017, 2017 Symposium on VLSI Circuits.

[3] Igor Carron,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016 .

[4] Marian Verhelst,et al. 14.5 Envision: A 0.26-to-10TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable Convolutional Neural Network processor in 28nm FDSOI , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).

[5] Yoshua Bengio,et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.

[6] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[7] James T. Kwok,et al. Loss-aware Binarization of Deep Networks , 2016, ICLR.

[8] Dharmendra S. Modha,et al. A digital neurosynaptic core using embedded crossbar memory with 45pJ per spike in 45nm , 2011, 2011 IEEE Custom Integrated Circuits Conference (CICC).

[9] Bin Liu,et al. Ternary Weight Networks , 2016, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[10] Marian Verhelst,et al. A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets , 2016, 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits).

[11] Tetsuya Asai,et al. In-memory area-efficient signal streaming processor design for binary neural networks , 2017, 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS).

[12] Andrew S. Cassidy,et al. A million spiking-neuron integrated circuit with a scalable communication network and interface , 2014, Science.

[13] Yoshua Bengio,et al. Training deep neural networks with low precision multiplications , 2014 .

[14] Hoi-Jun Yoo,et al. 14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).

[15] Jun-Seok Park,et al. 14.6 A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems , 2016, 2016 IEEE International Solid-State Circuits Conference (ISSCC).

[16] Ran El-Yaniv,et al. Binarized Neural Networks , 2016, NIPS.

[17] Eriko Nurvitadhi,et al. Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC , 2016, 2016 International Conference on Field-Programmable Technology (FPT).

[18] Hiroki Nakahara,et al. On-Chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[19] Joel Emer,et al. Eyeriss: an Energy-efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks Accessed Terms of Use , 2022 .