A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications

Hybrid neural networks (hybrid-NNs) are widely used and bring new challenges to NN processors. Thinker is an energy-efficient reconfigurable hybrid-NN processor fabricated in 65-nm technology. To achieve high energy efficiency, three optimization techniques are proposed. First, each processing element (PE) supports bit-width adaptive computing to match the various bit-widths of neural layers, which raises computing throughput by 91% and improves energy efficiency by 1.93× on average. Second, the PE array supports on-demand array partitioning and reconfiguration for processing different NNs in parallel, which improves PE utilization by 13.7% and energy efficiency by 1.11×. Third, a fused data-pattern-based multi-bank memory system is designed to exploit data reuse and guarantee parallel data access, which improves computing throughput and energy efficiency by 1.11× and 1.17×, respectively. Measurement results show that this processor achieves a peak energy efficiency of 5.09 TOPS/W.
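To make the first technique concrete, the following is a minimal behavioral sketch in Python of bit-width adaptive computing. It assumes a PE whose 16-bit multiplier can be split into two independent 8-bit multipliers; the function names and splitting scheme are illustrative assumptions, not Thinker's actual datapath.

```python
# Behavioral sketch of bit-width adaptive computing (illustrative, not
# Thinker's exact circuit): one PE datapath that runs either a single
# 16-bit multiply per cycle or, when split, two 8-bit multiplies per cycle.

def pe_cycle_16bit(weight: int, activation: int) -> list[int]:
    """One PE cycle in 16-bit mode: a single 16-bit product."""
    return [weight * activation]

def pe_cycle_8bit(weights: tuple[int, int],
                  activations: tuple[int, int]) -> list[int]:
    """One PE cycle in 8-bit mode: the same hardware yields two 8-bit
    products, doubling MAC throughput for layers quantized to 8 bits."""
    return [w * a for w, a in zip(weights, activations)]

# A layer quantized to 8 bits finishes its multiplies in half the cycles:
n_macs = 1024
cycles_16 = n_macs / len(pe_cycle_16bit(1, 1))           # 1024 cycles
cycles_8 = n_macs / len(pe_cycle_8bit((1, 1), (1, 1)))   # 512 cycles
print(f"16-bit mode: {cycles_16:.0f} cycles, 8-bit mode: {cycles_8:.0f} cycles")
```

Under this model, layers that tolerate 8-bit precision run at up to 2× throughput on the same PEs, which is consistent with the reported 91% average throughput gain when averaged over layers with mixed bit-width requirements.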
