TCD-NPE: A Re-configurable and Efficient Neural Processing Engine, Powered by Novel Temporal-Carry-deferring MACs

In this paper, we first propose the design of the Temporal-Carry-deferring MAC (TCD-MAC) and illustrate how it gains significant energy and performance benefits when used to process a stream of input data. We then use the TCD-MAC to build a reconfigurable, high-speed, and low-power Neural Processing Engine (TCD-NPE). We further propose a novel scheduler that lists the sequence of processing events needed to execute an MLP model in the fewest computational rounds on the TCD-NPE. We show that the proposed TCD-NPE significantly outperforms similar neural processing solutions built from conventional MACs in terms of both energy consumption and execution time.
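To make the carry-deferring idea concrete, the following is a minimal Python sketch of the behavior a TCD-MAC provides, not the paper's gate-level design: the running accumulation is kept in a redundant sum/carry (carry-save) form, so the loop over the input stream uses only non-propagating bitwise compression, and the single carry-propagating addition is deferred until the entire stream has been consumed. The function name tcd_mac_stream and its interface are assumptions made for illustration.

```python
def tcd_mac_stream(pairs):
    """Multiply-accumulate a stream of (a, b) pairs while keeping the
    running total in redundant sum/carry (carry-save) form, so no
    carry-propagating addition occurs inside the loop.

    Illustrative model of temporal carry deferral; not the paper's
    exact micro-architecture.
    """
    sum_bits, carry_bits = 0, 0            # redundant accumulator state
    for a, b in pairs:
        p = a * b                          # partial product for this cycle
        # 3:2 (carry-save) compression of p with the stored state:
        # bitwise ops only, so carries are generated but not propagated.
        s = sum_bits ^ carry_bits ^ p
        c = ((sum_bits & carry_bits) | (sum_bits & p) | (carry_bits & p)) << 1
        sum_bits, carry_bits = s, c
    # One final carry-propagating addition resolves all deferred carries
    # and yields the conventional MAC result.
    return sum_bits + carry_bits
```

For example, tcd_mac_stream([(3, 4), (5, 6), (7, 8)]) returns 98, matching 3*4 + 5*6 + 7*8. The energy and latency argument is that the expensive carry-propagating addition is paid once per stream rather than once per MAC operation.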
