Memory Bandwidth and Energy Efficiency Optimization of Deep Convolutional Neural Network Accelerators

Deep convolutional neural networks (CNNs) achieve state-of-the-art accuracy, but at the cost of massive computation and memory traffic. Although highly parallel devices can readily meet their computational demands, energy efficiency remains a major challenge, largely because off-chip memory accesses cost far more energy than arithmetic.
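To make the scale of "massive computation and memory traffic" concrete, below is a minimal back-of-the-envelope cost model, not taken from the paper: it counts multiply-accumulate (MAC) operations and raw weight/activation bytes for a single convolutional layer, assuming stride-1 "same" convolution and 16-bit operands. The layer dimensions in the example are those of a typical early VGG-16 layer.

```python
# Back-of-the-envelope cost model for one convolutional layer.
# A hypothetical sketch for illustration; the accelerator papers cited
# in this work use far more detailed models (tiling, reuse, DRAM timing).

def conv_layer_cost(h, w, c_in, c_out, k, bytes_per_elem=2):
    """Return (MACs, bytes moved) for a k x k convolution with
    'same' padding and stride 1 on an h x w x c_in feature map."""
    macs = h * w * c_out * c_in * k * k            # one MAC per output tap
    weight_bytes = c_out * c_in * k * k * bytes_per_elem
    input_bytes = h * w * c_in * bytes_per_elem
    output_bytes = h * w * c_out * bytes_per_elem
    return macs, weight_bytes + input_bytes + output_bytes

# Example: a VGG-16-style layer (224 x 224 x 64 -> 64 channels, 3 x 3 kernels).
macs, traffic = conv_layer_cost(224, 224, 64, 64, 3)
print(f"{macs / 1e9:.2f} GMACs, {traffic / 1e6:.1f} MB moved (no on-chip reuse)")
```

For this one layer the model reports roughly 1.85 GMACs and about 13 MB of traffic if no data is reused on chip, which is why dataflow and memory-bandwidth optimization dominate accelerator energy budgets.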
