Accelerating CNN Algorithm with Fine-Grained Dataflow Architectures
Dongrui Fan | Meng Wu | Hao Zhang | Yujing Feng | Taoran Xiang | Xiaochun Ye | Wenming Li | Yatao Zhu | Xu Tan