Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing
Jorge Albericio | Patrick Judd | Tayler H. Hetherington | Tor M. Aamodt | Natalie D. Enright Jerger | Andreas Moshovos
[1] Krste Asanovic, et al. Convergence and scalarization for data-parallel architectures, 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[2] Andrew Zisserman, et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets, 2014, BMVC.
[3] James E. Smith, et al. Vector instruction set support for conditional operations, 2000, Proceedings of 27th International Symposium on Computer Architecture (ISCA).
[4] Vivienne Sze, et al. 14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks, 2016, ISSCC.
[5] Ninghui Sun, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning, 2014, ASPLOS.
[6] Erich Elsen, et al. Deep Speech: Scaling up end-to-end speech recognition, 2014, ArXiv.
[7] Samira Manabi Khan, et al. Last-level cache deduplication, 2014, ICS '14.
[8] Song Han, et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[9] Ester M. Garzón, et al. Improving the Performance of the Sparse Matrix Vector Product with GPUs, 2010, 2010 10th IEEE International Conference on Computer and Information Technology.
[10] Jia Wang, et al. DaDianNao: A Machine-Learning Supercomputer, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[11] Tor M. Aamodt, et al. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[12] Xing Liu, et al. Efficient sparse matrix-vector multiplication on x86-based many-core processors, 2013, ICS '13.
[13] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[14] Yan Zhang, et al. FPGA vs. GPU for sparse matrix vector multiply, 2009, 2009 International Conference on Field-Programmable Technology.
[15] Patrick Judd, et al. Stripes: Bit-serial deep neural network computing, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[16] Qiang Chen, et al. Network In Network, 2013, ICLR.
[17] Dong Li, et al. DESTINY: A tool for modeling emerging 3D NVM and eDRAM caches, 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[18] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[19] Kunihiko Fukushima, et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, 1980, Biological Cybernetics.
[20] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[21] Tor M. Aamodt, et al. Thread block compaction for efficient SIMT control flow, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[22] Ron Sass, et al. A hardware-software co-design approach for implementing sparse matrix vector multiplication on FPGAs, 2014, Microprocess. Microsystems.
[23] André DeHon, et al. Floating-point sparse matrix-vector multiply for FPGAs, 2005, FPGA '05.
[24] Vivienne Sze, et al. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks, 2016, IEEE Journal of Solid-State Circuits.
[25] Viktor K. Prasanna, et al. Sparse Matrix-Vector multiplication on FPGAs, 2005, FPGA '05.
[26] Michael Garland, et al. Efficient Sparse Matrix-Vector Multiplication on CUDA, 2008.
[27] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[28] William J. Dally, et al. GPUs and the Future of Parallel Computing, 2011, IEEE Micro.
[29] Mark Horowitz, et al. Energy dissipation in general purpose microprocessors, 1996, IEEE J. Solid-State Circuits.
[30] Natalie D. Enright Jerger, et al. Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets, 2015, ArXiv.
[31] Alain J. Martin, et al. ET²: a metric for time and energy efficiency of computation, 2002.
[32] Nicholas D. Lane, et al. An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices, 2015, IoT-App@SenSys.
[33] Yousef Saad, et al. Iterative methods for sparse linear systems, 2003.
[34] Norman P. Jouppi, et al. Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[35] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.
[37] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[38] Gregory J. Wolff, et al. Optimal Brain Surgeon and general network pruning, 1993, IEEE International Conference on Neural Networks.
[39] Youcef Saad, et al. A Basic Tool Kit for Sparse Matrix Computations, 1990.
[40] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[41] Christoforos E. Kozyrakis, et al. Convolution engine, 2015, Commun. ACM.
[42] David Gregg, et al. FPGA Based Sparse Matrix Vector Multiplication using Commodity DRAM Memory, 2007, 2007 International Conference on Field Programmable Logic and Applications.
[43] Onur Mutlu, et al. Improving GPU performance via large warps and two-level warp scheduling, 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).