Exploring GPU acceleration of Deep Neural Networks using Block Circulant Matrices
暂无分享,去创建一个
David Kaeli | Pu Zhao | Shi Dong | Xue Lin
[1] Chunhua Deng,et al. PermDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[2] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[3] Tara N. Sainath,et al. Structured Transforms for Small-Footprint Deep Learning , 2015, NIPS.
[4] Andrew Zisserman,et al. Speeding up Convolutional Neural Networks with Low Rank Expansions , 2014, BMVC.
[5] V. Pan,et al. Polynomial and matrix computations (vol. 1): fundamental algorithms , 1994 .
[6] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[7] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[8] Jian Cheng,et al. Quantized Convolutional Neural Networks for Mobile Devices , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Wayne Eberly. Polynomial and Matrix Computations Volume 1: Fundamental Algorithms (Dario Bini and Victor Pan) , 1996, SIAM Rev..
[10] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[11] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[12] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[13] Xiaogang Wang,et al. Convolutional neural networks with low-rank regularization , 2015, ICLR.
[14] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[15] F ROSENBLATT,et al. The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.
[16] Joel Emer,et al. Eyeriss: an Energy-efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks Accessed Terms of Use , 2022 .
[17] Qinru Qiu,et al. Energy-efficient, high-performance, highly-compressed deep neural network design using block-circulant matrices , 2017, 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[18] Chao Wang,et al. CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[19] Frédéric Pétrot,et al. Ternary neural networks for resource-efficient AI applications , 2016, 2017 International Joint Conference on Neural Networks (IJCNN).
[20] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[21] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[22] D. N. Kim,et al. Fast Fourier Transform - Algorithms and Applications , 2010 .
[23] Anthony Skjellum,et al. dMath: A Scalable Linear Algebra and Math Library for Heterogeneous GP-GPU Architectures , 2016, ArXiv.
[24] Dhabaleswar K. Panda,et al. S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters , 2017, PPoPP.
[25] Andrew S. Cassidy,et al. A million spiking-neuron integrated circuit with a scalable communication network and interface , 2014, Science.
[26] Rajesh Gupta,et al. Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs , 2017, FPGA.
[27] Gu-Yeon Wei,et al. Deep Learning for Computer Architects , 2017, Synthesis Lectures on Computer Architecture.
[28] Song Han,et al. Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.
[29] David R. Kaeli,et al. DNNMark: A Deep Neural Network Benchmark Suite for GPUs , 2017, GPGPU@PPoPP.
[30] Andrew S. Cassidy,et al. Convolutional networks for fast, energy-efficient neuromorphic computing , 2016, Proceedings of the National Academy of Sciences.
[31] Jonathan G. Fiscus,et al. Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .
[32] David R. Kaeli,et al. Characterizing the Microarchitectural Implications of a Convolutional Neural Network (CNN) Execution on GPUs , 2018, ICPE.
[33] Yu Wang,et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.
[34] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[35] Dharmendra S. Modha,et al. Backpropagation for Energy-Efficient Neuromorphic Computing , 2015, NIPS.
[36] V. Pan. Structured Matrices and Polynomials: Unified Superfast Algorithms , 2001 .
[37] Sachin S. Talathi,et al. Fixed Point Quantization of Deep Convolutional Networks , 2015, ICML.