PermDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices
暂无分享,去创建一个
Chunhua Deng | Keshab K. Parhi | Yi Xie | Xuehai Qian | Bo Yuan | Siyu Liao | K. Parhi | Siyu Liao | Xuehai Qian | Bo Yuan | Yi Xie | Chunhua Deng | Xuehai Qian
[1] Yuan Gao,et al. RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[2] Eriko Nurvitadhi,et al. A Customizable Matrix Multiplication Framework for the Intel HARPv2 Xeon+FPGA Platform: A Deep Learning Case Study , 2018, FPGA.
[3] Manoj Alwani,et al. Fused-layer CNN accelerators , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[4] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[5] R. Srikant,et al. Why Deep Neural Networks for Function Approximation? , 2016, ICLR.
[6] Hadi Esmaeilzadeh,et al. TABLA: A unified template-based framework for accelerating statistical machine learning , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[7] Yiran Chen,et al. Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.
[8] Yiran Chen,et al. PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[9] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[10] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[11] Sudhakar Yalamanchili,et al. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[12] Stephen W. Keckler,et al. Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks , 2017, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[13] V. Pan. Structured Matrices and Polynomials: Unified Superfast Algorithms , 2001 .
[14] Hassan Foroosh,et al. Sparse Convolutional Neural Networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Xiaowei Li,et al. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[16] Brian Kingsbury,et al. New types of deep neural network learning for speech recognition and related applications: an overview , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[17] Jing Wang,et al. In-Situ AI: Towards Autonomous and Incremental Deep Learning for IoT Systems , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[18] Michael Ferdman,et al. Overcoming resource underutilization in spatial CNN accelerators , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).
[19] Shaoli Liu,et al. Cambricon-X: An accelerator for sparse neural networks , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[20] Gu-Yeon Wei,et al. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[21] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[22] Tao Zhang,et al. A Survey of Model Compression and Acceleration for Deep Neural Networks , 2017, ArXiv.
[23] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[24] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[25] Andreas Moshovos,et al. Bit-Pragmatic Deep Neural Network Computing , 2016, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[26] Chao Wang,et al. CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[27] Song Han,et al. Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.
[28] Nikos Komodakis,et al. Wide Residual Networks , 2016, BMVC.
[29] Sachin S. Talathi,et al. Fixed Point Quantization of Deep Convolutional Networks , 2015, ICML.
[30] Tao Li,et al. Towards Efficient Microarchitectural Design for Accelerating Unsupervised GAN-Based Deep Learning , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[31] Eriko Nurvitadhi,et al. High performance binary neural networks on the Xeon+FPGA™ platform , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[32] Dong Han,et al. Cambricon: An Instruction Set Architecture for Neural Networks , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[33] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[34] Christoforos E. Kozyrakis,et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory , 2017, ASPLOS.
[35] Marc'Aurelio Ranzato,et al. Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[36] Engin Ipek,et al. Making Memristive Neural Network Accelerators Reliable , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[37] Ming Yang,et al. Compressing Deep Convolutional Networks using Vector Quantization , 2014, ArXiv.
[38] Rajesh Gupta,et al. Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs , 2017, FPGA.
[39] Engin Ipek,et al. Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning , 2017 .
[40] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[41] Surya Ganguli,et al. On the Expressive Power of Deep Neural Networks , 2016, ICML.
[42] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.
[43] Christopher D. Manning,et al. Stanford Neural Machine Translation Systems for Spoken Language Domains , 2015, IWSLT.
[44] Marian Verhelst,et al. An always-on 3.8μJ/86% CIFAR-10 mixed-signal binary CNN processor with all memory on chip in 28nm CMOS , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).
[45] Natalia Gimelshein,et al. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[46] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Asit K. Mishra,et al. From high-level deep neural models to FPGAs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[48] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[49] Hyoukjun Kwon,et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects , 2018, ASPLOS.
[50] Yu Wang,et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.
[51] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[52] Tao Zhang,et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[53] Scott A. Mahlke,et al. DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[54] Yuan Xie,et al. DRISA: A DRAM-based Reconfigurable In-Situ Accelerator , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[55] Michael Ferdman,et al. Maximizing CNN accelerator efficiency through resource partitioning , 2016, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[56] Tianshi Chen,et al. ShiDianNao: Shifting vision processing closer to the sensor , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[57] Qinru Qiu,et al. SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing , 2016, ASPLOS.
[58] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.
[59] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[60] Scott A. Mahlke,et al. Scalpel: Customizing DNN pruning to the underlying hardware parallelism , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[61] Natalie D. Enright Jerger,et al. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[62] Pradeep Dubey,et al. SCALEDEEP: A scalable compute architecture for learning and evaluating deep networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[63] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Shih-Fu Chang,et al. An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[65] William J. Dally,et al. SCNN: An accelerator for compressed-sparse convolutional neural networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[66] Andrew Zisserman,et al. Speeding up Convolutional Neural Networks with Low Rank Expansions , 2014, BMVC.
[67] Kunle Olukotun,et al. Understanding and optimizing asynchronous low-precision stochastic gradient descent , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[68] Yurong Chen,et al. Dynamic Network Surgery for Efficient DNNs , 2016, NIPS.
[69] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[70] Patrick Judd,et al. Stripes: Bit-serial deep neural network computing , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[71] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[72] Xuehai Zhou,et al. PuDianNao: A Polyvalent Machine Learning Accelerator , 2015, ASPLOS.
[73] Jungwon Lee,et al. Universal Deep Neural Network Compression , 2018, IEEE Journal of Selected Topics in Signal Processing.
[74] Miao Hu,et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).