Energy-Efficient DNN Computing on GPUs Through Register File Management
[1] David Blaauw, et al. Drowsy caches: simple techniques for reducing leakage power, 2002, ISCA.
[2] Massoud Pedram, et al. Design and application of multimodal power gating structures, 2009, 2009 10th International Symposium on Quality Electronic Design.
[3] Yurong Chen, et al. Dynamic Network Surgery for Efficient DNNs, 2016, NIPS.
[4] Xin Fu, et al. Soft-error reliability and power co-optimization for GPGPUs register file using resistive memory, 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[5] Natalie D. Enright Jerger, et al. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[6] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[7] Wei Zhang, et al. Drowsy Register Files for Reducing GPU Leakage Energy, 2017, 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS).
[8] Rui Peng, et al. Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures, 2016, ArXiv.
[9] Seth Copen Goldstein, et al. BitValue Inference: Detecting and Exploiting Narrow Bitwidth Computations, 2000, Euro-Par.
[10] Won Woo Ro, et al. Warped-Compression: Enabling power efficient GPUs through register compression, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[11] Nam Sung Kim, et al. GPUWattch: enabling energy optimizations in GPGPUs, 2013, ISCA.
[12] Mohammad Abdel-Majeed, et al. Warped register file: A power efficient register file for GPGPUs, 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[13] Nam Sung Kim, et al. Power-efficient computing for compute-intensive GPGPU applications, 2013, HPCA.
[14] Wei Zhang, et al. GPU Register Packing: Dynamically Exploiting Narrow-Width Operands to Improve Performance, 2017, 2017 IEEE Trustcom/BigDataSE/ICESS.
[15] R. Venkatesh Babu, et al. Data-free Parameter Pruning for Deep Neural Networks, 2015, BMVC.
[16] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[17] Shuicheng Yan, et al. Training Skinny Deep Neural Networks with Iterative Hard Thresholding Methods, 2016, ArXiv.
[18] Shuai Wang, et al. On the Exploitation of Narrow-Width Values for Improving Register File Reliability, 2009, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[19] Margaret Martonosi, et al. Dynamically exploiting narrow width operands to improve processor power and performance, 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[20] Kanad Ghose, et al. Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure, 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[21] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[22] William J. Dally, et al. Stream register files with indexed access, 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[23] Shuai Wang, et al. In-Register Duplication: Exploiting Narrow-Width Value for Improving Register File Reliability, 2006, International Conference on Dependable Systems and Networks (DSN'06).
[24] Hiroshi Nakamura, et al. A small, fast and low-power register file by bit-partitioning, 2005, 11th International Symposium on High-Performance Computer Architecture.