Improving Energy Efficiency of GPUs through Data Compression and Compressed Execution
Won Woo Ro | Hyeran Jeon | Gunjae Koo | Sangpil Lee | Murali Annavaram | Keunsoo Kim
[1] Mark R. Nelson, et al. LZW data compression, 1989.
[2] Peter Deutsch, et al. DEFLATE Compressed Data Format Specification version 1.3, 1996, RFC.
[3] Trevor N. Mudge, et al. Improving code density using compression techniques, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[4] James E. Smith, et al. The predictability of data values, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[5] Jun Yang, et al. Frequent Value Locality and Value-Centric Data Cache Design, 2000, ASPLOS.
[6] Hubertus Franke, et al. Memory Expansion Technology (MXT): Software support and performance, 2001, IBM J. Res. Dev.
[7] Jaehyuk Huh, et al. Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture, 2003, IEEE Micro.
[8] David A. Wood, et al. Adaptive cache compression for high-performance processors, 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[9] Mateo Valero, et al. A content aware integer register file organization, 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[10] David A. Wood, et al. Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches, 2004.
[11] Hiroshi Nakamura, et al. A small, fast and low-power register file by bit-partitioning, 2005, 11th International Symposium on High-Performance Computer Architecture.
[12] Aviral Shrivastava, et al. Bypass aware instruction scheduling for register file power reduction, 2006.
[13] Chita R. Das, et al. Performance and power optimization through data compression in Network-on-Chip architectures, 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[14] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[15] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[16] Tom R. Halfhill. NVIDIA's Next-Generation CUDA Compute and Graphics Architecture, Code-Named Fermi, Adds Muscle for Parallel Processing, 2009.
[17] Yao Zhang, et al. Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations, 2009, Euro-Par Workshops.
[18] Xi Chen, et al. C-Pack: A High-Performance Microprocessor Cache Compression Algorithm, 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[19] Yunsi Fei, et al. Register file partitioning and recompilation for register file power reduction, 2010, TODE.
[20] Bevan M. Baas, et al. Design of an energy-efficient 32-bit adder operating at subthreshold voltages in 45-nm CMOS, 2010, International Conference on Communications and Electronics 2010.
[21] William J. Dally, et al. A compile-time managed multi-level register file hierarchy, 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[22] William J. Dally, et al. GPUs and the Future of Parallel Computing, 2011, IEEE Micro.
[23] Mark Horowitz, et al. Energy-Efficient Floating-Point Unit Design, 2011, IEEE Transactions on Computers.
[24] Chia-Lin Yang, et al. Power gating strategies on GPUs, 2011, TACO.
[25] William J. Dally, et al. Energy-efficient mechanisms for managing thread context in throughput processors, 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[26] Sylvain Collange, et al. Affine Vector Cache for memory bandwidth savings, 2011.
[27] Nam Sung Kim, et al. Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[28] Onur Mutlu, et al. Base-delta-immediate compression: Practical data compression for on-chip caches, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[29] Hyeran Jeon, et al. Warped-DMR: Light-weight Error Detection for GPGPU, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[30] Krste Asanovic, et al. Convergence and scalarization for data-parallel architectures, 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[31] Zhongliang Chen, et al. Characterizing scalar opportunities in GPGPU applications, 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[32] Ben H. H. Juurlink, et al. How a single chip causes massive power bills GPUSimPow: A GPGPU power simulator, 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[33] Mohammad Abdel-Majeed, et al. Warped register file: A power efficient register file for GPGPUs, 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[34] Nam Sung Kim, et al. GPUWattch: enabling energy optimizations in GPGPUs, 2013, ISCA.
[35] Christopher Torng, et al. Microarchitectural mechanisms to exploit value structure in SIMT architectures, 2013, ISCA.
[36] Rong Ge, et al. Effects of Dynamic Voltage and Frequency Scaling on a K20 GPU, 2013, 2013 42nd International Conference on Parallel Processing.
[37] Mohammad Abdel-Majeed, et al. Warped gates: Gating aware scheduling and power gating for GPGPUs, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[38] Hiroshi Nakamura, et al. Power capping of CPU-GPU heterogeneous systems through coordinating DVFS and task mapping, 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).
[39] Nam Sung Kim, et al. Power-efficient computing for compute-intensive GPGPU applications, 2013, HPCA.
[40] Qunfeng Dong, et al. A Case for a Flexible Scalar Unit in SIMT Architecture, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[41] Waleed Dweik, et al. Warped-Shield: Tolerating Hard Faults in GPGPUs, 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.
[42] Samira Manabi Khan, et al. Last-level cache deduplication, 2014, ICS '14.
[43] Somayeh Sardashti, et al. Decoupled Compressed Cache: Exploiting Spatial Locality for Energy Optimization, 2014, IEEE Micro.
[44] Murali Annavaram, et al. PATS: Pattern aware scheduling and power gating for GPGPUs, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[45] Rajeev Balasubramonian, et al. MemZip: Exploring unconventional benefits from memory compression, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[46] Per Stenström, et al. SC2: A statistical compression cache scheme, 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[47] Won Woo Ro, et al. Warped-Compression: Enabling power efficient GPUs through register compression, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[48] Luigi Carro, et al. Understanding GPU errors on large-scale HPC systems and the implications for system design and operation, 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[49] Joel Emer, et al. SASSIFI: Evaluating Resilience of GPU Applications, 2015.