Architecture and Compiler Support for GPUs Using Energy-Efficient Affine Register Files