Can PCM Benefit GPU? Reconciling Hybrid Memory Design with GPU Massive Parallelism for Energy Efficiency
Jeffrey S. Vetter | Weikuan Yu | Bo Wu | Xipeng Shen | Yizheng Jiao | Bin Wang | Dong Li
[1] Scott A. Mahlke,et al. Sponge: portable stream programming on graphics engines , 2011, ASPLOS XVI.
[2] Luiz André Barroso,et al. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009.
[3] Tao Li,et al. Exploring Phase Change Memory and 3D Die-Stacking for Power/Thermal Friendly, Fast and Durable Memory Architectures , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[4] Jun Yang,et al. A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.
[5] Rudolf Eigenmann,et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization , 2009, PPoPP '09.
[6] Rachata Ausavarungnirun,et al. DynRBLA: A High-Performance and Energy-Efficient Row Buffer Locality-Aware Caching Policy for Hybrid Memories , 2011.
[7] Ricardo Bianchini,et al. Page placement in hybrid memory systems , 2011, ICS '11.
[8] Hyesoon Kim,et al. An integrated GPU power and performance model , 2010, ISCA.
[9] John Shalf,et al. The International Exascale Software Project roadmap , 2011, Int. J. High Perform. Comput. Appl.
[10] Onur Mutlu,et al. Architecting phase change memory as a scalable DRAM alternative , 2009, ISCA '09.
[11] Xipeng Shen,et al. On-the-fly elimination of dynamic irregularities for GPU computing , 2011, ASPLOS XVI.
[12] Ken Kennedy,et al. Automatic Data Layout Using 0-1 Integer Programming , 1994, IFIP PACT.
[13] Luiz André Barroso,et al. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition , 2013.
[14] Bruce Jacob,et al. DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.
[15] Brad Calder,et al. Automatically characterizing large scale program behavior , 2002, ASPLOS X.
[16] Chen Ding,et al. Locality phase prediction , 2004, ASPLOS XI.
[17] Alvin R. Lebeck,et al. Power aware page allocation , 2000, SIGP.
[18] Yuanyuan Zhou,et al. DMA-aware memory energy management , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.
[19] Ashok Kumar,et al. An 8-Core 64-Thread 64b Power-Efficient SPARC SoC , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[20] Vijayalakshmi Srinivasan,et al. Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.
[21] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.
[22] H. Howie Huang,et al. Energy-aware writes to non-volatile main memory , 2011, OPSR.
[23] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[24] Kang G. Shin,et al. Design and Implementation of Power-Aware Virtual Memory , 2003, USENIX ATC, General Track.
[25] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[26] Wen-mei W. Hwu,et al. CUDA-Lite: Reducing GPU Programming Complexity , 2008, LCPC.
[27] Yi Yang,et al. A GPGPU compiler for memory optimization and parallelism management , 2010, PLDI '10.
[28] Uday Bondhugula,et al. A compiler framework for optimization of affine loop nests for GPGPUs , 2008, ICS '08.
[29] Timothy Johnson,et al. An 8-core, 64-thread, 64-bit power efficient SPARC SoC (Niagara2) , 2007, ISPD '07.
[30] Parijat Dube,et al. Architectural design for next generation heterogeneous memory systems , 2010, 2010 IEEE International Memory Workshop.
[31] Jian Li,et al. Power-performance considerations of parallel computing on chip multiprocessors , 2005, TACO.