CAMO: A novel cache management organization for GPGPUs
Laxmi N. Bhuyan | Madhu Mutyam | Manoranjan Satpathy | Debiprasanna Sahoo | Swaraj Sha
[1] Rudolf Eigenmann, et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization, 2009, PPoPP '09.
[2] J. Thomas Pawlowski, et al. Hybrid memory cube (HMC), 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).
[3] Wen-mei W. Hwu, et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[4] John E. Stone, et al. An asymmetric distributed shared memory model for heterogeneous parallel systems, 2010, ASPLOS XV.
[5] Kevin M. Lepak, et al. Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor, 2010, IEEE Micro.
[6] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[7] 김장우, et al. A fully associative, tagless DRAM cache, 2015.
[8] Ram Huggahalli, et al. Direct cache access for high bandwidth network I/O, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[9] Jaewon Lee, et al. GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[10] R. Manikantan, et al. Bi-Modal DRAM Cache: A Scalable and Effective Die-Stacked DRAM Cache, 2014, MICRO 2014.
[11] Emmett Kilgariff, et al. Fermi GF100 GPU Architecture, 2011, IEEE Micro.
[12] Xin Bi, et al. High bandwidth memory interface design based on DDR3 SDRAM and FPGA, 2015, 2015 International SoC Design Conference (ISOCC).
[13] Gabriel H. Loh, et al. A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[14] Li Zhao, et al. Exploring DRAM cache architectures for CMP server platforms, 2007, 2007 25th International Conference on Computer Design.
[15] R. Govindarajan, et al. Bi-Modal DRAM Cache: Improving Hit Rate, Hit Latency and Bandwidth, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[16] Babak Falsafi, et al. Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache, 2013, ISCA.
[17] Mark D. Hill, et al. Efficiently enabling conventional block sizes for very large die-stacked DRAM caches, 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[18] Tarek S. Abdelrahman, et al. hiCUDA: a high-level directive-based language for GPU programming, 2009, GPGPU-2.
[19] Cheng-Chieh Huang, et al. ATCache: Reducing DRAM cache latency via a small SRAM tag cache, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[20] Collin McCurdy, et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite, 2010, GPGPU-3.
[21] Mike O'Connor, et al. Cache-Conscious Wavefront Scheduling, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[22] Babak Falsafi, et al. Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[23] Yan Solihin, et al. CHOP: Integrating DRAM Caches for CMP Server Platforms, 2011, IEEE Micro.
[24] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).