An efficient compiler framework for cache bypassing on GPUs
Yun Liang | Deming Chen | Xiaolong Xie | Guangyu Sun
[1] Arnold L. Rosenberg,et al. Using the compiler to improve cache replacement decisions , 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[2] Margaret Martonosi,et al. Characterizing and improving the use of demand-fetched caches in GPUs , 2012, ICS '12.
[3] Carole-Jean Wu,et al. SHiP: Signature-based Hit Predictor for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[4] Jaehyuk Huh,et al. Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[5] Jing-Yang Jou,et al. Cache Capacity Aware Thread Scheduling for Irregular Memory Access on many-core GPGPUs , 2013, 2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC).
[6] Yan Solihin,et al. Counter-Based Cache Replacement and Bypassing Algorithms , 2008, IEEE Transactions on Computers.
[7] Yu Wang,et al. Run-time technique for simultaneous aging and power optimization in GPGPUs , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).
[8] Luca Benini,et al. An integrated hardware/software approach for run-time scratchpad management , 2004, Proceedings. 41st Design Automation Conference, 2004.
[9] Yu Wang,et al. Coordinated static and dynamic cache bypassing for GPUs , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[10] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.
[11] Xuhao Chen,et al. Adaptive Cache Management for Energy-Efficient GPU Computing , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[12] Wen-mei W. Hwu,et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.
[13] Margaret Martonosi,et al. MRPB: Memory request prioritization for massively parallel processors , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[14] Tor M. Aamodt,et al. Modeling Cache Contention and Throughput of Multiprogrammed Manycore Processors , 2012, IEEE Transactions on Computers.
[15] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[16] Lifan Xu,et al. Auto-tuning a high-level language targeted to GPU codes , 2012, 2012 Innovative Parallel Computing (InPar).
[17] John D. Owens,et al. GPU Computing , 2008, Proceedings of the IEEE.
[18] William J. Dally,et al. Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[19] Xipeng Shen,et al. On-the-fly elimination of dynamic irregularities for GPU computing , 2011, ASPLOS XVI.
[20] Yun Liang,et al. Static analysis for fast and accurate design space exploration of caches , 2008, CODES+ISSS '08.
[21] Yi Yang,et al. A GPGPU compiler for memory optimization and parallelism management , 2010, PLDI '10.
[22] Uday Bondhugula,et al. A compiler framework for optimization of affine loop nests for gpgpus , 2008, ICS '08.
[23] Yun Liang,et al. Efficient GPU Spatial-Temporal Multitasking , 2015, IEEE Transactions on Parallel and Distributed Systems.
[24] Yun Liang,et al. An efficient compiler framework for cache bypassing on GPUs , 2013, ICCAD 2013.
[25] Rajeev Barua,et al. Dynamic allocation for scratch-pad memory using compile-time decisions , 2006, TECS.
[26] Yooseong Kim,et al. CuMAPz: A tool to analyze memory access patterns in CUDA , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).
[27] William Gropp,et al. An adaptive performance modeling tool for GPU architectures , 2010, PPoPP '10.
[28] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012.
[29] Yun Liang,et al. An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[30] Yun Liang,et al. Register and thread structure optimization for GPUs , 2013, 2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC).
[31] Thomas H. Cormen,et al. Introduction to algorithms [2nd ed.] , 2001.
[32] Henk Corporaal,et al. A detailed GPU cache model based on reuse distance theory , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[33] Xin-She Yang,et al. Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.
[34] Chyi-Chang Miao,et al. Compiler managed micro-cache bypassing for high performance EPIC processors , 2002, MICRO.
[35] Hiren D. Patel,et al. On the use of GP-GPUs for accelerating compute-intensive EDA applications , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[36] Shengkui Zhao,et al. Real-time implementation and performance optimization of 3D sound localization on GPUs , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[37] Mike O'Connor,et al. Cache-Conscious Wavefront Scheduling , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.