CGPredict: Embedded GPU Performance Estimation from Single-Threaded Applications
Tulika Mitra | Siqi Wang | Guanwen Zhong
[1] Xiaojin Zhu, et al. Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[2] Yun Liang, et al. Design space exploration of FPGA-based accelerators with multi-level parallelism, 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.
[3] Yun Liang, et al. Instruction cache locking using temporal reuse profile, 2010, Design Automation Conference.
[4] J. Ramanujam, et al. Automatic C-to-CUDA Code Generation for Affine Programs, 2010, CC.
[5] Andreas Moshovos, et al. Demystifying GPU microarchitecture through microbenchmarking, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[6] Yun Liang, et al. Lin-Analyzer: A high-level performance analysis tool for FPGA-based accelerators, 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[7] S. M. García,et al. 2014: , 2020, A Party for Lazarus.
[8] Jason Helge Anderson, et al. LegUp: high-level synthesis for FPGA-based processor/accelerator systems, 2011, FPGA '11.
[9] Hyesoon Kim, et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, 2009, ISCA '09.
[10] Tuning CUDA Applications for Kepler, 2017.
[11] Henk Corporaal, et al. A detailed GPU cache model based on reuse distance theory, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[12] Lifan Xu, et al. Auto-tuning a high-level language targeted to GPU codes, 2012, 2012 Innovative Parallel Computing (InPar).
[13] Arun Parakh, et al. Performance Estimation of GPUs with Cache, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[14] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[15] Xinxin Mei, et al. Dissecting GPU Memory Hierarchy Through Microbenchmarking, 2015, IEEE Transactions on Parallel and Distributed Systems.
[16] William Gropp, et al. An adaptive performance modeling tool for GPU architectures, 2010, PPoPP '10.
[17] Venkatram Vishwanath, et al. GROPHECY: GPU performance projection from CPU code skeletons, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[18] Tao Tang, et al. Cache Miss Analysis for GPU Programs Based on Stack Distance Profile, 2011, 2011 31st International Conference on Distributed Computing Systems.
[19] Tom Feist, et al. Vivado Design Suite, 2012.
[20] Derek Chiou, et al. GPGPU performance and power estimation using machine learning, 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[21] J. Xu. OpenCL – The Open Standard for Parallel Programming of Heterogeneous Systems, 2009.