A Survey of Performance Modeling and Simulation Techniques for Accelerator-Based Computing
Alexander Mendiburu | José Miguel-Alonso | Unai Lopez-Novoa
[1] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[2] Wen-mei W. Hwu,et al. Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors , 2012, PPoPP '12.
[3] William J. Dally,et al. The GPU Computing Era , 2010, IEEE Micro.
[4] John R. Rice,et al. Solving elliptic problems using ELLPACK , 1985, Springer series in computational mathematics.
[5] Rezaur Rahman. Intel® Xeon Phi™ Coprocessor Architecture and Tools , 2013, Apress.
[6] Michael Klemm,et al. OpenMP Programming on Intel Xeon Phi Coprocessors: An Early Performance Comparison , 2012, MARC@RWTH.
[7] Henk Corporaal,et al. The boat hull model: enabling performance prediction for parallel computing prior to code development , 2012, CF '12.
[8] Michael C. Doggett,et al. Auto-tuning interactive ray tracing using an analytical GPU architecture model , 2012, GPGPU-5.
[9] Yue Wang,et al. An Instruction-Level Energy Estimation and Optimization Methodology for GPU , 2011, 2011 IEEE 11th International Conference on Computer and Information Technology.
[10] William Gropp,et al. An adaptive performance modeling tool for GPU architectures , 2010, PPoPP '10.
[11] Venkatram Vishwanath,et al. GROPHECY: GPU performance projection from CPU code skeletons , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[12] Majid Sarrafzadeh,et al. Energy-aware high performance computing with graphic processing units , 2008, CLUSTER 2008.
[13] David M. Brooks,et al. Energy characterization and instruction-level energy model of Intel's Xeon Phi processor , 2013, International Symposium on Low Power Electronics and Design (ISLPED).
[14] Jianliang Xu,et al. GPURoofline: A Model for Guiding Performance Optimizations on GPUs , 2012, Euro-Par.
[15] Satoshi Matsuoka,et al. Statistical power modeling of GPU kernels using performance counters , 2010, International Conference on Green Computing.
[16] Sudhakar Yalamanchili,et al. Modeling GPU-CPU workloads and systems , 2010, GPGPU-3.
[17] Nam Sung Kim,et al. GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.
[18] K. Srinathan,et al. A performance prediction model for the CUDA GPGPU platform , 2009, 2009 International Conference on High Performance Computing (HiPC).
[19] Emmett Kilgariff,et al. Fermi GF100 GPU Architecture , 2011, IEEE Micro.
[20] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[21] Alfonso Niño,et al. A Survey of Parallel Programming Models and Tools in the Multi and Many-core Era , 2012, IEEE Transactions on Parallel and Distributed Systems.
[22] Murat Efe Guney,et al. On the limits of GPU acceleration , 2010 .
[23] Kevin Skadron,et al. BenchFriend: Correlating the performance of GPU benchmarks , 2014, Int. J. High Perform. Comput. Appl..
[24] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.
[25] Kevin Skadron,et al. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs , 2009, ICS.
[26] David Defour,et al. Barra: A Parallel Functional Simulator for GPGPU , 2010, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.
[27] David R. Kaeli,et al. Multi2Sim: A simulation framework for CPU-GPU computing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[28] Shuaiwen Song,et al. A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[29] Hiroaki Kobayashi,et al. A History-Based Performance Prediction Model with Profile Data Classification for Automatic Task Allocation in Heterogeneous Computing Systems , 2011, 2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications.
[30] Sabela Ramos,et al. Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi , 2013, HPDC.
[31] Vitaly Zakharenko,et al. FusionSim: Characterizing the Performance Benefits of Fused CPU/GPU Systems , 2012 .
[32] Xiaohan Ma,et al. Statistical Power Consumption Analysis and Modeling for GPU-based Computing , 2011 .
[33] James Reinders,et al. Intel Xeon Phi Coprocessor High Performance Programming , 2013 .
[34] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[35] Richard W. Vuduc,et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs , 2010, PPoPP '10.
[36] Ben H. H. Juurlink,et al. How a single chip causes massive power bills GPUSimPow: A GPGPU power simulator , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[37] Joe D. Warren,et al. The program dependence graph and its use in optimization , 1987, TOPLAS.
[38] André Seznec,et al. Break down GPU execution time with an analytical method , 2012, RAPIDO '12.
[39] Wen-mei W. Hwu,et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.
[40] Nicolas Brunie,et al. Simultaneous branch and warp interweaving for sustained GPU performance , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[41] Richard W. Vuduc,et al. A performance analysis framework for identifying potential benefits in GPGPU applications , 2012, PPoPP '12.
[42] Mohak Shah,et al. Evaluating Learning Algorithms: A Classification Perspective , 2011 .
[43] Jens H. Krüger,et al. A Survey of General‐Purpose Computation on Graphics Hardware , 2007, Eurographics.
[44] Rezaur Rahman,et al. Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers , 2013 .
[45] Yao Zhang,et al. A quantitative performance analysis model for GPU architectures , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[46] Xiaohan Ma,et al. Improving Energy Efficiency of GPU based General-Purpose Scientific Computing through Automated Selection of Near Optimal Configurations , 2011 .
[47] Tao Li,et al. Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[48] Scott B. Baden,et al. Redefining the Role of the CPU in the Era of CPU-GPU Integration , 2012, IEEE Micro.
[49] Michael F. P. O'Boyle,et al. A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL , 2011, CC.
[50] Juha Reunanen,et al. Overfitting in Making Comparisons Between Variable Selection Methods , 2003, J. Mach. Learn. Res..
[51] Raj Jain,et al. The art of computer systems performance analysis - techniques for experimental design, measurement, simulation, and modeling , 1991, Wiley professional computing.
[52] Hyesoon Kim,et al. An integrated GPU power and performance model , 2010, ISCA.
[53] Yun Liang,et al. An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[54] Laxmi N. Bhuyan,et al. A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures , 2013, TACO.
[55] Sudhakar Yalamanchili,et al. Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[56] González García,et al. Modelo de estimación de rendimiento para arquitecturas paralelas heterogéneas [Performance estimation model for heterogeneous parallel architectures] , 2013 .
[57] Pradeep Dubey,et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU , 2010, ISCA.