Toward an Analytical Performance Model to Select between GPU and CPU Execution
Ettore Tiotto | Artem Chikin | Karim Ali | Jose Nelson Amaral
[1] Hesham El-Rewini, et al. Scheduling Parallel Program Tasks onto Arbitrary Target Machines, 1990, J. Parallel Distributed Comput.
[2] Marco Maggioni, et al. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking, 2018, ArXiv.
[3] Scott F. Midkiff, et al. Heuristic Technique for Processor and Link Assignment in Multicomputers, 1991, IEEE Trans. Computers.
[4] Sally A. McKee, et al. Predicting parallel application performance via machine learning approaches, 2007, Concurr. Comput. Pract. Exp.
[5] Prasun Gera, et al. Performance Characterisation and Simulation of Intel's Integrated GPU Architecture, 2018, 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[6] Gerhard Wellein, et al. Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures, 2018, 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS).
[7] Nawwaf N. Kharma, et al. A high performance algorithm for static task scheduling in heterogeneous distributed computing systems, 2008, J. Parallel Distributed Comput.
[8] Michael F. P. O'Boyle, et al. Mapping parallelism to multi-cores: a machine learning based approach, 2009, PPoPP '09.
[9] Michael E. Wolf, et al. Combining Loop Transformations Considering Caches and Scheduling, 2004, International Journal of Parallel Programming.
[10] Millad Ghane, et al. False Sharing Detection in OpenMP Applications Using OMPT API, 2015, IWOMP.
[11] A. Snavely, et al. Modeling application performance by convolving machine signatures with application profiles, 2001, Proceedings of the Fourth Annual IEEE International Workshop on Workload Characterization. WWC-4 (Cat. No.01EX538).
[12] Satyajayant Misra, et al. A Scalable Analytical Memory Model for CPU Performance Prediction, 2017, PMBS@SC.
[13] Keshav Pingali, et al. Adaptive heterogeneous scheduling for integrated GPUs, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT).
[14] José Nelson Amaral, et al. Automated GPU Grid Geometry Selection for OpenMP Kernels, 2018, 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).
[15] L. Dagum, et al. OpenMP: an industry standard API for shared-memory programming, 1998.
[16] Wenguang Chen, et al. OpenUH: an optimizing, portable OpenMP compiler, 2007.
[17] Barbara M. Chapman, et al. Invited Paper: A Compile-time Cost Model for OpenMP, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[18] Robert Dietrich, et al. OMPT: An OpenMP Tools Application Programming Interface for Performance Analysis, 2013, IWOMP.
[19] Xingfu Wu, et al. Performance Modeling of Hybrid MPI/OpenMP Scientific Applications on Large-scale Multicore Cluster Systems, 2011, 2011 14th IEEE International Conference on Computational Science and Engineering.
[20] Hyesoon Kim, et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, 2009, ISCA '09.
[21] Manuel Prieto, et al. Fast finite difference Poisson solvers on heterogeneous architectures, 2014, Comput. Phys. Commun.
[22] Fiona Reid, et al. A Microbenchmark Suite for OpenMP Tasks, 2012, IWOMP.
[23] Salvatore Venticinque, et al. Performance prediction through simulation of a hybrid MPI/OpenMP application, 2005, Parallel Comput.
[24] Alan D. George, et al. FASE: A Framework for Scalable Performance Prediction of HPC Systems and Applications, 2007, Simul.
[25] Sally A. McKee, et al. Methods of inference and learning for performance modeling of parallel applications, 2007, PPoPP.
[26] Sven-Bodo Scholz, et al. Unibench: A Tool for Automated and Collaborative Benchmarking, 2010, 2010 IEEE 18th International Conference on Program Comprehension.
[27] P. Sadayappan, et al. Characterizing and enhancing global memory data coalescing on GPUs, 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[28] Vivek K. Pallipuram, et al. Subjective versus objective: classifying analytical models for productive heterogeneous performance prediction, 2014, The Journal of Supercomputing.
[29] Pedro Valero-Lara, et al. Heterogeneous CPU+GPU approaches for mesh refinement over Lattice-Boltzmann simulations, 2017, Concurr. Comput. Pract. Exp.
[30] Sally A. McKee, et al. An Approach to Performance Prediction for Parallel Applications, 2005, Euro-Par.