Modeling Irregular Kernels of Task-based codes: Illustration with the Fast Multipole Method
Emmanuel Agullo | Bérenger Bramas | Olivier Coulaud | Samuel Thibault | Luka Stanisic
[1] Matthias Hauswirth, et al. Producing wrong data without doing anything obviously wrong!, 2009, ASPLOS.
[2] Scott B. Baden, et al. Performance Modeling Tools for Parallel Sparse Linear Algebra Computations, 2009, PARCO.
[3] Laxmikant V. Kalé, et al. BigSim: a parallel simulator for performance prediction of extremely large parallel machines, 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[4] Jack J. Dongarra, et al. Guest Editors' Introduction to the top 10 algorithms, 2000, Comput. Sci. Eng.
[5] Emmanuel Agullo, et al. Fast and Accurate Simulation of Multithreaded Sparse Linear Algebra Solvers, 2015, 2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS).
[6] Henri Casanova, et al. Versatile, scalable, and accurate simulation of distributed applications and platforms, 2014, J. Parallel Distributed Comput.
[7] Bruce Jacob, et al. The structural simulation toolkit, 2006, PERV.
[8] Leslie Greengard, et al. A fast algorithm for particle simulations, 1987.
[9] Alejandro Duran, et al. Trace-driven simulation of multithreaded applications, 2011, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[10] Jack J. Dongarra, et al. Parallel Simulation of Superscalar Scheduling, 2014, 2014 43rd International Conference on Parallel Processing.
[11] Richard W. Vuduc, et al. A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method, 2014, GPGPU@ASPLOS.
[12] James Demmel, et al. SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems, 2003, TOMS.
[13] Lorena A. Barba, et al. How Will the Fast Multipole Method Fare in the Exascale Era?, 2013.
[14] Emmanuel Agullo, et al. Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method, 2016.
[15] Dean M. Tullsen, et al. Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading, 1997, TOCS.
[16] Hatem Ltaief, et al. Data‐driven execution of fast multipole methods, 2012, Concurr. Comput. Pract. Exp.
[17] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[18] Alfredo Buttari, et al. Fine-Grained Multithreading for the Multifrontal QR Factorization of Sparse Matrices, 2013, SIAM J. Sci. Comput.
[19] Jack Dongarra, et al. QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.
[20] Alejandro Duran, et al. OmpSs: a Proposal for Programming Heterogeneous Multi-Core Architectures, 2011, Parallel Process. Lett.
[21] Jesús Labarta, et al. A Framework for Performance Modeling and Prediction, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[22] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.
[23] Shunfei Chen, et al. MARSS: A full system simulator for multicore x86 CPUs, 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).
[24] Somayeh Sardashti, et al. The gem5 simulator, 2011, CARN.