Fork-Join and Data-Driven Execution Models on Multi-core Architectures: Case Study of the FMM
Satoshi Matsuoka | Kenjiro Taura | Rio Yokota | Abdelhalim Amer | Naoya Maruyama | Miquel Pericàs
[1] Samuel Williams et al. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[2] Cédric Augonnet et al. StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines, 2010.
[3] Samuel Williams et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[4] Michael M. Resch et al. Tools for High Performance Computing - Proceedings of the 2nd International Workshop on Parallel Tools for High Performance Computing, July 2008, HLRS, Stuttgart, 2008, Parallel Tools Workshop.
[5] Lars Bergstrom et al. Measuring NUMA effects with the STREAM benchmark, 2011, arXiv.
[6] Samuel Williams et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[7] L. Greengard. The Rapid Evaluation of Potential Fields in Particle Systems, 1988.
[8] Samuel Williams et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors, 2007, SIAM Rev.
[9] Piet Hut et al. A hierarchical O(N log N) force-calculation algorithm, 1986, Nature.
[10] Lexing Ying et al. A New Parallel Kernel-Independent Fast Multipole Method, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[11] Walter Dehnen et al. A Hierarchical O(N) Force Calculation Algorithm, 2002.
[12] Hatem Ltaief et al. Data-driven execution of fast multipole methods, 2012, Concurr. Comput. Pract. Exp.
[13] Rio Yokota et al. Petascale turbulence simulation using a highly parallel fast multipole method on GPUs, 2011, Comput. Phys. Commun.
[14] Kenjiro Taura et al. A Task Parallel Implementation of Fast Multipole Methods, 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[15] Satoshi Matsuoka et al. Towards a Dataflow FMM using the OmpSs Programming Model, 2012.
[16] D. Zorin et al. A kernel-independent adaptive fast multipole algorithm in two and three dimensions, 2004.
[17] David Padua et al. Encyclopedia of Parallel Computing, 2011.
[18] Alejandro Duran et al. OmpSs: a Proposal for Programming Heterogeneous Multi-Core Architectures, 2011, Parallel Process. Lett.
[19] Matthias S. Müller et al. The Vampir Performance Analysis Tool-Set, 2008, Parallel Tools Workshop.
[20] Stéphanie Chaillat et al. A multi-level fast multipole BEM for 3-D elastodynamics in the frequency domain, 2008.
[21] Emmanuel Agullo et al. Pipelining the Fast Multipole Method over a Runtime System, 2012, CSE 2012.
[22] Richard W. Vuduc et al. Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[23] Jack Dongarra et al. QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.