Data‐driven execution of fast multipole methods
[1] Jack J. Dongarra,et al. Scheduling dense linear algebra operations on multicore processors , 2010, Concurr. Comput. Pract. Exp..
[2] Jack J. Dongarra,et al. A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[3] Hari Sundar,et al. Bottom-Up Construction and 2:1 Balance Refinement of Linear Octrees in Parallel , 2008, SIAM J. Sci. Comput..
[4] Jakub Kurzak,et al. Massively parallel implementation of a fast multipole method for distributed memory machines , 2005, J. Parallel Distributed Comput..
[5] James Reinders,et al. Intel® threading building blocks , 2008 .
[6] Chandrajit L. Bajaj,et al. An Efficient Higher-Order Fast Multipole Boundary Element Solution for Poisson-Boltzmann-Based Molecular Electrostatics , 2011, SIAM J. Sci. Comput..
[7] Thomas Hérault,et al. Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[8] John Dubinski. A parallel tree code , 1996 .
[9] Robert A. van de Geijn,et al. The libflame Library for Dense Matrix Computations , 2009, Computing in Science & Engineering.
[10] L. Greengard,et al. A Fast Adaptive Multipole Algorithm in Three Dimensions , 1999 .
[11] Jack Dongarra,et al. QUARK Users' Guide: QUeueing And Runtime for Kernels , 2011 .
[12] Thomas Hérault,et al. DAGuE: A Generic Distributed DAG Engine for High Performance Computing , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[13] Philipp Birken,et al. Numerical Linear Algebra , 2011, Encyclopedia of Parallel Computing.
[14] Robert A. van de Geijn,et al. Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures , 2007, SPAA '07.
[15] Michael S. Warren,et al. A parallel hashed oct-tree N-body algorithm , 1993, Supercomputing '93. Proceedings.
[16] Shang-Hua Teng,et al. Provably Good Partitioning and Load Balancing Algorithms for Parallel Adaptive N-Body Simulation , 1998, SIAM J. Sci. Comput..
[17] Rio Yokota,et al. Petascale turbulence simulation using a highly parallel fast multipole method on GPUs , 2011, Comput. Phys. Commun..
[18] Richard W. Vuduc,et al. Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[19] Lorena A. Barba,et al. Hierarchical N-body Simulations with Autotuning for Heterogeneous Systems , 2012, Computing in Science & Engineering.
[20] Michael A. Epton,et al. Multipole Translation Theory for the Three-Dimensional Laplace and Helmholtz Equations , 1995, SIAM J. Sci. Comput..
[21] Anoop Gupta,et al. Load Balancing and Data Locality in Adaptive Hierarchical N-Body Methods: Barnes-Hut, Fast Multipole, and Radiosity , 1995, J. Parallel Distributed Comput..
[22] B. Shanker,et al. A Novel Wideband FMM for Fast Integral Equation Solution of Multiscale Problems in Electromagnetics , 2009, IEEE Transactions on Antennas and Propagation.
[23] James Reinders,et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism , 2007 .
[24] W. R. Sutherland,et al. The on-line graphical specification of computer procedures , 1966 .
[25] Richard W. Vuduc,et al. A massively parallel adaptive fast-multipole method on heterogeneous architectures , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[26] Matthew G. Knepley,et al. PetFMM—A dynamically load‐balancing parallel fast multipole library , 2009, ArXiv.
[27] Qibai Huang,et al. A fast multipole boundary element method based on the improved Burton–Miller formulation for three-dimensional acoustic problems , 2011 .
[28] Emmanuel Agullo,et al. Comparative study of one-sided factorizations with multiple software packages on multi-core hardware , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[29] Michael S. Warren,et al. Skeletons from the treecode closet , 1994 .
[30] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[31] Wenbin Lin,et al. Volumetric fast multipole method for modeling Schrödinger's equation , 2007, J. Comput. Phys..
[32] Jack J. Dongarra,et al. Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[33] Anoop Gupta,et al. A parallel adaptive fast multipole method , 1993, Supercomputing '93. Proceedings.
[34] Qian Xi Wang,et al. Variable order revised binary treecode , 2004 .
[35] Stéphanie Chaillat,et al. A multi-level fast multipole BEM for 3-D elastodynamics in the frequency domain , 2008 .
[36] Michael S. Warren,et al. A portable parallel particle program , 1995 .
[37] Michael S. Warren,et al. Astrophysical N-body simulations using hierarchical tree data structures , 1992, Proceedings Supercomputing '92.
[38] Walter Dehnen,et al. A Hierarchical O(N) Force Calculation Algorithm , 2002 .