PDE solvers for hybrid CPU-GPU architectures
[1] Antoine Lejay,et al. Computing the principal eigenelements of some linear operators using a branching Monte Carlo method , 2008, J. Comput. Phys..
[2] R. Morgan. Computing Interior Eigenvalues of Large Matrices , 1991 .
[3] Alex Ramírez,et al. The low-power architecture approach towards exascale computing , 2011, ScalA '11.
[4] Stefan Turek,et al. GPU acceleration of an unmodified parallel finite element Navier-Stokes solver , 2009, 2009 International Conference on High Performance Computing & Simulation.
[5] K. Law. A parallel finite element solution method , 1986 .
[6] Rezaur Rahman,et al. Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers , 2013 .
[7] Charbel Farhat,et al. An Unconventional Domain Decomposition Method for an Efficient Parallel Solution of Large-Scale Finite Element Systems , 1992, SIAM J. Sci. Comput..
[8] Onkar Sahni,et al. Scalable implicit finite element solver for massively parallel processing with demonstration to 160K cores , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[9] Benjamin S. Kirk,et al. Library for Parallel Adaptive Mesh Refinement / Coarsening Simulations , 2006 .
[10] Steve Schaffer,et al. A Semicoarsening Multigrid Method for Elliptic Partial Differential Equations with Highly Discontinuous and Anisotropic Coefficients , 1998, SIAM J. Sci. Comput..
[11] Jonathan Chang,et al. A 45 nm 8-Core Enterprise Xeon® Processor , 2009, IEEE Journal of Solid-State Circuits.
[12] Y. Saad,et al. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems , 1986 .
[13] W. Rümelin. Numerical Treatment of Stochastic Differential Equations , 1982 .
[14] S. Ashby,et al. A parallel multigrid preconditioned conjugate gradient algorithm for groundwater flow simulations , 1996 .
[15] Robert D. Falgout,et al. Multigrid on massively parallel architectures , 2000 .
[16] Rafael Mayo,et al. Solving Dense Linear Systems on Graphics Processors , 2008, Euro-Par.
[17] Pheng-Ann Heng,et al. A hybrid condensed finite element model with GPU acceleration for interactive 3D soft tissue cutting , 2004, Comput. Animat. Virtual Worlds.
[18] Danny C. Sorensen,et al. Implicit Application of Polynomial Filters in a k-Step Arnoldi Method , 1992, SIAM J. Matrix Anal. Appl..
[19] Timothy C. Warburton,et al. Nodal discontinuous Galerkin methods on graphics processors , 2009, J. Comput. Phys..
[20] Karsten Schwan,et al. Efficient Wire Formats for High Performance Computing , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[21] S. Eisenstat,et al. Variational Iterative Methods for Nonsymmetric Systems of Linear Equations , 1983 .
[22] Nigel J. Newton. Asymptotically efficient Runge-Kutta methods for a class of Itô and Stratonovich equations , 1991 .
[23] Marek Behr,et al. Parallel finite-element computation of 3D flows , 1993, Computer.
[24] Pradeep Dubey,et al. Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[25] M. Kac. On distributions of certain Wiener functionals , 1949 .
[26] Gordon Erlebacher,et al. High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster , 2010, J. Comput. Phys..
[27] Hannes Vogt,et al. Coulomb, Landau and maximally Abelian gauge fixing in lattice QCD with multi-GPUs , 2012, Comput. Phys. Commun..
[28] Jun Zhou,et al. Multi-GPU Implementation of a 3D Finite Difference Time Domain Earthquake Code on Heterogeneous Supercomputers , 2013, ICCS.
[29] Mircea Grigoriu,et al. Random walk method for the two‐ and three‐dimensional Laplace, Poisson and Helmholtz's equations , 2001 .
[30] Jonathan Ennis-King,et al. Effect of Vertical Heterogeneity on Long-Term Migration of CO2 in Saline Formations , 2010 .
[31] Chao-Tung Yang,et al. Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters , 2011, Comput. Phys. Commun..
[32] Roy H. Stogner,et al. Early Experiences Porting Scientific Applications to the Many Integrated Core (MIC) Platform , 2012 .
[33] Alejandro Duran,et al. The Intel® Many Integrated Core Architecture , 2012, 2012 International Conference on High Performance Computing & Simulation (HPCS).
[34] Inanc Senocak,et al. An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters , 2010 .
[35] J.,et al. Efficient Preconditioning for the p-Version Finite Element Method in Two Dimensions , .
[36] Martin Kronbichler,et al. Algorithms and data structures for massively parallel generic adaptive finite element codes , 2011, ACM Trans. Math. Softw..
[37] Eric Darve,et al. Assembly of finite element methods on graphics processors , 2011 .
[38] Karol Miller,et al. Real-Time Nonlinear Finite Element Computations on GPU - Application to Neurosurgical Simulation , 2010, Computer methods in applied mechanics and engineering.
[39] Pradeep Dubey,et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU , 2010, ISCA.
[40] Manish Parashar,et al. Solving Sparse Linear Systems on NVIDIA Tesla GPUs , 2009, ICCS.
[41] Gene H. Golub,et al. Adaptively Preconditioned GMRES Algorithms , 1998, SIAM J. Sci. Comput..
[42] Jack J. Dongarra,et al. A Step towards Energy Efficient Computing: Redesigning a Hydrodynamic Application on CPU-GPU , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[43] Michal Mrozowski,et al. Finite Element Matrix Generation on a GPU , 2012 .
[44] Andreas Rößler,et al. Runge-Kutta Methods for the Strong Approximation of Solutions of Stochastic Differential Equations , 2010, SIAM J. Numer. Anal..
[45] Marcus J. Grote,et al. Parallel Preconditioning with Sparse Approximate Inverses , 1997, SIAM J. Sci. Comput..
[46] Ronald B. Morgan,et al. A Restarted GMRES Method Augmented with Eigenvectors , 1995, SIAM J. Matrix Anal. Appl..
[47] Pradeep Dubey,et al. Larrabee: A Many-Core x86 Architecture for Visual Computing , 2009, IEEE Micro.
[48] Collin McCurdy,et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.
[49] V. S. Manoranjan,et al. A two-step Jacobi-type iterative method , 1997 .
[50] S. Tam,et al. A 65-nm Dual-Core Multithreaded Xeon® Processor With 16-MB L3 Cache , 2007, IEEE Journal of Solid-State Circuits.
[51] Andrew A. Chien,et al. The future of microprocessors , 2011, Commun. ACM.
[52] K. Burrage,et al. Restarted GMRES preconditioned by deflation , 1996 .
[53] G. Milstein. Numerical Integration of Stochastic Differential Equations , 1994 .
[54] Antoine Lejay,et al. Computing the principal eigenvalue of the Laplace operator by a stochastic method , 2007, Math. Comput. Simul..
[55] Mark A. Moraes,et al. Parallel random numbers: As easy as 1, 2, 3 , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[56] Jianbin Fang,et al. An Empirical Study of Intel Xeon Phi , 2013, ArXiv.
[57] Wei Chen,et al. A 22 nm 15-Core Enterprise Xeon® Processor Family , 2015, IEEE Journal of Solid-State Circuits.
[58] Pradeep Dubey,et al. Designing and dynamically load balancing hybrid LU for multi/many-core , 2011, Computer Science - Research and Development.
[59] Markus Clemens,et al. Scalability of Higher-Order Discontinuous Galerkin FEM Computations for Solving Electromagnetic Wave Propagation Problems on GPU Clusters , 2010, IEEE Transactions on Magnetics.
[60] Cornelis W. Oosterlee,et al. Fourier Analysis of GMRES(m) Preconditioned by Multigrid , 2000 .
[61] Shiyi Chen,et al. Lattice Boltzmann Method for Fluid Flows , 2001 .
[62] Georg Stadler,et al. Scalable adaptive mantle convection simulation on petascale supercomputers , 2008, HiPC 2008.
[63] Georg Stadler,et al. Towards adaptive mesh PDE simulations on petascale computers , 2008 .
[64] M. Embree. How Descriptive are GMRES Convergence Bounds? , 1999, ArXiv.
[65] Gordon Erlebacher,et al. Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA , 2009, J. Parallel Distributed Comput..
[66] Robert D. Falgout,et al. Semicoarsening Multigrid on Distributed Memory Machines , 1999, SIAM J. Sci. Comput..
[67] James Reinders,et al. Intel Xeon Phi Coprocessor High Performance Programming , 2013 .
[68] Andreas Rößler. Second Order Runge-Kutta Methods for Itô Stochastic Differential Equations , 2009, SIAM J. Numer. Anal..
[69] Jirí Jaros,et al. Multi-GPU island-based genetic algorithm for solving the knapsack problem , 2012, 2012 IEEE Congress on Evolutionary Computation.
[70] K. Burrage,et al. On the Performance of Various Adaptive Preconditioned GMRES Strategies , 1998 .
[71] Valeria Simoncini,et al. On the Convergence of Restarted Krylov Subspace Methods , 2000, SIAM J. Matrix Anal. Appl..
[72] Christopher Baker,et al. High performance radiation transport simulations: Preparing for TITAN , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[73] M. V. Tretyakov,et al. Stochastic Numerics for Mathematical Physics , 2004, Scientific Computation.
[74] P. Fischer,et al. Petascale algorithms for reactor hydrodynamics , 2008 .
[75] Peter Messmer,et al. Forward and adjoint simulations of seismic wave propagation on emerging large-scale GPU architectures , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[76] Cornelis Vuik,et al. A Comparison of Deflation and Coarse Grid Correction Applied to Porous Media Flow , 2004, SIAM J. Numer. Anal..
[77] Erik Lindholm,et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.
[78] Desmond J. Higham,et al. An Algorithmic Introduction to Numerical Simulation of Stochastic Differential Equations , 2001, SIAM Rev..
[79] Robert A. van de Geijn,et al. Level-3 BLAS on a GPU: Picking the low hanging fruit , 2012 .
[80] C. Schwab,et al. Boundary Element Methods , 2010 .
[81] Konstantinos I. Karantasis,et al. Acceleration of a Finite-Difference WENO Scheme for Large-Scale Simulations on Many-Core Architectures , 2010 .