Finite Element Computations on Multicore and Graphics Processors

In this thesis, techniques for efficient utilization of modern computer hardware for numerical simulation are considered. In particular, we study techniques for improving the performance of computat ...
