Finite Element Computations on Multicore and Graphics Processors
[1] Wolfgang Paul,et al. GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model , 2009, J. Comput. Phys..
[2] Gordon Erlebacher,et al. Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA , 2009, J. Parallel Distributed Comput..
[3] Rüdiger Westermann,et al. Linear algebra operators for GPU implementation of numerical algorithms , 2003, SIGGRAPH Courses.
[4] D. Komatitsch,et al. Introduction to the spectral element method for three-dimensional seismic wave propagation , 1999 .
[5] Mark Moir,et al. Early experience with a commercial hardware transactional memory implementation , 2009, ASPLOS.
[6] Yair Shapira. Matrix-Based Multigrid: Theory and Applications , 2008 .
[7] A. Brandt. Multi-level adaptive solutions to boundary-value problems , 1977, Mathematics of Computation.
[8] W. Bangerth,et al. deal.II—A general-purpose object-oriented finite element library , 2007, TOMS.
[9] Dimitri Komatitsch,et al. Accelerating a three-dimensional finite-difference wave propagation code using GPU graphics cards , 2010 .
[10] Intel Corporation. Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1 , 2006 .
[11] David K. McAllister,et al. Fast Matrix Multiplies Using Graphics Hardware , 2001, ACM/IEEE SC 2001 Conference (SC'01).
[12] Jens Markus Melenk,et al. Fully discrete hp-finite elements: fast quadrature , 2001 .
[13] Michael J. Aftosmis,et al. Parallel Multigrid on Cartesian Meshes with Complex Geometry , 2001 .
[14] David A. Wood,et al. Performance Pathologies in Hardware Transactional Memory , 2007, IEEE Micro.
[15] Robert Strzodka,et al. Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations , 2007, Int. J. Parallel Emergent Distributed Syst..
[16] Bradley C. Kuszmaul,et al. Unbounded transactional memory , 2005, 11th International Symposium on High-Performance Computer Architecture.
[17] S. Orszag. Spectral methods for problems in complex geometries , 1980 .
[18] John E. Stone,et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.
[19] Jens H. Krüger,et al. A Survey of General‐Purpose Computation on Graphics Hardware , 2007, Eurographics.
[20] Jonathan J. Hu,et al. Parallel multigrid smoothing: polynomial versus Gauss-Seidel , 2003 .
[21] Eric Darve,et al. Large calculation of the flow over a hypersonic vehicle using a GPU , 2008, J. Comput. Phys..
[22] Quinn Jacobson,et al. Architectural Support for Software Transactional Memory , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[23] Katharina Kormann,et al. A generic interface for parallel cell-based finite element operator application , 2012 .
[24] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[25] Nicholas Wilt,et al. The CUDA Handbook: A Comprehensive Guide to GPU Programming , 2013 .
[26] Firas Hamze,et al. A Performance Comparison of CUDA and OpenCL , 2010, ArXiv.
[27] S. Sherwin,et al. From h to p efficiently: Strategy selection for operator evaluation on hexahedral and tetrahedral elements , 2011 .
[28] G. Carey,et al. Element‐by‐element linear and nonlinear solution schemes , 1986 .
[29] Eitan Grinspun,et al. Sparse matrix solvers on the GPU: conjugate gradients and multigrid , 2003, SIGGRAPH Courses.
[30] Charbel Farhat,et al. A general approach to nonlinear FE computations on shared-memory multiprocessors , 1989 .
[31] Matthew G. Knepley,et al. Finite Element Integration on GPUs , 2013, TOMS.
[32] Martin Kronbichler,et al. Algorithms and data structures for massively parallel generic adaptive finite element codes , 2011, ACM Trans. Math. Softw..
[33] Martin Tillenius,et al. SuperGlue: A Shared Memory Framework Using Data Versioning for Dependency-Aware Task-Based Parallelization , 2015, SIAM J. Sci. Comput..
[34] Gordon Erlebacher,et al. High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster , 2010, J. Comput. Phys..
[35] Eric Darve,et al. Assembly of finite element methods on graphics processors , 2011 .
[36] Michal Mrozowski,et al. FINITE ELEMENT MATRIX GENERATION ON A GPU , 2012 .
[37] David A. Ham,et al. Finite element assembly strategies on multi‐core and many‐core architectures , 2013 .
[38] Christopher J. Hughes,et al. Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[39] Rajiv K. Kalia,et al. Performance Characteristics of Hardware Transactional Memory for Molecular Dynamics Application on BlueGene/Q: Toward Efficient Multithreading Strategies for Large-Scale Scientific Applications , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[40] Joshua A. Anderson,et al. General purpose molecular dynamics simulations fully implemented on graphics processing units , 2008, J. Comput. Phys..
[41] Maryam Mehri Dehnavi,et al. Finite-Element Sparse Matrix Vector Multiplication on Graphic Processing Units , 2010, IEEE Transactions on Magnetics.
[42] Giorgio Valle,et al. CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment , 2008, BMC Bioinformatics.
[43] Michael Garland,et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[44] Michael Gschwind,et al. The IBM Blue Gene/Q Compute Chip , 2012, IEEE Micro.
[45] Victor Luchangco,et al. Anatomy of a Scalable Software Transactional Memory , 2009 .
[46] Jianbin Fang,et al. A Comprehensive Performance Comparison of CUDA and OpenCL , 2011, 2011 International Conference on Parallel Processing.
[47] G.E. Moore,et al. Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.
[48] Marc Tremblay,et al. Rock: A High-Performance Sparc CMT Processor , 2009, IEEE Micro.
[49] Maurice Herlihy,et al. Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[50] Robert Strzodka,et al. Accelerating Double Precision FEM Simulations with GPUs , 2011 .
[51] Thomas Y. Hou,et al. A Multiscale Finite Element Method for Elliptic Problems in Composite Materials and Porous Media , 1997 .
[52] Michal Mrozowski,et al. A Memory Efficient and Fast Sparse Matrix Vector Product on a GPU , 2011 .
[53] Robert H. Dennard,et al. Design of ion-implanted MOSFET's with very small physical dimensions , 2007 .
[54] P. Brouaye,et al. A mesh coloring method for efficient MIMD processing in finite element problems , 1982, ICPP.
[55] David E. Keyes,et al. Towards Realistic Performance Bounds for Implicit CFD Codes , 2000 .
[56] Daniel J. Arrigo,et al. An Introduction to Partial Differential Equations , 2017 .
[57] 장훈,et al. [Book review] "Computer Organization and Design, The Hardware/Software Interface" , 1997 .
[58] Martin Kronbichler,et al. WorkStream -- A Design Pattern for Multicore-Enabled Finite Element Computations , 2016, ACM Trans. Math. Softw..
[59] Timothy C. Warburton,et al. Nodal discontinuous Galerkin methods on graphics processors , 2009, J. Comput. Phys..
[60] D. Keyes,et al. Jacobian-free Newton-Krylov methods: a survey of approaches and applications , 2004 .