Design and Optimization of OpenFOAM-based CFD Applications for Modern Hybrid and Heterogeneous HPC Platforms
Markus Hadwiger | David E. Keyes | Ravi Samtaney | Amani AlOnazi
[1] Alexey Lastovetsky,et al. Towards a Realistic Performance Model for Networks of Heterogeneous Computers , 2005 .
[2] F. Harlow,et al. Numerical Calculation of Time‐Dependent Viscous Incompressible Flow of Fluid with Free Surface , 1965 .
[3] Emil M. Constantinescu,et al. Multiphysics simulations , 2013, HiPC 2013.
[4] Alexey L. Lastovetsky,et al. High Performance Heterogeneous Computing , 2009, Wiley series on parallel and distributed computing.
[5] Rajesh Bordawekar,et al. Optimizing Sparse Matrix-Vector Multiplication on GPUs using Compile-time and Run-time Strategies , 2008 .
[6] Alexey L. Lastovetsky,et al. HeteroMPI+ScaLAPACK: Towards a ScaLAPACK (Dense Linear Solvers) on Heterogeneous Networks of Computers , 2006, HiPC.
[7] James Demmel,et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance , 1995, PARA.
[8] Anthony Skjellum,et al. Portable Parallel Programming with the Message-Passing Interface , 1996 .
[9] Leonel Sousa,et al. Hierarchical Partitioning Algorithm for Scientific Computing on Highly Heterogeneous CPU + GPU Clusters , 2012, Euro-Par.
[10] Hrvoje Jasak,et al. Error analysis and estimation for the finite volume method with applications to fluid flows , 1996 .
[11] Jens Jägersküpper,et al. A Novel Shared-Memory Thread-Pool Implementation for Hybrid Parallel CFD Solvers , 2011, Euro-Par.
[12] Wolfgang Straßer,et al. A Parallel Preconditioned Conjugate Gradient Solver for the Poisson Problem on a Multi-GPU Platform , 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[13] James R. Stewart,et al. A framework approach for developing parallel adaptive multiphysics applications , 2004 .
[14] Guillaume Mercier,et al. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications , 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[15] P. Schröder,et al. Sparse matrix solvers on the GPU: conjugate gradients and multigrid , 2003, SIGGRAPH Courses.
[16] Brett A. Becker,et al. Partitioning for Parallel Matrix-Matrix Multiplication with Heterogeneous Processors: The Optimal Solution , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[17] Yuan Liu. Hybrid Parallel Computation of OpenFOAM Solver on Multi-Core Cluster Systems , 2011 .
[18] Alexey L. Lastovetsky. Heterogeneity in parallel and distributed computing , 2013, J. Parallel Distributed Comput..
[19] Alexey L. Lastovetsky,et al. Data partitioning for multiprocessors with memory heterogeneity and memory constraints , 2005, Sci. Program..
[20] S. Sitharama Iyengar,et al. Introduction to parallel algorithms , 1998, Wiley series on parallel and distributed computing.
[21] Alexey L. Lastovetsky,et al. Data distribution for dense factorization on computers with memory heterogeneity , 2007, Parallel Comput..
[22] Farshad Khunjush,et al. Optimization of OpenFOAM's linear solvers on emerging multi-core platforms , 2011, Proceedings of 2011 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing.
[23] J. Shewchuk. An Introduction to the Conjugate Gradient Method Without the Agonizing Pain , 1994 .
[24] Alexey L. Lastovetsky,et al. Data Partitioning with a Functional Performance Model of Heterogeneous Processors , 2007, Int. J. High Perform. Comput. Appl..
[25] Ziming Zhong,et al. Data Partitioning on Heterogeneous Multicore Platforms , 2011, 2011 IEEE International Conference on Cluster Computing.
[26] Robert M. Farber,et al. CUDA Application Design and Development , 2011 .
[27] Liu You,et al. Real-Time 3D Fluid Simulation on GPU with Complex Obstacles , 2006 .
[28] Emmanuel Agullo,et al. QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[29] Guillaume Caumon,et al. Concurrent number cruncher: a GPU implementation of a general sparse linear solver , 2009, Int. J. Parallel Emergent Distributed Syst..
[30] Satoshi Matsuoka,et al. High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning , 2010, Computer Science - Research and Development.
[31] Jack J. Dongarra,et al. A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators , 2010, VECPAR.
[32] Alexey L. Lastovetsky,et al. Heterogeneous Distribution of Computations Solving Linear Algebra Problems on Networks of Heterogeneous Computers , 2001, J. Parallel Distributed Comput..
[33] Alexey L. Lastovetsky,et al. Dynamic Load Balancing of Parallel Computational Iterative Routines on Highly Heterogeneous HPC Platforms , 2011, Parallel Process. Lett..
[34] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[35] Wim Vanroose,et al. Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm , 2014, Parallel Comput..
[36] Christina Freytag,et al. Using MPI: Portable Parallel Programming with the Message-Passing Interface , 2016 .
[37] William Gropp,et al. High-performance parallel implicit CFD , 2001, Parallel Comput..
[38] Jack Dongarra,et al. Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators , 2012 .
[39] Katarzyna Zadarnowska,et al. Complete PISO and SIMPLE solvers on Graphics Processing Units , 2012, ArXiv.
[40] G. R. Mudalige,et al. OP2: An active library framework for solving unstructured mesh-based applications on multi-core and many-core architectures , 2012, 2012 Innovative Parallel Computing (InPar).
[41] Alexey Lastovetsky. Parallel Simulation of Oil Extraction on Heterogeneous Networks of Computers , 2012 .
[42] Alexey L. Lastovetsky,et al. Data partitioning with a realistic performance model of networks of heterogeneous computers with task size limits , 2004, Third International Symposium on Parallel and Distributed Computing/Third International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks.
[43] Sophie Papst,et al. Computational Methods For Fluid Dynamics , 2016 .
[44] H. T. Kung,et al. Performance Gains in Conjugate Gradient Computation with Linearly Connected GPU Multiprocessors , 2012 .
[45] Hrvoje Jasak,et al. A tensorial approach to computational continuum mechanics using object-oriented techniques , 1998 .
[46] Kevin Skadron,et al. A performance study of general-purpose applications on graphics processors using CUDA , 2008, J. Parallel Distributed Comput..
[47] Toni Cortes,et al. PARAVER: A Tool to Visualize and Analyze Parallel Code , 2007 .
[48] D. Birchall,et al. Computational Fluid Dynamics , 2020, Radial Flow Turbocompressors.
[49] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1990 .
[50] Alexey L. Lastovetsky,et al. Heterogeneous Distribution of Computations While Solving Linear Algebra Problems on Networks of Heterogeneous Computers , 1999, HPCN Europe.
[51] Alexey L. Lastovetsky,et al. Dynamic Load Balancing of Parallel Computational Iterative Routines on Platforms with Memory Heterogeneity , 2010, Euro-Par Workshops.
[52] David Skinner,et al. Capturing and Visualizing Event Flow Graphs of MPI Applications , 2009, Euro-Par Workshops.
[53] Chao-Tung Yang,et al. Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters , 2011, Comput. Phys. Commun..
[54] George Bosilca,et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation , 2004, PVM/MPI.
[55] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[56] Emmanuel Jeannot,et al. Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models , 2014, HiPC 2014.
[57] Ian Buck,et al. GPU computing with NVIDIA CUDA , 2007, SIGGRAPH Courses.
[58] Alexey L. Lastovetsky,et al. Design and implementation of self-adaptable parallel algorithms for scientific computing on highly heterogeneous HPC platforms , 2011, ArXiv.
[59] Jack Dongarra,et al. MPI: The Complete Reference , 1996 .
[60] Michael Garland,et al. Efficient Sparse Matrix-Vector Multiplication on CUDA , 2008 .
[61] Alexey L. Lastovetsky,et al. Two-Dimensional Matrix Partitioning for Parallel Computing on Heterogeneous Processors Based on Their Functional Performance Models , 2009, Euro-Par Workshops.
[62] Naga K. Govindaraju,et al. GPGPU: general-purpose computation on graphics hardware , 2006, SC.
[63] David Kirk,et al. NVIDIA CUDA software and GPU parallel computing architecture , 2007, ISMM '07.
[64] Alexey L. Lastovetsky,et al. Using Multidimensional Solvers for Optimal Data Partitioning on Dedicated Heterogeneous HPC Platforms , 2011, PaCT.
[65] Hrvoje Jasak,et al. Development of a Generalized Grid Mesh Interface for Turbomachinery simulations with OpenFOAM , 2008 .
[66] Constantine D. Polychronopoulos,et al. Parallel programming and compilers , 1988 .
[67] François Pellegrini,et al. PT-Scotch: A tool for efficient parallel graph ordering , 2008, Parallel Comput..
[68] Ziming Zhong,et al. Data Partitioning on Heterogeneous Multicore and Multi-GPU Systems Using Functional Performance Models of Data-Parallel Applications , 2012, 2012 IEEE International Conference on Cluster Computing.
[69] Alexey L. Lastovetsky,et al. Column-Based Matrix Partitioning for Parallel Matrix Multiplication on Heterogeneous Processors Based on Functional Performance Models , 2011, Euro-Par Workshops.
[70] Paride Dagna,et al. Evaluation of Multi-threaded OpenFOAM Hybridization for Massively Parallel Architectures , 2013, Partnership for Advanced Computing in Europe.
[71] William Gropp,et al. Domain decomposition on parallel computers , 1989, IMPACT Comput. Sci. Eng..
[72] Alexey L. Lastovetsky,et al. HeteroMPI: Towards a message-passing library for heterogeneous networks of computers , 2006, J. Parallel Distributed Comput..
[73] Hugh Garraway. Parallel Computer Architecture: A Hardware/Software Approach , 1999, IEEE Concurrency.
[74] Alexey L. Lastovetsky,et al. Building the functional performance model of a processor , 2006, SAC.
[75] Ziming Zhong,et al. FuPerMod: A Framework for Optimal Data Partitioning for Parallel Scientific Applications on Dedicated Heterogeneous HPC Platforms , 2013, PaCT.
[76] Manolis Papadrakakis,et al. A new era in scientific computing: Domain decomposition methods in hybrid CPU-GPU architectures , 2011 .
[77] Allen D. Malony,et al. The Tau Parallel Performance System , 2006, Int. J. High Perform. Comput. Appl..
[78] Alexey L. Lastovetsky,et al. Distributed Data Partitioning for Heterogeneous Processors Based on Partial Estimation of Their Functional Performance Models , 2009, Euro-Par Workshops.
[79] Cédric Augonnet,et al. StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines , 2010 .