Task‐based parallel strategies for computational fluid dynamic application in heterogeneous CPU/GPU resources
Lucas Mello Schnorr | Philippe Olivier Alexandre Navaux | Lucas Leandro Nesi | Matheus da Silva Serpa
[1] Thomas Hérault, et al. DAGuE: A Generic Distributed DAG Engine for High Performance Computing, 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum.
[2] John E. Stone, et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, 2010, Computing in Science & Engineering.
[3] Jack Dongarra, et al. The TOP500: History, Trends, and Future Directions in High Performance Computing, 2020.
[4] Emmanuel Jeannot, et al. Experimenting task-based runtimes on a legacy Computational Fluid Dynamics code with unstructured meshes, 2018, Computers & Fluids.
[5] David G. Wonnacott, et al. Using time skewing to eliminate idle time due to memory bandwidth and network limitations, 2000, Proceedings of the 14th International Parallel and Distributed Processing Symposium (IPDPS 2000).
[6] Jean Roman, et al. Design and Analysis of a Task-based Parallelization over a Runtime System of an Explicit Finite-Volume CFD Code with Adaptive Time Stepping, 2017, J. Comput. Sci.
[7] Paulius Micikevicius, et al. 3D finite difference computation on GPUs using CUDA, 2009, GPGPU-2.
[8] L. Dagum, et al. OpenMP: an industry standard API for shared-memory programming, 1998.
[9] Emmanuel Agullo, et al. Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, 2016, ACM Trans. Math. Softw.
[10] Christina Freytag, et al. Using MPI: Portable Parallel Programming with the Message Passing Interface, 2016.
[11] C. Xie. Interactive Heat Transfer Simulations for Everyone, 2012.
[12] Arch D. Robison, et al. Intel® Threading Building Blocks (TBB), 2011, Encyclopedia of Parallel Computing.
[13] Jairo Panetta, et al. Evaluating optimizations that reduce global memory accesses of stencil computations in GPGPUs, 2019, Concurr. Comput. Pract. Exp.
[14] Asif Afzal, et al. Parallelization Strategies for Computational Fluid Dynamics Software: State of the Art Review, 2016, Archives of Computational Methods in Engineering.
[15] Albert Farrés, et al. Optimization strategies for geophysics models on manycore systems, 2019, Int. J. High Perform. Comput. Appl.
[16] Lucas Mello Schnorr, et al. Design, Implementation and Performance Analysis of a CFD Task-Based Application for Heterogeneous CPU/GPU Resources, 2018, VECPAR.
[17] Kunle Olukotun, et al. A domain-specific approach to heterogeneous parallelism, 2011.
[18] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.
[19] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[20] Alfredo Buttari, et al. Fine Granularity Sparse QR Factorization for Multicore Based Systems, 2010, PARA.
[21] Qiqi Wang, et al. The swept rule for breaking the latency barrier in time advancing two-dimensional PDEs, 2016, ArXiv.
[22] Emmanuel Agullo, et al. Task-based FMM for heterogeneous architectures, 2016.
[23] M. Snir, et al. Ghost Cell Pattern, 2010, ParaPLoP '10.
[24] Jack Dongarra, et al. Faster, Cheaper, Better – a Hybridization Methodology to Develop Linear Algebra Software for GPUs, 2010.
[25] Kunle Olukotun, et al. A domain-specific approach to heterogeneous parallelism, 2011, PPoPP '11.
[26] Samuel Thibault, et al. On Runtime Systems for Task-based Programming on Heterogeneous Platforms, 2018.
[27] Philippe Thierry, et al. Characterization and Optimization Methodology Applied to Stencil Computations, 2015.
[28] Inanc Senocak, et al. An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters, 2010.
[29] Bruno Raffin, et al. XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[30] Qiqi Wang, et al. The swept rule for breaking the latency barrier in time advancing PDEs, 2015, J. Comput. Phys.
[31] Lucas Mello Schnorr, et al. Visual Performance Analysis of Memory Behavior in a Task-Based Runtime on Hybrid Platforms, 2019, 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[32] Lucas Mello Schnorr, et al. A visual performance analysis framework for task-based parallel applications running on hybrid clusters, 2018, Concurr. Comput. Pract. Exp.
[33] Emmanuel Agullo, et al. Task-based FMM for heterogeneous architectures, 2016, Concurr. Comput. Pract. Exp.
[34] Scott B. Baden, et al. Mint: realizing CUDA performance in 3D stencil methods with annotated C, 2011, ICS '11.