An Approach for Indirectly Adopting a Performance Portability Layer in Large Legacy Codes
[1] Jennifer Spinti,et al. Large eddy simulations of accidental fires using massively parallel computers , 2003 .
[2] Timothy G. Mattson,et al. Evaluating data parallelism in C++ using the Parallel Research Kernels , 2019, IWOCL.
[3] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[4] Matt Martineau,et al. Assessing the performance portability of modern parallel programming models using TeaLeaf , 2017, Concurr. Comput. Pract. Exp..
[5] Daniel J. Rader,et al. Direct simulation Monte Carlo: The quest for speed , 2014 .
[6] Daniel Sunderland,et al. Portably Improving Uintah's Readiness for Exascale Systems Through the Use of Kokkos , .
[7] Martin Berzins,et al. Radiative Heat Transfer Calculation on 16384 GPUs Using a Reverse Monte Carlo Ray Tracing Approach with Adaptive Mesh Refinement , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[8] Martin Berzins,et al. Improving Uintah's Scalability Through the Use of Portable Kokkos-Based Data Parallel Tasks , 2017, PEARC.
[9] Timothy C. Warburton,et al. OCCA: A unified approach to multi-threading languages , 2014, ArXiv.
[10] Philipp Grete,et al. K-Athena: A Performance Portable Structured Grid Finite Volume Magnetohydrodynamics Code , 2019, IEEE Transactions on Parallel and Distributed Systems.
[11] Martin Berzins,et al. A Preliminary Port and Evaluation of the Uintah AMT Runtime on Sunway TaihuLight , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[12] Jeff R. Hammond,et al. A comparative analysis of Kokkos and SYCL as heterogeneous, parallel programming models for C++ applications , 2019, IWOCL.
[13] Tamara G. Kolda,et al. An overview of the Trilinos project , 2005, TOMS.
[14] Qingyu Meng,et al. Investigating applications portability with the Uintah DAG-based runtime system on petascale supercomputers , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[15] Matt Martineau,et al. Evaluating attainable memory bandwidth of parallel programming models via BabelStream , 2018, Int. J. Comput. Sci. Eng..
[16] Roger P. Pawlowski,et al. Toward performance portability of the Albany finite element analysis code using the Kokkos library , 2018, Int. J. High Perform. Comput. Appl..
[17] Steve Plimpton,et al. Fast parallel algorithms for short-range molecular dynamics , 1993 .
[18] Richard D. Hornung,et al. The RAJA Portability Layer: Overview and Status , 2014 .
[19] Alexander Aiken,et al. Legion: Expressing locality and independence with logical regions , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] Tamara G. Kolda,et al. Software for Sparse Tensor Decomposition on Emerging Computing Architectures , 2018, SIAM J. Sci. Comput..
[21] Martin Berzins,et al. An Overview of Performance Portability in the Uintah Runtime System through the Use of Kokkos , 2016, 2016 Second International Workshop on Extreme Scale Programming Models and Middleware (ESPM2).
[22] Thomas Hérault,et al. PaRSEC: Exploiting Heterogeneity to Enhance Scalability , 2013, Computing in Science & Engineering.
[23] David Moxey,et al. Accelerating high-order mesh optimisation with an architecture-independent programming model , 2018, Comput. Phys. Commun..
[24] Martin Berzins,et al. Demonstrating GPU code portability and scalability for radiative heat transfer computations , 2018, J. Comput. Sci..
[25] Martin Berzins,et al. ASC ATDM Level 2 Milestone #5325: Asynchronous Many-Task Runtime System Analysis and Assessment for Next Generation Platforms , 2015 .
[26] Daniel Sunderland,et al. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns , 2014, J. Parallel Distributed Comput..
[27] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[28] John Shalf,et al. BoxLib with Tiling: An AMR Software Framework , 2016, ArXiv.
[29] Marcus S. Day,et al. AMReX: a framework for block-structured adaptive mesh refinement , 2019, J. Open Source Softw..
[30] Bok Jik Lee,et al. Direct numerical simulations of reacting flows with detailed chemistry using many-core/GPU acceleration , 2018, Computers & Fluids.
[31] Andrew M. Bradley,et al. HOMMEXX 1.0: a performance-portable atmospheric dynamical core for the Energy Exascale Earth System Model , 2019, Geoscientific Model Development.
[32] Martin Berzins,et al. A Scalable Algorithm for Radiative Heat Transfer Using Reverse Monte Carlo Ray Tracing , 2015, ISC.
[33] Martin Schulz,et al. ARCHER: Effectively Spotting Data Races in Large OpenMP Applications , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[34] Qingyu Meng,et al. Extending the Uintah Framework through the Petascale Modeling of Detonation in Arrays of High Explosive Devices , 2016, SIAM J. Sci. Comput..
[35] John Shalf,et al. The Cactus Framework and Toolkit: Design and Applications , 2002, VECPAR.
[36] Martin Berzins,et al. Developing Uintah's Runtime System For Forthcoming Architectures , 2015 .
[37] Brian van Straalen,et al. A survey of high level frameworks in block-structured adaptive mesh refinement packages , 2014, J. Parallel Distributed Comput..
[38] Konstantin Serebryany,et al. ThreadSanitizer: data race detection in practice , 2009, WBIA '09.
[39] Hartmut Kaiser,et al. HPX: A Task Based Programming Model in a Global Address Space , 2014, PGAS.