Abstraction Layer For Standardizing APIs of Task-Based Engines
Hatem Ltaief | David Keyes | Rabab Alomairy | Mustafa Abduljabbar
[1] Samuel Thibault,et al. Evaluation of OpenMP Dependent Tasks with the KASTORS Benchmark Suite , 2014, IWOMP.
[2] Jack J. Dongarra,et al. Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures , 2011, PARCO.
[3] David E. Keyes,et al. Exploiting Data Sparsity for Large-Scale Matrix Computations , 2018, Euro-Par.
[4] Emmanuel Agullo,et al. Task-Based Multifrontal QR Solver for GPU-Accelerated Multicore Architectures , 2015, 2015 IEEE 22nd International Conference on High Performance Computing (HiPC).
[5] Thomas Hérault,et al. PTG: An Abstraction for Unhindered Parallelism , 2014, 2014 Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing.
[6] George Bosilca,et al. Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes , 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[7] Bruno Lang. Efficient eigenvalue and singular value computations on shared memory machines , 1999, Parallel Comput.
[8] Emmanuel Agullo,et al. Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model , 2017 .
[9] Siegfried Benkner,et al. Implementing the Open Community Runtime for Shared-Memory and Distributed-Memory Systems , 2016, 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP).
[10] Olga Pearce,et al. RAJA: Portable Performance for Large-Scale Scientific Applications , 2019, 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC).
[11] Daniel Sunderland,et al. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns , 2014, J. Parallel Distributed Comput.
[12] Jack J. Dongarra,et al. A novel hybrid CPU–GPU generalized eigensolver for electronic structure calculations based on fine-grained memory aware tasks , 2014, Int. J. High Perform. Comput. Appl.
[13] Alejandro Duran,et al. A Proposal to Extend the OpenMP Tasking Model with Dependent Tasks , 2009, International Journal of Parallel Programming.
[14] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp.
[15] Thomas Heller,et al. Application of the ParalleX execution model to stencil-based problems , 2012, Computer Science - Research and Development.
[16] Eric Gendron,et al. Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs , 2016, PASC.
[17] Jack J. Dongarra,et al. Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[18] Christian H. Bischof,et al. Algorithm 807: The SBR Toolbox—software for successive band reduction , 2000, TOMS.
[19] Kostas Katrinis,et al. A taxonomy of task-based parallel programming technologies for high-performance computing , 2018, The Journal of Supercomputing.
[20] Courtenay T. Vaughan,et al. ASC Tri-lab Co-design Level 2 Milestone Report 2015 , 2015 .
[21] Qingyu Meng,et al. Investigating applications portability with the Uintah DAG-based runtime system on petascale supercomputers , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[22] Philipp Birken,et al. Numerical Linear Algebra , 2011, Encyclopedia of Parallel Computing.
[23] Ronald Kriemann,et al. H-LU Factorization on Many-Core Systems , 2014 .
[24] Christina Freytag,et al. Using MPI: Portable Parallel Programming with the Message-Passing Interface , 2016 .
[25] A. Stathopoulos,et al. Solution of large eigenvalue problems in electronic structure calculations , 1996 .
[26] Alexander Aiken,et al. Legion: Expressing locality and independence with logical regions , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[27] Alejandro Duran,et al. Productive Programming of GPU Clusters with OmpSs , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[28] Jack J. Dongarra,et al. Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[29] Yousef Saad,et al. PFEAST: A High Performance Sparse Eigenvalue Solver Using Distributed-Memory Linear Solvers , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[30] Thomas Hérault,et al. Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[31] Richard J. Simard,et al. Computing the Two-Sided Kolmogorov-Smirnov Distribution , 2011 .
[32] Jesús Labarta,et al. Dense Matrix Computations on NUMA Architectures with Distance-Aware Work Stealing , 2015, Supercomput. Front. Innov.
[33] Lukas Krämer,et al. Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations , 2011, Parallel Comput.
[34] George Bosilca,et al. Accelerating NWChem Coupled Cluster through dataflow-based execution , 2015, PPAM.
[35] David E. Keyes,et al. Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions , 2017, ISC.
[36] Jack Dongarra,et al. Designing SLATE: Software for Linear Algebra Targeting Exascale , 2017 .
[37] Raúl Sánchez,et al. Event-based parareal: A data-flow based implementation of parareal , 2012, J. Comput. Phys.
[38] A. Marek,et al. The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science , 2014, Journal of physics. Condensed matter : an Institute of Physics journal.
[39] Asim YarKhan,et al. Dynamic Task Execution on Shared and Distributed Memory Architectures , 2012 .
[40] Martin Berzins,et al. ASC ATDM Level 2 Milestone #5325: Asynchronous Many-Task Runtime System Analysis and Assessment for Next Generation Platforms , 2015 .
[41] Gene H. Golub,et al. Matrix computations (3rd ed.) , 1996 .
[42] Jack Dongarra,et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects , 2009 .
[43] Jack Dongarra,et al. ScaLAPACK Users' Guide , 1987 .
[44] W. Hackbusch,et al. Hierarchical Matrices: Algorithms and Analysis , 2015 .
[45] Laxmikant V. Kalé,et al. Runtime Coordinated Heterogeneous Tasks in Charm++ , 2016, 2016 Second International Workshop on Extreme Scale Programming Models and Middleware (ESPM2).
[46] David E. Keyes,et al. Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture , 2017, Euro-Par.
[47] Jack J. Dongarra,et al. Porting the PLASMA Numerical Library to the OpenMP Standard , 2017, International Journal of Parallel Programming.