Evaluating Data Redistribution in PaRSEC
George Bosilca | Nuria Losada | Qinglei Cao | Dong Zhong | Wei Wu | Jack Dongarra
[1] Michael Metcalf et al. High performance Fortran, 1995.
[2] David E. Keyes et al. Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems, 2020, IEEE International Parallel and Distributed Processing Symposium.
[3] Nathan T. Hjelm et al. Give MPI Threading a Fair Chance: A Study of Multithreaded MPI Designs, 2019, 2019 IEEE International Conference on Cluster Computing (CLUSTER).
[4] Andrew James Mayfield et al. Adaptive mesh refinement, 1993.
[5] Yu Pei et al. Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools, 2019, 2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools).
[6] Wei Wu et al. Flexible Data Redistribution in a Task-Based Runtime System, 2020, 2020 IEEE International Conference on Cluster Computing (CLUSTER).
[7] Gudula Rünger et al. Flexible all‐to‐all data redistribution methods for grid‐based particle codes, 2018, Concurr. Comput. Pract. Exp.
[8] Gudula Rünger et al. Fine-Grained Data Distribution Operations for Particle Codes, 2009, PVM/MPI.
[9] Viktor K. Prasanna et al. Efficient Algorithms for Block-Cyclic Redistribution of Arrays, 1999, Algorithmica.
[10] Alexander Aiken et al. Legion: Expressing locality and independence with logical regions, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] George Bosilca et al. PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution, 2015, 2015 IEEE International Conference on Cluster Computing.
[12] Siegfried Benkner et al. Implementing the Open Community Runtime for Shared-Memory and Distributed-Memory Systems, 2016, 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP).
[13] Thomas Hérault et al. PaRSEC: Exploiting Heterogeneity to Enhance Scalability, 2013, Computing in Science & Engineering.
[14] Asim YarKhan et al. Dynamic Task Execution on Shared and Distributed Memory Architectures, 2012.
[15] Bernard Tourancheau et al. Efficient Block Cyclic Data Redistribution, 1996, Euro-Par, Vol. I.
[16] Elisabeth Larsson et al. A task parallel implementation of a scattered node stencil-based solver for the shallow water equations, 2013.
[17] Jack J. Dongarra et al. Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs, 2016, IEEE Transactions on Parallel and Distributed Systems.
[18] William Gropp et al. DAME: A Runtime-Compiled Engine for Derived Datatypes, 2015, EuroMPI.
[19] Thomas Hérault et al. Dynamic task discovery in PaRSEC: a data-flow task-based runtime, 2017, ScalA@SC.
[20] Jack Dongarra et al. Array Redistribution in ScaLAPACK Using PVM, 1995.
[21] Jack Dongarra et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.
[22] Susan Coghlan et al. Operating system issues for petascale systems, 2006, OPSR.
[23] Jesús Labarta et al. A dependency-aware task-based programming environment for multi-core architectures, 2008, 2008 IEEE International Conference on Cluster Computing.
[24] Thomas Hérault et al. PTG: An Abstraction for Unhindered Parallelism, 2014, 2014 Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing.
[25] Sergio Iserte et al. DMRlib: Easy-coding and efficient resource management for job malleability, 2020.
[26] Gudula Rünger et al. Efficient Data Redistribution Methods for Coupled Parallel Particle Codes, 2013, 2013 42nd International Conference on Parallel Processing.
[27] Clément Foyer et al. ASPEN: An Efficient Algorithm for Data Redistribution Between Producer and Consumer Grids, 2018, Euro-Par Workshops.
[28] Robert J. Harrison et al. Distributed-memory multi-GPU block-sparse tensor contraction for electronic structure, 2020, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[29] Ed Anderson et al. LAPACK Users' Guide, 1995.
[30] Jack Dongarra et al. QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.
[31] Michael Wolfe et al. Optimization of Array Redistribution for Distributed Memory Multicomputers, 1995, Parallel Comput.
[32] Jack Dongarra et al. Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC, 2019, 2019 IEEE/ACM 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA).
[33] Pradipta De et al. Impact of Noise on Scaling of Collectives: An Empirical Evaluation, 2006, HiPC.
[34] Domenico Talia et al. ServiceSs: An Interoperable Programming Framework for the Cloud, 2013, Journal of Grid Computing.
[35] Wei Wu et al. Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance, 2019, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[36] Jack Dongarra et al. ScaLAPACK Users' Guide, 1997.
[37] Minyi Guo et al. A Framework for Efficient Data Redistribution on Distributed Memory Multicomputers, 2001, The Journal of Supercomputing.
[38] J. David Moulton et al. Scaling Structured Multigrid to 500K+ Cores through Coarse-Grid Redistribution, 2018, SIAM J. Sci. Comput.
[39] Philippe Olivier Alexandre Navaux et al. Performance Improvement of Stencil Computations for Multi-core Architectures based on Machine Learning, 2017, ICCS.
[40] Ching-Hsien Hsu et al. A Generalized Basic-Cycle Calculation Method for Efficient Array Redistribution, 2000, IEEE Trans. Parallel Distributed Syst.
[41] J. Ramanujam et al. Multi-phase array redistribution: modeling and evaluation, 1995, Proceedings of 9th International Parallel Processing Symposium.
[42] Jack J. Dongarra et al. Algorithmic Redistribution Methods for Block-Cyclic Decompositions, 1999, IEEE Trans. Parallel Distributed Syst.
[43] Viktor K. Prasanna et al. High-performance computing for vision, 1996, Proc. IEEE.
[44] Rajeev Thakur et al. Efficient Algorithms for Array Redistribution, 1996, IEEE Trans. Parallel Distributed Syst.
[45] Thomas Hérault et al. Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[46] Cédric Augonnet et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[47] Armin R. Mikler et al. Net-PIPE: Network Protocol Independent Performance Evaluator, 1997.
[48] George Bosilca et al. Accelerating NWChem Coupled Cluster through dataflow-based execution, 2015, PPAM.
[49] George Bosilca et al. PaRSEC: A programming paradigm exploiting heterogeneity for enhancing scalability, 2013.
[50] Thomas Heller et al. Application of the ParalleX execution model to stencil-based problems, 2013, Computer Science - Research and Development.
[51] Samuel Thibault et al. MASA-StarPU: Parallel Sequence Comparison with Multiple Scheduling Policies and Pruning, 2020, 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).
[52] Michael Wolfe et al. A New Approach to Array Redistribution: Strip Mining Redistribution, 1994, PARLE.
[53] Torsten Hoefler et al. MPI datatype processing using runtime compilation, 2013, EuroMPI.
[54] Alejandro Duran et al. A Proposal to Extend the OpenMP Tasking Model with Dependent Tasks, 2009, International Journal of Parallel Programming.
[55] Jaeyoung Choi et al. Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers, 1995, Parallel Comput.
[56] David W. Walker et al. Redistribution of block-cyclic data distributions using MPI, 1996, Concurr. Pract. Exp.
[57] Rajesh Sudarsan et al. Efficient Multidimensional Data Redistribution for Resizable Parallel Computations, 2007, ISPA.
[58] Julien Langou et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.
[59] Francois Tessier et al. Automated Dynamic Data Redistribution, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[60] Thomas Hérault et al. Assessing the cost of redistribution followed by a computational kernel: Complexity and performance results, 2016, Parallel Comput.
[61] Robert A. van de Geijn et al. Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, 2007, SPAA '07.
[62] Emmanuel Agullo et al. Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model, 2017.
[63] George Bosilca et al. Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes, 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[64] Dan Tsafrir et al. System noise, OS clock ticks, and fine-grained parallel applications, 2005, ICS '05.