Flexible Data Redistribution in a Task-Based Runtime System
Wei Wu | George Bosilca | Jack Dongarra | Qinglei Cao | Dong Zhong | Aurélien Bouteiller
[1] Gudula Rünger, et al. Fine-Grained Data Distribution Operations for Particle Codes, 2009, PVM/MPI.
[2] Minyi Guo, et al. A Framework for Efficient Data Redistribution on Distributed Memory Multicomputers, 2001, The Journal of Supercomputing.
[3] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.
[4] Michael Metcalf, et al. High Performance Fortran, 1995.
[5] Thomas Hérault, et al. PaRSEC: Exploiting Heterogeneity to Enhance Scalability, 2013, Computing in Science & Engineering.
[6] Clément Foyer, et al. ASPEN: An Efficient Algorithm for Data Redistribution Between Producer and Consumer Grids, 2018, Euro-Par Workshops.
[7] Thomas Hérault, et al. Dynamic task discovery in PaRSEC: a data-flow task-based runtime, 2017, ScalA@SC.
[8] Hatem Ltaief, et al. Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications, 2020, PASC.
[9] David W. Walker, et al. Redistribution of block-cyclic data distributions using MPI, 1996, Concurr. Pract. Exp.
[10] Gudula Rünger, et al. Flexible all-to-all data redistribution methods for grid-based particle codes, 2018, Concurr. Comput. Pract. Exp.
[11] Jack J. Dongarra, et al. Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs, 2016, IEEE Trans. Parallel Distributed Syst.
[12] Wei Wu, et al. Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance, 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Francois Tessier, et al. Automated Dynamic Data Redistribution, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[14] Michael Wolfe, et al. Optimization of Array Redistribution for Distributed Memory Multicomputers, 1995, Parallel Comput.
[15] Jack J. Dongarra, et al. Algorithmic Redistribution Methods for Block-Cyclic Decompositions, 1999, IEEE Trans. Parallel Distributed Syst.
[16] Gudula Rünger, et al. Efficient Data Redistribution Methods for Coupled Parallel Particle Codes, 2013, 2013 42nd International Conference on Parallel Processing.
[17] Rajeev Thakur, et al. Efficient Algorithms for Array Redistribution, 1996, IEEE Trans. Parallel Distributed Syst.
[18] Bernard Tourancheau, et al. Efficient Block Cyclic Data Redistribution, 1996, Euro-Par, Vol. I.
[19] Rajesh Sudarsan, et al. Efficient Multidimensional Data Redistribution for Resizable Parallel Computations, 2007, ISPA.
[20] J. David Moulton, et al. Scaling Structured Multigrid to 500K+ Cores through Coarse-Grid Redistribution, 2018, SIAM J. Sci. Comput.
[21] Thomas Hérault, et al. Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[22] Jack Dongarra, et al. Array Redistribution in ScaLAPACK Using PVM, 1995.
[23] Michael Wolfe, et al. A New Approach to Array Redistribution: Strip Mining Redistribution, 1994, PARLE.
[24] Thomas Hérault, et al. PTG: An Abstraction for Unhindered Parallelism, 2014, 2014 Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing.
[25] George Bosilca, et al. PaRSEC: A programming paradigm exploiting heterogeneity for enhancing scalability, 2013.
[26] Yu Pei, et al. Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools, 2019, 2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools).
[27] Ching-Hsien Hsu, et al. A Generalized Basic-Cycle Calculation Method for Efficient Array Redistribution, 2000, IEEE Trans. Parallel Distributed Syst.
[28] Thomas Hérault, et al. Assessing the cost of redistribution followed by a computational kernel: Complexity and performance results, 2016, Parallel Comput.
[29] Viktor K. Prasanna, et al. Efficient Algorithms for Block-Cyclic Redistribution of Arrays, 1999, Algorithmica.