Composition of Algorithmic Building Blocks in Template Task Graphs

In this paper, we explore the composition capabilities of the Template Task Graph (TTG) programming model. We show how TTG enables fine-grain composition of tasks between DAGs belonging to different libraries, even in a distributed setup. We illustrate the benefits of this fine-grain composition on a linear algebra operation, matrix inversion via the Cholesky method, which consists of three operations that must be applied in sequence. Evaluation on a cluster of many-core nodes shows that this transparent fine-grain composition implements the complex operation without introducing unnecessary synchronizations, increases the overlap of communication and computation, and thus significantly improves the performance of the entire composed operation.
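The three stages of Cholesky-based matrix inversion (factorize, invert the triangular factor, form the product, commonly POTRF, TRTRI, and LAUUM in LAPACK terms) can be sketched in plain NumPy/SciPy. This is only a mathematical illustration of the sequenced operations, ignoring the tiling and distributed task-graph composition that the paper is about:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Build a small symmetric positive-definite matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)

# Stage 1 (POTRF): Cholesky factorization A = L L^T.
L = cholesky(A, lower=True)

# Stage 2 (TRTRI): invert the triangular factor, L^{-1}.
Linv = solve_triangular(L, np.eye(4), lower=True)

# Stage 3 (LAUUM): A^{-1} = L^{-T} L^{-1}.
Ainv = Linv.T @ Linv

assert np.allclose(A @ Ainv, np.eye(4))
```

In a tiled, task-based formulation each stage is itself a DAG of tile tasks, and the point of fine-grain composition is that tiles can flow from one stage's DAG into the next without a global synchronization between stages.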