Composition of Algorithmic Building Blocks in Template Task Graphs

In this paper, we explore the composition capabilities of the Template Task Graph (TTG) programming model. We show how TTG enables fine-grain composition of tasks between DAGs belonging to different libraries, even in a distributed setup. We illustrate the benefits of this fine-grain composition on a linear algebra operation, matrix inversion via the Cholesky method, which consists of three operations that must be applied in sequence. Evaluation on a cluster of many-core nodes shows that this transparent fine-grain composition implements the complex operation without introducing unnecessary synchronizations, increases the overlap of communication and computation, and thus significantly improves the performance of the entire composed operation.
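The three stages of Cholesky-based matrix inversion (factorize, invert the triangular factor, form the product, commonly POTRF, TRTRI, and LAUUM in LAPACK terms) can be sketched in plain NumPy/SciPy. This is only a mathematical illustration of the sequenced operations, ignoring the tiling and distributed task-graph composition that the paper is about:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Build a small symmetric positive-definite matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)

# Stage 1 (POTRF): Cholesky factorization A = L L^T.
L = cholesky(A, lower=True)

# Stage 2 (TRTRI): invert the triangular factor, L^{-1}.
Linv = solve_triangular(L, np.eye(4), lower=True)

# Stage 3 (LAUUM): A^{-1} = L^{-T} L^{-1}.
Ainv = Linv.T @ Linv

assert np.allclose(A @ Ainv, np.eye(4))
```

In a tiled, task-based formulation each stage is itself a DAG of tile tasks, and the point of fine-grain composition is that tiles can flow from one stage's DAG into the next without a global synchronization between stages.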