Leveraging Task-Parallelism with OmpSs in ILUPACK's Preconditioned CG Method

In this paper we describe how to efficiently exploit task parallelism for the solution of sparse linear systems on multithreaded processors via ILUPACK's multilevel preconditioned CG method. Using a pair of data structures, we capture the task dependencies that appear in the two most challenging operations of the method (the computation of the preconditioner and its application), and pass this information to the OmpSs runtime, which can then implement a correct and efficient schedule of the entire solver. Our results on high-end multicore platforms equipped with Intel and AMD processors show significant performance gains, demonstrating that OmpSs provides an efficient and close-to-seamless means to leverage the concurrency in a complex scientific code like ILUPACK.
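To make the structure of the solver concrete, the following is a minimal sequential sketch of a preconditioned CG iteration, showing where the two operations named above (computing the preconditioner once, then applying it in every iteration) enter the method. It is not ILUPACK's implementation: a simple Jacobi (diagonal) preconditioner stands in for the multilevel ILU preconditioner, the matrix is dense for brevity, and the names `build_jacobi` and `pcg` are hypothetical. No OmpSs tasking is shown.

```python
from math import sqrt


def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product (a sparse format would be used in practice)."""
    return [dot(row, x) for row in A]


def build_jacobi(A):
    """Compute the preconditioner once. Assumption: Jacobi (diagonal)
    scaling is used here as a stand-in for a multilevel ILU factorization."""
    inv_diag = [1.0 / A[i][i] for i in range(len(A))]

    def apply(r):
        # Apply the preconditioner: z = M^{-1} r
        return [d * ri for d, ri in zip(inv_diag, r)]

    return apply


def pcg(A, b, apply_prec, tol=1e-10, max_iter=1000):
    """Preconditioned CG for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)                  # residual for the initial guess x = 0
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if sqrt(dot(r, r)) <= tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_prec(r)        # preconditioner application, once per iteration
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = pcg(A, b, build_jacobi(A))
```

In this sketch the preconditioner is built a single time before the iteration starts and then applied once per iteration, which is why those two operations dominate the cost and are the natural targets for task parallelism.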
