Fine-Grained Multithreading for the Multifrontal QR Factorization of Sparse Matrices

The advent of multicore processors represents a disruptive event in the history of computer science, as conventional parallel programming paradigms are proving incapable of fully exploiting their potential for concurrent computation. Recent studies identify fine granularity and dynamic execution as the keys to achieving high efficiency on multicore systems, which points to the need for different programming models. This work presents an approach to the parallelization of the multifrontal method for the $QR$ factorization of sparse matrices, specifically designed for multicore-based systems. High efficiency is achieved through a fine-grained partitioning of data and a dynamic scheduling of computational tasks that relies on a dataflow parallel programming model. Experimental results show that an implementation of the proposed approach achieves higher performance and better scalability than existing equivalent software.
