Paper information - Fine-Grained Multithreading for the Multifrontal QR Factorization of Sparse Matrices

The advent of multicore processors is a disruptive event in the history of computer science, because conventional parallel programming paradigms are proving incapable of fully exploiting their potential for concurrent computation. Recent studies make the need for new programming models clear: they identify fine granularity and dynamic execution as the keys to high efficiency on multicore systems. This work presents an approach to parallelizing the multifrontal method for the $QR$ factorization of sparse matrices, designed specifically for multicore-based systems. High efficiency is achieved through fine-grained partitioning of data and dynamic scheduling of computational tasks based on a dataflow parallel programming model. Experimental results show that an implementation of the proposed approach achieves higher performance and better scalability than existing equivalent software.
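The dataflow idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the task names (`panel`, `update`), the block-column dependency pattern, and the helpers `block_qr_graph` and `run_dataflow` are all assumptions made for illustration. Each task is started as soon as the tasks it depends on have completed, rather than in a fixed order.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def block_qr_graph(nblocks):
    """Build a hypothetical task graph for one matrix split into block-columns.

    panel(k) factorizes block-column k; update(k, j) applies that
    factorization to block-column j > k. The tasks are placeholders
    that do no numerical work.
    """
    tasks, deps = {}, {}
    for k in range(nblocks):
        p = ("panel", k)
        tasks[p] = lambda: None
        # Block-column k must have received the update from step k - 1.
        deps[p] = {("update", k - 1, k)} if k > 0 else set()
        for j in range(k + 1, nblocks):
            u = ("update", k, j)
            tasks[u] = lambda: None
            deps[u] = {p} | ({("update", k - 1, j)} if k > 0 else set())
    return tasks, deps


def run_dataflow(tasks, deps, workers=4):
    """Run every task once its dependencies are done; return completion order.

    Assumes the graph is acyclic and that tasks do not raise.
    """
    if not tasks:
        return []
    remaining = {name: len(deps.get(name, ())) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name, needed in deps.items():
        for d in needed:
            dependents[d].append(name)

    lock = threading.Lock()
    all_done = threading.Event()
    order = []

    with ThreadPoolExecutor(max_workers=workers) as pool:
        def execute(name):
            tasks[name]()
            ready = []
            with lock:
                order.append(name)
                # Completing a task may release the tasks waiting on it.
                for child in dependents[name]:
                    remaining[child] -= 1
                    if remaining[child] == 0:
                        ready.append(child)
                finished = len(order) == len(tasks)
            for child in ready:
                pool.submit(execute, child)
            if finished:
                all_done.set()

        for name in [n for n, count in remaining.items() if count == 0]:
            pool.submit(execute, name)
        all_done.wait()
    return order


if __name__ == "__main__":
    tasks, deps = block_qr_graph(4)
    for name in run_dataflow(tasks, deps):
        print(name)
```

The completion order varies from run to run, but every task always appears after all of the tasks it depends on. In this sketch, a finer partitioning (more block-columns) yields more independent `update` tasks that can run concurrently.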

Alfredo Buttari
