Data traffic reduction schemes for Cholesky factorization on asynchronous multiprocessor systems

For multiprocessor systems with a two-level memory hierarchy, the communication requirements of parallel Cholesky factorization of dense and sparse symmetric positive definite matrices are analyzed. The data traffic associated with computing the Cholesky factor of an n × n dense matrix using n^α processors, α ≤ 2, is shown to be Ω(n^(2+α/2)), assuming that the computational load is uniformly distributed. For an n × n sparse matrix representing a √n × √n regular grid graph, the corresponding data traffic is shown to be Ω(n^(1+α/2)), α ≤ 1. Partitioning schemes that are variations of the block assignment scheme are described. The data traffic generated by these schemes is asymptotically optimal, and the schemes allow efficient use of up to O(n^2) and O(n) processors in the dense and sparse cases, respectively. The block-based partitioning schemes are shown to make better use of the data accessed from shared memory and to reduce the total data traffic compared with schemes based on column-wise wrap-around assignment.
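To make the comparison between block assignment and column-wise wrap-around assignment concrete, the following is a minimal sketch, not the partitioning schemes of the paper. It assumes a simplified traffic model: each processor has unbounded local memory, fetches each remote element of the factor at most once, and traffic is the total number of such fetches. The function names `traffic`, `column_wrap`, and `block`, and the square-block layout, are illustrative assumptions.

```python
def traffic(n, owner):
    """Count distinct remote elements read per processor, summed over processors.

    owner(i, j) returns the processor id that computes element (i, j)
    of the lower-triangular factor (j <= i). Assumed model: a processor
    fetches each remote element once and keeps it locally.
    """
    fetched = set()  # (processor, element) pairs
    for i in range(n):
        for j in range(i + 1):
            p = owner(i, j)
            for k in range(j):
                # The update a_ij -= l_ik * l_jk reads l_ik and l_jk.
                for elem in ((i, k), (j, k)):
                    if owner(*elem) != p:
                        fetched.add((p, elem))
            # Off-diagonal scaling l_ij = a_ij / l_jj reads l_jj.
            if i != j and owner(j, j) != p:
                fetched.add((p, (j, j)))
    return len(fetched)


def column_wrap(num_procs):
    """Column j goes to processor j mod num_procs (wrap-around assignment)."""
    return lambda i, j: j % num_procs


def block(n, q):
    """Square blocks on a q x q grid; only the q(q+1)/2 lower blocks are used."""
    b = -(-n // q)  # block size, ceil(n / q)
    return lambda i, j: (i // b) * q + (j // b)


if __name__ == "__main__":
    n = 24
    # 10 processors in both cases: 10 column owners vs. a 4 x 4 grid
    # whose lower triangle holds 4 * 5 / 2 = 10 blocks.
    print("column wrap:", traffic(n, column_wrap(10)))
    print("block:      ", traffic(n, block(n, 4)))
```

Under this model a column owner ends up reading most of the factor to the left of its columns, while a block owner only reads the row strips that intersect its block, which is the intuition behind the lower traffic of block-based schemes. The sketch ignores load balance and the sparse case.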
