Comparative analysis of approaches to hardware acceleration for sparse-matrix factorization

The authors compare two standard approaches to sparse LU (lower-upper) factorization, the compiled-code approach and the scatter-gather approach, against three criteria relevant to multiprocessor hardware acceleration: idealized parallelism, memory access cost, and storage requirements. The compiled-code approach is shown to be the clear winner on idealized parallelism, while the scatter-gather approach has much lower memory access cost and storage requirements. As a compromise, the authors propose a data structure in which the rows of the sparse matrix are stored in an overlapped fashion, combined with treating each row-level operation as a single task. The idealized parallelism of this approach lies between those of the other two approaches; its memory access cost is the same as that of the scatter-gather approach, and its storage requirement is only moderately worse.
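The abstract does not spell out either data structure, so the following is only a minimal Python sketch under stated assumptions. It contrasts a scatter-gather row operation, which expands the target row into a dense work vector, updates it, and gathers it back, with the same row operation on overlapped row storage, where entry (i, j) lives at a fixed slot `offsets[i] + j` of one shared array and can be updated in place. The sparse-row representation, the first-fit row-displacement packing, and every name (`scatter_gather_update`, `pack_rows`, `OverlappedRows`) are hypothetical, and the row structure is assumed to already contain all fill-in from a prior symbolic analysis.

```python
from typing import Dict, List

# Hypothetical representation: a sparse row maps column index -> value.
SparseRow = Dict[int, float]


def scatter_gather_update(target: SparseRow, pivot: SparseRow, p: int, n: int) -> float:
    """Row operation target <- target - l * pivot that eliminates column p.

    Scatter the target row into a dense work vector, update it through the
    pivot row's nonzeros, then gather the result back (including fill-in).
    Returns the multiplier l, which is stored in place of the eliminated entry."""
    work = [0.0] * n
    for j, v in target.items():            # scatter
        work[j] = v
    l = work[p] / pivot[p]
    for j, v in pivot.items():             # update via dense indexing
        if j != p:
            work[j] -= l * v
    for j in set(target) | set(pivot):     # gather, including fill-in
        if j != p:
            target[j] = work[j]
    target[p] = l                          # L-factor entry
    return l


def pack_rows(structure: List[List[int]]) -> List[int]:
    """First-fit row displacement: pick an offset d per row so that the
    slots d + j used by different rows never collide."""
    used = set()
    offsets = []
    for cols in structure:
        d = 0
        while any(d + j in used for j in cols):
            d += 1
        offsets.append(d)
        used.update(d + j for j in cols)
    return offsets


class OverlappedRows:
    """Rows stored overlapped in one array; owner[] records which row owns each slot."""

    def __init__(self, rows: List[SparseRow]):
        # Assumes each row's structure already includes all fill-in.
        self.cols = [sorted(r) for r in rows]
        self.offsets = pack_rows(self.cols)
        size = max((d + c[-1] + 1 for d, c in zip(self.offsets, self.cols) if c), default=0)
        self.values = [0.0] * size
        self.owner = [-1] * size
        for i, r in enumerate(rows):
            for j, v in r.items():
                self.values[self.offsets[i] + j] = v
                self.owner[self.offsets[i] + j] = i

    def slot(self, i: int, j: int) -> int:
        pos = self.offsets[i] + j
        if pos >= len(self.owner) or self.owner[pos] != i:
            raise KeyError((i, j))         # (i, j) is not in row i's structure
        return pos

    def row_update(self, i: int, k: int, p: int) -> float:
        """Same row operation as scatter_gather_update, but row i is addressed
        directly at offsets[i] + j, so no scatter or gather pass is needed."""
        l = self.values[self.slot(i, p)] / self.values[self.slot(k, p)]
        for j in self.cols[k]:
            if j != p:
                self.values[self.slot(i, j)] -= l * self.values[self.slot(k, j)]
        self.values[self.slot(i, p)] = l
        return l

    def row(self, i: int) -> SparseRow:
        return {j: self.values[self.slot(i, j)] for j in self.cols[i]}


# A 4x4 example; row 1's structure already includes the fill-in at column 3.
rows = [{0: 4.0, 3: 2.0}, {0: 2.0, 1: 5.0, 3: 0.0}, {1: 1.0, 2: 3.0}, {2: 1.0, 3: 6.0}]
store = OverlappedRows(rows)
target = dict(rows[1])
scatter_gather_update(target, rows[0], 0, 4)
store.row_update(1, 0, 0)
print(target == store.row(1))  # prints True
```

In this example first-fit packing places the four rows at offsets `[0, 1, 4, 5]`, so all nine nonzeros fit in nine slots of the shared array. Row-displacement schemes commonly sort rows by decreasing density before packing (first-fit decreasing); plain first-fit in the given order is used here only to keep the sketch short.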
