A Novel Parallel Scan for Multicore Processors and Its Application in Sparse Matrix-Vector Multiplication

We present a novel parallel algorithm for computing scan operations on x86 multicore processors. The best known existing parallel scan for this platform requires the number of processors to be a power of two; our method removes this constraint. The design takes the architecture of x86 multicore processors into account so as to reduce the cache-miss rate and minimize the cost of thread synchronization and management. In tests on a dual-socket machine with quad-core Intel Xeon E5405 processors, the proposed solution outperformed the best known parallel reference. We then describe a novel approach to sparse matrix-vector multiplication (SpMV) based on the proposed scan. Unlike existing approaches, which use backward segmented operations, it uses forward segmented operations for more efficient caching. An implementation of the proposed SpMV was tested against the SpMV in Intel's Math Kernel Library (MKL), and merits were found in the proposed approach.
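The abstract does not give the algorithm itself, so the following is only an illustrative sketch of the two ideas it names, not the authors' method. It assumes a common three-phase chunked scan (local scans, a short serial scan of chunk totals, then an offset pass), which works for any worker count, and a serial forward segmented sum over a CSR matrix. All function names, the CSR array names (`row_ptr`, `col_idx`, `vals`), and the use of Python threads are assumptions made for illustration; Python threads show the structure only and will not reproduce the reported performance.

```python
from itertools import accumulate
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, p):
    """Split range(n) into p contiguous, nearly equal chunks (p is arbitrary)."""
    base, extra = divmod(n, p)
    bounds, start = [], 0
    for i in range(p):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def chunked_inclusive_scan(xs, p=3):
    """Inclusive prefix sum with p workers; p need not be a power of two."""
    n = len(xs)
    if n == 0:
        return []
    p = max(1, min(p, n))          # never more workers than elements
    bounds = chunk_bounds(n, p)
    out = list(xs)

    def local_scan(bound):
        # Phase 1: each worker scans its own contiguous chunk in place.
        s, e = bound
        running = 0
        for i in range(s, e):
            running += out[i]
            out[i] = running
        return running             # chunk total

    def add_offset(args):
        # Phase 3: add the sum of all preceding chunks to this chunk.
        (s, e), offset = args
        for i in range(s, e):
            out[i] += offset

    with ThreadPoolExecutor(max_workers=p) as pool:
        totals = list(pool.map(local_scan, bounds))
        # Phase 2: serial exclusive scan over the p chunk totals.
        offsets = [0] + list(accumulate(totals))[:-1]
        list(pool.map(add_offset, zip(bounds, offsets)))
    return out


def spmv_forward_segmented(row_ptr, col_idx, vals, x):
    """y = A @ x for a CSR matrix, via a forward segmented sum (serial sketch)."""
    nnz = len(vals)
    prod = [v * x[c] for v, c in zip(vals, col_idx)]
    head = [False] * nnz           # head[i] is True where a row segment starts
    for s in row_ptr[:-1]:
        if s < nnz:
            head[s] = True
    running = 0
    for i in range(nnz):           # forward pass: restart the sum at each head
        running = prod[i] if head[i] else running + prod[i]
        prod[i] = running
    # The last element of each non-empty segment holds that row's dot product.
    return [prod[e - 1] if e > s else 0 for s, e in zip(row_ptr, row_ptr[1:])]
```

Because each worker touches one contiguous chunk, memory is traversed front to back within a chunk, and only two synchronization points are needed (after the local scans and after the offset pass). The segmented sum likewise walks the nonzeros in storage order, which is the property a forward formulation relies on.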
