Performance Modeling and Mapping of Sparse Computations

In the past, knowledge processing of sensor data (anomaly detection, target identification, social network analysis) did not require real-time processing speeds. However, rapid growth in data size and shrinking time scales for analysis are driving the need for applications that provide real-time signal and knowledge processing at the sensor front end. Many knowledge processing techniques, such as Bayesian networks, social networks, and neural networks, have a graph abstraction. Graph algorithms are difficult to parallelize and therefore struggle to take advantage of multi-core architectures. Many graph operations can be cast as sparse linear algebra operations. While this makes them easier to program, parallel sparse algorithms remain inefficient. This paper presents a search-based mapping and routing approach for sparse operations. Because finding well-performing maps and routes for sparse operations is computationally intensive, the mapping and routing algorithms have been parallelized to take advantage of LLGrid, the Lincoln Laboratory cluster computing capability. Our parallelization yielded near-linear speedup, and the mapping and routing results demonstrate more than an order of magnitude performance improvement over traditional mapping techniques.
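
To illustrate the claim that graph operations can be cast as sparse linear algebra, the sketch below writes breadth-first search as repeated sparse vector-matrix products over a Boolean semiring. It is a minimal illustration, not the paper's implementation: the function names (`build_sparse_adjacency`, `sparse_vecmat`, `bfs_levels`) and the dictionary-based sparse storage are assumptions chosen for clarity, and the sketch does not address the mapping or routing of the matrix across processors.

```python
from collections import defaultdict


def build_sparse_adjacency(edges, directed=False):
    """Store the adjacency matrix A as {row: {col: 1}}, keeping only nonzeros.

    Hypothetical helper for this sketch; not taken from the paper.
    """
    A = defaultdict(dict)
    for u, v in edges:
        A[u][v] = 1
        if not directed:
            A[v][u] = 1
    return A


def sparse_vecmat(x, A):
    """Compute y = x A over the Boolean semiring.

    x is a sparse Boolean row vector stored as the set of its nonzero
    indices, so y[j] is nonzero when some i in x has A[i][j] != 0.
    """
    y = set()
    for i in x:
        y.update(A.get(i, {}))
    return y


def bfs_levels(A, source):
    """Breadth-first search expressed as repeated sparse products.

    Each iteration multiplies the current frontier vector by A, then
    masks out vertices that already have a level assigned.
    """
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        reached = sparse_vecmat(frontier, A)
        frontier = {v for v in reached if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    A = build_sparse_adjacency(edges)
    print(bfs_levels(A, 0))
```

In this formulation the cost of each step depends on which nonzeros of A the frontier touches, which is one reason the placement of a sparse matrix across processors can have a large effect on performance.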
