A Knowledge-Based Operator for a Genetic Algorithm which Optimizes the Distribution of Sparse Matrix Data

We present the Hogs and Slackers genetic algorithm (GA), which addresses the problem of improving the parallelization efficiency of sparse matrix computations by optimally distributing blocks of matrix data. The performance of a distribution is sensitive to the non-zero patterns in the data, the algorithm, and the hardware architecture. In a candidate distribution, the Hogs and Slackers GA identifies processors with many operations (hogs) and processors with fewer operations (slackers). Its intelligent operation-balancing mutation operator then swaps data blocks between hogs and slackers to explore a new data distribution. We show that the Hogs and Slackers GA performs better than a baseline GA. We demonstrate the Hogs and Slackers GA's optimization capability with an architecture study of varied network and memory bandwidth and latency.
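
The operation-balancing mutation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hogs_and_slackers_mutation`, the list-based encoding of a distribution, and the per-block operation counts `block_ops` are all assumptions made for illustration. The sketch treats the most loaded processor as the hog and the least loaded as the slacker, then swaps one randomly chosen block between them.

```python
import random


def hogs_and_slackers_mutation(assignment, block_ops, num_procs, rng=None):
    """Return a mutated copy of `assignment` with one hog/slacker block swap.

    assignment: assignment[b] is the processor that owns block b (assumed encoding)
    block_ops:  block_ops[b] is the operation count of block b (assumed cost model)
    num_procs:  number of processors
    """
    if rng is None:
        rng = random.Random()

    # Total operations per processor under the candidate distribution.
    loads = [0] * num_procs
    for block, proc in enumerate(assignment):
        loads[proc] += block_ops[block]

    # Hog = most loaded processor, slacker = least loaded (illustrative choice).
    hog = max(range(num_procs), key=loads.__getitem__)
    slacker = min(range(num_procs), key=loads.__getitem__)

    hog_blocks = [b for b, p in enumerate(assignment) if p == hog]
    slacker_blocks = [b for b, p in enumerate(assignment) if p == slacker]

    child = list(assignment)
    # Nothing to swap if the load is already balanced or a side owns no blocks.
    if hog == slacker or not hog_blocks or not slacker_blocks:
        return child

    # Swap one randomly chosen block between the hog and the slacker.
    from_hog = rng.choice(hog_blocks)
    from_slacker = rng.choice(slacker_blocks)
    child[from_hog], child[from_slacker] = slacker, hog
    return child


if __name__ == "__main__":
    parent = [0, 0, 0, 1]   # processor 0 owns three blocks, processor 1 owns one
    ops = [5, 5, 5, 1]      # hypothetical operation counts per block
    print(hogs_and_slackers_mutation(parent, ops, num_procs=2, rng=random.Random(0)))
```

A swap, rather than a one-way move, keeps the number of blocks per processor fixed, which matches the abstract's wording. A full GA would evaluate the mutated distribution with its fitness function and keep it only if selection favors it.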
