Hogs and slackers: Using operations balance in a genetic algorithm to optimize sparse algebra computation on distributed architectures

We present a framework for optimizing the distributed performance of sparse matrix computations. These computations are optimally parallelized by distributing their operations across processors in a subtly uneven balance. Because the optimal balance point depends on the non-zero patterns in the data, the algorithm, and the underlying hardware architecture, it is difficult to determine. The Hogs and Slackers genetic algorithm (GA) identifies processors with many operations (hogs) and processors with few operations (slackers). Its intelligent operation-balancing mutation operator swaps data blocks between hogs and slackers to explore new balance points. We show that this operator is integral to the performance of the genetic algorithm and use the framework to conduct an architecture study that varies network specifications. The Hogs and Slackers GA is itself a parallel algorithm with near linear speedup on a large computing cluster.
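The abstract describes the operation-balancing mutation only at a high level. The following Python sketch illustrates one plausible form of it, under assumptions not stated in the paper: each candidate solution is a mapping from data blocks to processors, and a per-block operation count is available so that hogs and slackers can be identified. The function and variable names (hogs_and_slackers_mutation, mapping, block_ops) are hypothetical, not the authors' implementation.

```python
import random

def hogs_and_slackers_mutation(mapping, block_ops, rng=random):
    """Illustrative operation-balancing mutation: swap a data block owned
    by the busiest processor (a hog) with one owned by the least busy
    processor (a slacker), producing a child at a new balance point.

    mapping   : dict block_id -> processor_id (a candidate distribution)
    block_ops : dict block_id -> estimated operation count for that block
    """
    # Tally estimated operations per processor under the current mapping.
    load = {}
    for block, proc in mapping.items():
        load[proc] = load.get(proc, 0) + block_ops[block]

    hog = max(load, key=load.get)       # processor with the most operations
    slacker = min(load, key=load.get)   # processor with the fewest operations
    if hog == slacker:
        return mapping                  # already balanced; nothing to swap

    hog_blocks = [b for b, p in mapping.items() if p == hog]
    slacker_blocks = [b for b, p in mapping.items() if p == slacker]
    if not hog_blocks or not slacker_blocks:
        return mapping

    # Exchange one randomly chosen block in each direction.
    child = dict(mapping)
    child[rng.choice(hog_blocks)] = slacker
    child[rng.choice(slacker_blocks)] = hog
    return child
```

In a full GA this mutation would be applied alongside standard selection and crossover, with fitness taken from the measured or modeled distributed run time of the sparse computation under each candidate mapping.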
