Optimization of Data Distribution and Processor Allocation Problem Using Simulated Annealing

This study develops a global optimization meta-heuristic for determining the optimum data distribution and degree of parallelism when parallelizing a sequential program for distributed-memory machines. The parallel program is treated as a sequence of consecutive stages, and the method optimizes over all stages of the program together rather than proposing a separate solution for each stage. The meta-heuristic combines simulated annealing and hill climbing (SA-HC) to search for the optimum configuration. Performance is measured as the total execution time of the program, including both communication and computation time. Two example codes from the literature are used in the experiments, the first computation-intensive and the second communication-intensive. The SA-HC algorithm yields satisfactory results for these illustrative examples.
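
The abstract does not say how simulated annealing and hill climbing are combined, so the following is only a minimal sketch under stated assumptions: simulated annealing explores whole-program configurations, and a greedy hill-climbing pass then refines the best configuration found. The stage data, the candidate distributions and processor counts, the redistribution penalty, and the cost model in `total_time` are all hypothetical placeholders, not the paper's model.

```python
import math
import random

# Hypothetical search space: one (distribution, processor count) pair per stage.
DISTRIBUTIONS = ("row", "column", "block")
PROCESSOR_COUNTS = (1, 2, 4, 8, 16)

# Hypothetical stages: (computation work, communication weight per distribution).
STAGES = [
    (400.0, {"row": 1.0, "column": 4.0, "block": 2.0}),
    (250.0, {"row": 5.0, "column": 1.0, "block": 2.5}),
    (600.0, {"row": 3.0, "column": 3.0, "block": 1.0}),
]
REDISTRIBUTION_COST = 20.0  # assumed penalty when consecutive stages change distribution


def total_time(config):
    """Toy execution-time model: computation + communication + redistribution."""
    time = 0.0
    for i, ((work, comm), (dist, procs)) in enumerate(zip(STAGES, config)):
        time += work / procs              # computation shrinks with more processors
        time += comm[dist] * (procs - 1)  # communication grows with more processors
        if i > 0 and config[i - 1][0] != dist:
            time += REDISTRIBUTION_COST
    return time


def neighbors(config):
    """Yield every configuration that differs from config in exactly one stage."""
    for i, current in enumerate(config):
        for dist in DISTRIBUTIONS:
            for procs in PROCESSOR_COUNTS:
                if (dist, procs) != current:
                    yield config[:i] + [(dist, procs)] + config[i + 1:]


def hill_climb(config):
    """Accept improving single-stage moves until none remains (a local optimum)."""
    best, best_cost = config, total_time(config)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(best):
            cost = total_time(cand)
            if cost < best_cost:
                best, best_cost, improved = cand, cost, True
                break
    return best, best_cost


def sa_hc(seed=0, t_start=100.0, t_end=0.1, alpha=0.95, moves_per_temp=50):
    """Simulated annealing over all stages at once, then hill-climbing refinement."""
    rng = random.Random(seed)
    current = [(rng.choice(DISTRIBUTIONS), rng.choice(PROCESSOR_COUNTS)) for _ in STAGES]
    current_cost = total_time(current)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Perturb one randomly chosen stage of the whole-program configuration.
            i = rng.randrange(len(STAGES))
            move = (rng.choice(DISTRIBUTIONS), rng.choice(PROCESSOR_COUNTS))
            cand = current[:i] + [move] + current[i + 1:]
            delta = total_time(cand) - current_cost
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = cand, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling schedule (an assumption)
    return hill_climb(best)


if __name__ == "__main__":
    config, cost = sa_hc()
    print(config, cost)
```

Because the redistribution term couples consecutive stages, the cost of one stage's choice depends on its neighbors. That coupling is why the sketch searches over the whole configuration instead of optimizing each stage in isolation.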
