Software support for parallel processing of irregular and dynamic computations

Many real-world scientific computations are irregular and dynamic, which poses a great challenge to parallelization. In this thesis we study the efficient mapping of a subclass of these problems, namely the "stepwise slowly changing" problems, onto distributed-memory multiprocessors using the task graph scheduling approach. A large class of applications belongs to this category. Intuitively, the irregularity requires sophisticated mapping algorithms, while the "slowness" of the changes in the computational structure between steps allows the scheduling cost to be amortized over many steps, justifying the approach. We study three representative and widely used applications: the N-body simulation in astrophysics, and the Vortex-Sheet Roll-Up and the Contour Dynamics Computation from computational fluid dynamics. We start with an initial global compile-time schedule and apply new rescheduling algorithms to improve performance when this schedule degrades over the iterative process. We develop rescheduling algorithms for two important dynamic patterns: task graph weight variation and dynamic spawning of new subgraphs. These algorithms are tested on random graphs and on real applications such as the FMM N-body and Vortex Sheet computations. Our experiments show that global scheduling using sophisticated methods can be beneficial for these problems, and that our fast rescheduling algorithms can correct run-time imbalance at very low cost. In summary, we discuss several central issues such as schedule reuse, the performance/overhead trade-off, and the selection of rescheduling methods. We identify classes of problems where rescheduling algorithms are applicable, and present experimental evidence to justify our approach. Throughout the thesis, performance results are obtained from particular problems but presented in the general framework of software support systems.
In addition, we examine an automatic task graph generation tool that can handle restricted cases of sequential code, and integrate it with our scheduling system. The new system is capable of automatically parallelizing simple programs, a step toward the grand challenge of fully automatic parallelization of regular and irregular code.
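
The amortization argument for slowly changing problems can be illustrated with a small sketch. It is not the rescheduling algorithm developed in the thesis: it ignores task dependences and communication, and the names `processor_loads`, `should_reschedule`, `resched_cost`, and `steps_remaining` are hypothetical, introduced only for illustration. The assumption is that a schedule is reused across steps, and that rescheduling pays off only when the time saved over the remaining steps exceeds the one-time rescheduling cost.

```python
def processor_loads(weights, assignment, num_procs):
    """Sum the task weights assigned to each processor.

    weights: dict mapping task name -> current cost of the task
    assignment: dict mapping task name -> processor index
    """
    loads = [0.0] * num_procs
    for task, proc in assignment.items():
        loads[proc] += weights[task]
    return loads


def should_reschedule(weights, assignment, num_procs,
                      resched_cost, steps_remaining):
    """Decide whether rescheduling is worth its cost (illustrative only).

    Assumption: one step takes as long as the most loaded processor,
    and a perfect rebalance would bring every processor to the average
    load. Dependences and communication are deliberately ignored.
    """
    loads = processor_loads(weights, assignment, num_procs)
    current_step_time = max(loads)
    ideal_step_time = sum(loads) / num_procs
    saving_per_step = current_step_time - ideal_step_time
    # Reschedule only if the saving, accumulated over the remaining
    # steps, outweighs the one-time rescheduling cost.
    return saving_per_step * steps_remaining > resched_cost


if __name__ == "__main__":
    # Task weights after some drift; "a" has grown heavier over time.
    weights = {"a": 4.0, "b": 1.0, "c": 1.0, "d": 2.0}
    assignment = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(should_reschedule(weights, assignment, 2,
                            resched_cost=10.0, steps_remaining=20))
    print(should_reschedule(weights, assignment, 2,
                            resched_cost=10.0, steps_remaining=5))
```

With many steps remaining, even a small per-step imbalance justifies the rescheduling cost; with few steps remaining, reusing the existing schedule is cheaper under these assumptions.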
