Provable algorithms for parallel generalized sweep scheduling

Abstract: We present provably efficient parallel algorithms for sweep scheduling, a technique commonly used in radiation transport problems that involves inverting an operator by iteratively sweeping across a mesh from multiple directions. Each sweep involves solving the operator locally at each cell. However, each direction induces a partial order in which this computation can proceed. On a distributed computing system, the goal is to schedule the computation so that the length of the schedule is minimized. Due to efficiency and coupling considerations, there is an additional constraint: a mesh cell must be processed on the same processor along each direction. Problems similar in nature to sweep scheduling arise in several other applications, and here we formulate a combinatorial generalization of this problem that captures the sweep scheduling constraints, and call it the generalized sweep scheduling problem. Several heuristics have been proposed for this problem; see [S. Pautz, An algorithm for parallel $S_n$ sweeps on unstructured meshes, Nucl. Sci. Eng. 140 (2002) 111–136; S. Plimpton, B. Hendrickson, S. Burns, W. McLendon, Parallel algorithms for radiation transport on unstructured grids, Super Comput. (2001)] and the references therein; but none of these has a provable worst-case performance guarantee. Here we present a simple, almost linear time randomized algorithm for the generalized sweep scheduling problem that (provably) gives a schedule of length at most $O(\log^2 n)$ times the optimal schedule for instances with $n$ cells, when the communication cost is not considered, and a slight variant which, coupled with a much more careful analysis, gives a schedule of (expected) length $O(\log m \log\log\log m)$ times the optimal schedule for $m$ processors. These are the first such provable guarantees for this problem.
The algorithm can be extended, at the cost of an additional multiplicative factor, to the case of inter-processor communication latency, in the models of Rayward-Smith [UET scheduling with unit interprocessor communication delays, Discrete Appl. Math. 18 (1) (1987) 55–71] and Hwang et al. [Scheduling precedence graphs in systems with inter-processor communication times, SIAM J. Comput. 18(2) (1989) 244–257]. Our algorithms are extremely simple and use no geometric information about the mesh; therefore, these techniques are likely to be applicable in more general settings. We also design a priority-based list schedule using these ideas, with the same theoretical guarantee but much better performance in practice; combining this algorithm with a simple block decomposition also lowers the overall communication cost significantly. Finally, we perform a detailed experimental analysis of our algorithm. Our results indicate that the length of the schedule it produces compares favorably with that of the schedules produced by other natural and efficient parallel algorithms proposed in the literature [S. Pautz, An algorithm for parallel $S_n$ sweeps on unstructured meshes, Nucl. Sci. Eng. 140 (2002) 111–136; S. Plimpton, B. Hendrickson, S. Burns, W. McLendon, Parallel algorithms for radiation transport on unstructured grids, Super Comput. (2001)].
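The abstract does not spell out the algorithm itself, so the following is only an illustrative sketch of the problem's constraints, not the authors' method or its analysis. It assumes unit-time tasks and zero communication cost, assigns each cell to a uniformly random processor (so a cell stays on one processor in every direction), and then runs a plain greedy list schedule over the per-direction precedence graphs. The function name `list_schedule`, its parameters, and the tiny 2x2 mesh instance are all hypothetical.

```python
import random
from collections import defaultdict


def list_schedule(num_cells, dags, num_procs, seed=0):
    """Greedy unit-time list schedule for a generalized sweep instance.

    dags[d] is a list of edges (u, v): in direction d, cell u must be
    processed before cell v. A task is a (cell, direction) pair.
    Returns (proc_of, start, makespan).
    """
    rng = random.Random(seed)
    # One processor per cell, shared by all directions (the coupling constraint).
    proc_of = [rng.randrange(num_procs) for _ in range(num_cells)]

    indeg = {}
    succ = defaultdict(list)
    for d, edges in enumerate(dags):
        for c in range(num_cells):
            indeg[(c, d)] = 0
        for u, v in edges:
            succ[(u, d)].append((v, d))
            indeg[(v, d)] += 1

    ready = [task for task, k in indeg.items() if k == 0]
    start = {}
    remaining = len(indeg)
    t = 0
    while remaining:
        busy = set()        # processors already used in this time step
        ran, waiting = [], []
        for task in ready:
            p = proc_of[task[0]]
            if p in busy:
                waiting.append(task)
            else:
                busy.add(p)
                start[task] = t
                ran.append(task)
        if not ran:
            raise ValueError("precedence graph of some direction has a cycle")
        ready = waiting
        # Successors become ready only from the next time step onward.
        for task in ran:
            for nxt in succ[task]:
                indeg[nxt] -= 1
                if indeg[nxt] == 0:
                    ready.append(nxt)
        remaining -= len(ran)
        t += 1
    return proc_of, start, t


# Hypothetical 2x2 mesh (cells 0..3) swept from two opposite corners.
dags = [
    [(0, 1), (0, 2), (1, 3), (2, 3)],  # direction 0: from cell 0 toward cell 3
    [(3, 1), (3, 2), (1, 0), (2, 0)],  # direction 1: the reverse sweep
]
proc_of, start, makespan = list_schedule(num_cells=4, dags=dags, num_procs=2)
```

Any valid schedule for this instance needs at least 4 steps with two processors (8 unit tasks), so comparing `makespan` against that lower bound gives a rough feel for how far a particular random assignment is from optimal. Replacing the first-come ordering of `ready` with a priority rule would be the natural place to experiment with priority-based variants.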

[1]  Gaston H. Gonnet,et al.  Handbook of Algorithms , 1984 .

[2]  Steven J. Plimpton,et al.  Parallel Algorithms for Radiation Transport on Unstructured Grids , 2000, ACM/IEEE SC 2000 Conference (SC'00).

[3]  Frank D. Anger,et al.  Scheduling Precedence Graphs in Systems with Interprocessor Communication Times , 1989, SIAM J. Comput..

[4]  M. E. Williams,et al.  TRANSIMS: TRANSPORTATION ANALYSIS AND SIMULATION SYSTEM , 1995 .

[5]  Jim E. Morel,et al.  ATTILA: A three-dimensional, unstructured tetrahedral mesh discrete ordinates transport code , 1996 .

[6]  Shawn D. Pautz,et al.  An Algorithm for Parallel Sn Sweeps on Unstructured Meshes , 2001 .

[7]  E.L. Lawler,et al.  Optimization and Approximation in Deterministic Sequencing and Scheduling: a Survey , 1977 .

[8]  D. Atkin OR scheduling algorithms. , 2000, Anesthesiology.

[9]  Michele Benzi,et al.  Preconditioning a mixed discontinuous finite element method for radiation diffusion , 2004, Numer. Linear Algebra Appl..

[10]  Kun-Lung Wu,et al.  The CHAMPS system: change management with planning and scheduling , 2004, 2004 IEEE/IFIP Network Operations and Management Symposium (IEEE Cat. No.04CH37507).

[11]  Jim E. Morel,et al.  Krylov Iterative Methods and the Degraded Effectiveness of Diffusion Synthetic Acceleration for Multidimensional SN Calculations in Problems with Material Discontinuities , 2004 .

[12]  Eugene L. Lawler,et al.  Sequencing and scheduling: algorithms and complexity , 1989 .

[13]  Jan Karel Lenstra,et al.  Complexity of Scheduling under Precedence Constraints , 1978, Oper. Res..

[14]  Han Hoogeveen,et al.  Three, four, five, six, or the complexity of scheduling with communication delays , 1994, Oper. Res. Lett..

[15]  Madhav V. Marathe,et al.  Understanding Large-Scale Social and Infrastructure Networks: A Simulation-Based Approach , 2004 .

[16]  Aravind Srinivasan,et al.  Modelling disease outbreaks in realistic urban social networks , 2004, Nature.

[17]  Nancy M. Amato,et al.  Task Scheduling and Parallel Mesh-Sweeps in Transport Computations , 2000 .

[18]  Lawrence Rauchwerger,et al.  Parallel Sn Sweeps on Unstructured Grids: Algorithms for Prioritization, Grid Partitioning, and Cycle Detection , 2005 .

[19]  George L. Nemhauser,et al.  Handbooks in operations research and management science , 1989 .

[20]  Madhav V. Marathe,et al.  Modeling and simulation of large biological, information and socio-technical systems: an interaction based approach , 2006 .

[21]  H. Chernoff A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the sum of Observations , 1952 .

[22]  Victor J. Rayward-Smith,et al.  UET scheduling with unit interprocessor communication delays , 1987, Discret. Appl. Math..

[23]  Philippe Chrétienne Task scheduling with interprocessor communication delays , 1992 .

[24]  Bruce M. Maggs,et al.  Packet routing and job-shop scheduling in O(congestion + dilation) steps , 1994, Comb..

[25]  W. Hoeffding Probability Inequalities for sums of Bounded Random Variables , 1963 .

[26]  Y.-K. Kwok,et al.  Static scheduling algorithms for allocating directed task graphs to multiprocessors , 1999, CSUR.

[27]  Nancy M. Amato,et al.  A general performance model for parallel sweeps on orthogonal grids for particle transport calculations , 2000, ICS '00.