Automatic processor lower bound formulas for array computations

In the directed acyclic graph (dag) model of algorithms, consider the following problem for precedence-constrained multiprocessor schedules for array computations: given a sequence of dags and linear schedules parameterized by n, compute a lower bound on the number of processors required by the schedule as a function of n. The problem is formulated so that the number of tasks scheduled for execution during any fixed time step is the number d_n of non-negative integer solutions to a set of parametric linear Diophantine equations. Generating function methods are then used to construct a formula for the numbers d_n. We implemented this algorithm as a Mathematica program. This paper gives an overview of the techniques involved and their application to well-known schedules for the Matrix-Vector Product, Triangular Matrix Product, and Gaussian Elimination dags. Example runs are given, together with the symbolic processor lower bound formulas that the algorithm produces automatically.
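The counting idea can be illustrated with a small numeric sketch. It is not the paper's Mathematica program, and the dag and schedule are assumptions chosen for illustration: an n x n mesh of tasks (i, j) with 0 <= i, j <= n-1, and the linear schedule t = i + j. The number of tasks at step t is then the number of non-negative integer solutions of i + j = t within those bounds, which equals the coefficient of x^t in (1 + x + ... + x^(n-1))^2. The largest coefficient is a lower bound on the number of processors. The function names below are hypothetical.

```python
from itertools import product


def tasks_per_step_bruteforce(n):
    """Count tasks per time step by enumerating every node (i, j) of the assumed n x n mesh."""
    counts = [0] * (2 * n - 1)
    for i, j in product(range(n), repeat=2):
        counts[i + j] += 1  # assumed linear schedule: t = i + j
    return counts


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = exponent)."""
    out = [0] * (len(a) + len(b) - 1)
    for p, x in enumerate(a):
        for q, y in enumerate(b):
            out[p + q] += x * y
    return out


def tasks_per_step_genfunc(n):
    """Coefficients of (1 + x + ... + x^(n-1))^2; entry t is the solution count at step t."""
    axis = [1] * n  # generating function for one bounded index
    return poly_mul(axis, axis)


def processor_lower_bound(n):
    """Largest number of tasks that share a time step under the assumed schedule."""
    return max(tasks_per_step_genfunc(n))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, tasks_per_step_genfunc(n), processor_lower_bound(n))
```

For this assumed mesh and schedule the bound comes out as n for every n tried, which is the kind of closed-form expression in n that the symbolic method is meant to produce directly rather than by enumeration.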
