Structure-driven multiprocessor compilation of numeric problems

The automatic, optimal compilation of computation-intensive numeric problems onto multiprocessors is of great current interest. While optimal compilation is NP-hard in general, the extensive structure present in many numeric algorithms greatly facilitates their optimal compilation. This thesis explores a hierarchical compilation paradigm for such algorithms. These algorithms can be specified as matrix expressions composed of matrix sums, products, inverses, FFTs, and so on, and good compilations for them can be derived by composing together good routines for these basic operators, yielding a hierarchical compilation strategy.

In the first part of this thesis, we exploit the extensive structure present in matrix operations to derive close-to-optimal routines for them, creating a parallel library. We show that these operator routines vary smoothly over a space of parameterised architectures. We then present a theoretical framework for optimally composing library routines into a good compilation for the entire matrix-expression dataflow graph. Classical scheduling theory is generalised for this purpose: each operator in the expression is identified with a dynamic system (a task) whose state represents the amount of computation completed. Computing the matrix expression is then equivalent to traversing the state space from the initial uncomputed state to the final computed state using the available processor resources, which casts the problem in the framework of control theory.

Fundamental new insights into multiprocessor scheduling follow from this formulation. Optimal control theory is applied to identify time-optimal control strategies with optimal schedules, and a number of powerful results can be derived under very general assumptions. For certain (convex) task dynamics, optimal scheduling is shown to be equivalent to shortest-path and flow problems, leading to very simple strategies for scheduling dataflow graphs composed of such tasks. These strategies have been applied to scheduling matrix expressions. A compiler using these techniques has been written, generating Mul-T code for the MIT Alewife machine, and the theory has been validated.
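
The scheduling-as-control formulation summarised above can be stated as a time-optimal control problem. The following is a minimal illustrative sketch in our own notation (task state x_i, work W_i, processor allocation p_i, rate function f_i, processor count P), introduced here for clarity rather than taken verbatim from the thesis:

% Minimal sketch of the scheduling-as-control formulation (illustrative notation,
% not the thesis's own):
%   x_i(t) -- state of task i: computation completed by time t
%   W_i    -- total work of task i
%   p_i(t) -- processors allocated to task i at time t
%   f_i    -- rate (speedup) function of task i
%   P      -- total number of processors
\begin{align*}
  &\dot{x}_i(t) = f_i\bigl(p_i(t)\bigr), \qquad x_i(0) = 0, \qquad x_i(T_i) = W_i
    \quad \text{(task $i$ completes at time $T_i$)},\\
  &\textstyle\sum_i p_i(t) \le P, \qquad p_i(t) \ge 0, \qquad
    p_j(t) = 0 \ \text{until all predecessors of task $j$ in the dataflow graph have completed},\\
  &\min_{p(\cdot)} \; T = \max_i T_i .
\end{align*}

A time-optimal control policy for this system corresponds to a minimum-length schedule, which is the sense in which optimal schedules are identified with time-optimal control strategies above; under the convex task dynamics referred to in the abstract, the thesis shows that this problem reduces to shortest-path and flow problems.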
