The subspace model: shape-based compilation for parallel systems

ion. These are based on the results of the subspace and expansion analyses. If included, they must precede intermediate processing, since one effect of these optimizations is to alter the shape of the expression trees. The Back-end might include target-specific transformations to improve parallelism for a specific architecture, target-specific analyses such as data layout, code layout, and VLIW scheduling, and a full code generator. In fact, the goal is a set of Back-ends for a set of targets. However, for this thesis the Back-end generates Connection Machine Fortran, based on Fortran 90, to run on the CM-5. The subspace compiler does not strictly include the Back-end, and nothing relevant to the subspace model occurs there, so it will not be discussed further in this thesis. Chapter 7 presents some experimental results. Chapter 8 defends our primary and secondary claims; in defense of the primary claim, it compares the subspace model with a variety of existing techniques and, in the process, serves as a discussion of related work as well. Chapter 9 addresses future possibilities uncovered by the subspace model. Chapter 10 concludes. Appendix A summarizes the compilation phases presented. Appendix B describes the current status of the implementation. Appendix C explains why the approach presented is overly conservative in some cases and how this could be remedied. Appendix D is a glossary of terms.

Figure 1-4: Compiler Design (block diagram: Front-end, Natural Subspace Analysis and Natural Expansion Analysis, Optimizations [optional], Intermediates, Restructure, Back-end)
