Compiler transformations for high-performance computing

In the last three decades, a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program, using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance superscalar, vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tracking the properties of arrays using loop dependence analysis. This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages such as C and Fortran. Transformations for both sequential and various types of parallel architectures are covered in depth. We describe the purpose of each transformation, explain how to determine whether it is legal, and give an example of its application. Programmers wishing to enhance the performance of their code can use this survey to improve their understanding of the optimizations that compilers can perform, or as a reference for techniques to be applied manually. Students can obtain an overview of optimizing compiler technology. Compiler writers can use this survey as a reference for most of the important optimizations developed to date, and as a bibliographic reference for the details of each optimization. Readers are expected to be familiar with modern computer architecture and basic program compilation techniques.


[108]  Zhiyu Shen,et al.  An Empirical Study of Fortran Programs for Parallelizing Compilers , 1990, IEEE Trans. Parallel Distributed Syst..

[109]  Matthew S. Hecht,et al.  Flow Analysis of Computer Programs , 1977 .

[110]  Robert L. Bernstein Multiplication by integer constants , 1986, Softw. Pract. Exp..

[111]  Chau-Wen Tseng,et al.  The Power Test for Data Dependence , 1992, IEEE Trans. Parallel Distributed Syst..

[112]  Etienne Morel,et al.  Global optimization by suppression of partial redundancies , 1979, CACM.

[113]  Robert E. Millstein,et al.  The ILLIAC IV FORTRAN compiler , 1975, Programming Languages and Compilers for Parallel and Vector Machines.

[114]  J. E. Ball,et al.  Predicting the effects of optimization on a procedure body , 1979, SIGPLAN '79.

[115]  John Darlington,et al.  A Transformation System for Developing Recursive Programs , 1977, J. ACM.

[116]  David J. Kuck,et al.  A Survey of Parallel Machine Organization and Programming , 1977, CSUR.

[117]  Vivek Sarkar,et al.  Partitioning parallel programs for macro-dataflow , 1986, LFP '86.

[118]  John Randal Allen,et al.  Dependence analysis for subscripted variables and its application to program transformations , 1983 .

[119]  Wilson C. Hsieh,et al.  A framework for determining useful parallelism , 1988, ICS '88.

[120]  Keith D. Cooper,et al.  An experiment with inline substitution , 1991, Softw. Pract. Exp..

[121]  Monica S. Lam,et al.  A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..

[122]  John Glauert,et al.  SISAL: streams and iteration in a single-assignment language. Language reference manual, Version 1. 1 , 1983 .

[123]  Bob Boothe,et al.  Improved multithreading techniques for hiding communication latency in multiprocessors , 1992, ISCA '92.

[124]  J. Cocke Global common subexpression elimination , 1970, Symposium on Compiler Optimization.

[125]  Ken Kennedy,et al.  Estimating Interlock and Improving Balance for Pipelined Architectures , 1988, J. Parallel Distributed Comput..

[126]  Fred C. Chow Minimizing register usage penalty at procedure calls , 1988, PLDI '88.

[127]  Ken Kennedy,et al.  The parascope editor: an interactive parallel programming tool , 1993, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).

[128]  Utpal Banerjee,et al.  Dependence analysis for supercomputing , 1988, The Kluwer international series in engineering and computer science.

[129]  Vivek Sarkar,et al.  Determining average program execution times and their variance , 1989, PLDI '89.

[130]  Steven Lucco,et al.  Orchestrating interactions among parallel computations , 1993, PLDI '93.

[131]  Robert Scheifler,et al.  An analysis of inline substitution for a structured programming language , 1977, CACM.

[132]  Michael Gerndt,et al.  SUPERB: A tool for semi-automatic MIMD/SIMD parallelization , 1988, Parallel Comput..

[133]  Josep Torrellas,et al.  False Sharing ans Spatial Locality in Multiprocessor Caches , 1994, IEEE Trans. Computers.

[134]  Wilfried Oed Cray Y-MP C90: System features and early benchmark results (Short communication) , 1992, Parallel Comput..

[135]  Guy L. Steele,et al.  Fortran at ten gigaflops: the connection machine convolution compiler , 1991, PLDI '91.

[136]  Marina C. Chen,et al.  Compiling Communication-Efficient Programs for Massively Parallel Machines , 1991, IEEE Trans. Parallel Distributed Syst..

[137]  David J. Evans,et al.  Inter-Procedural Analysis for Parallel Computing , 1995, Parallel Comput..

[138]  CONSTANTINE D. POLYCHRONOPOULOS,et al.  Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers , 1987, IEEE Transactions on Computers.

[139]  Michael E. Wolf,et al.  Improving locality and parallelism in nested loops , 1992 .

[140]  Constantine D. Polychronopoulos,et al.  Parallel programming and compilers , 1988 .

[141]  Micha Sharir,et al.  Experience with the SETL Optimizer , 1983, TOPL.

[142]  Zhiyu Shen,et al.  An Empirical Study on Array Subscripts and Data Dependencies , 1989, ICPP.

[143]  Charles Koelbel,et al.  Compiling Global Name-Space Parallel Loops for Distributed Execution , 1991, IEEE Trans. Parallel Distributed Syst..

[144]  Monica S. Lam,et al.  Global optimizations for parallelism and locality on scalable parallel machines , 1993, PLDI '93.

[145]  Ken Kennedy,et al.  A Methodology for Procedure Cloning , 1993, Computer languages.

[146]  Steven Lucco,et al.  A dynamic scheduling method for irregular parallel programs , 1992, PLDI '92.

[147]  Zhiyuan Li,et al.  An Efficient Data Dependence Analysis for Parallelizing Compilers , 1990, IEEE Trans. Parallel Distributed Syst..

[148]  K. Mani Chandy,et al.  A comparison of list schedules for parallel processing systems , 1974, Commun. ACM.

[149]  Kenneth E. Iverson,et al.  A programming language , 1899, AIEE-IRE '62 (Spring).

[150]  Ken Kennedy,et al.  Loop distribution with arbitrary control flow , 1990, Proceedings SUPERCOMPUTING '90.

[151]  Richard Kenner,et al.  Eliminating branches using a superoptimizer and the GNU C compiler , 1992, PLDI '92.

[152]  William D. Clinger,et al.  Revised3 report on the algorithmic language scheme , 1986, SIGP.

[153]  Hassan Aït-Kaci,et al.  Warren's Abstract Machine: A Tutorial Reconstruction , 1991 .

[154]  Michael Gerndt,et al.  Updating Distributed Variables in Local Computations , 1990, Concurr. Pract. Exp..

[155]  Thomas G. Szymanski,et al.  Assembling code for machines with span-dependent instructions , 1978, CACM.

[156]  Alexandru Nicolau,et al.  Loop Quantization: A Generalized Loop Unwinding Technique , 1988, J. Parallel Distributed Comput..

[157]  Jack B. Dennis,et al.  Data Flow Supercomputers , 1980, Computer.

[158]  David H. Bailey,et al.  The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..

[159]  Ken Kennedy,et al.  Improving register allocation for subscripted variables , 1990, PLDI '90.

[160]  John H. Reif,et al.  Efficient Symbolic Analysis of Programs , 1986, J. Comput. Syst. Sci..

[161]  Utpal Banerjee,et al.  Speedup of ordinary programs , 1979 .

[162]  Eugene W. Myers,et al.  A precise inter-procedural data flow algorithm , 1981, POPL '81.

[163]  David E. Culler,et al.  Dataflow architectures , 1986 .

[164]  Donald B. Alpert,et al.  Architecture of the Pentium microprocessor , 1993, IEEE Micro.

[165]  Alfred V. Aho,et al.  Code Generation for Expressions with Common Subexpressions , 1977, J. ACM.

[166]  Yoichi Muraoka,et al.  Parallelism exposure and exploitation in programs , 1971 .

[167]  Vivek Sarkar,et al.  Partitioning and Scheduling Parallel Programs for Multiprocessing , 1989 .

[168]  John R. Ellis,et al.  Bulldog: A Compiler for VLIW Architectures , 1986 .

[169]  James R. Larus,et al.  Loop-Level Parallelism in Numeric and Symbolic Programs , 1993, IEEE Trans. Parallel Distributed Syst..

[170]  Monica S. Lam,et al.  Array-data flow analysis and its use in array privatization , 1993, POPL '93.

[171]  W. W. Hwu,et al.  Achieving high instruction cache performance with an optimizing compiler , 1989, ISCA '89.

[172]  Monica S. Lam,et al.  Efficient and exact data dependence analysis , 1991, PLDI '91.

[173]  Guy E. Blelloch,et al.  Scans as Primitive Parallel Operations , 1989, ICPP.

[174]  Alexander Aiken,et al.  Perfect Pipelining: A New Loop Parallelization Technique , 1988, ESOP.

[175]  Robert E. Tarjan,et al.  Applications of Path Compression on Balanced Trees , 1979, JACM.

[176]  Joel H. Saltz,et al.  Principles of runtime support for parallel processors , 1988, ICS '88.

[177]  Utpal Banerjee,et al.  Time and Parallel Processor Bounds for Fortran-Like Loops , 1979, IEEE Transactions on Computers.

[178]  Sidney B. Gasser Program optimization , 1972 .

[179]  G. Tylutki Building a self-modifying user interface , 1989 .

[180]  John L. Hennessy,et al.  The priority-based coloring approach to register allocation , 1990, TOPL.

[181]  John R. Gilbert,et al.  Automatic array alignment in data-parallel programs , 1993, POPL '93.

[182]  Ron Cytron,et al.  Limited Processor Scheduling of Doacross Loops , 1987, ICPP.

[183]  Robert L. Smith,et al.  An American National Standard- IEEE Standard for Binary Floating-Point Arithmetic , 1985 .

[184]  R. H. Katz,et al.  Evaluating the performance of four snooping cache coherency protocols , 1989, ISCA '89.

[185]  Jack J. Dongarra,et al.  Unrolling loops in fortran , 1979, Softw. Pract. Exp..

[186]  Joseph A. Fisher,et al.  Trace Scheduling: A Technique for Global Microcode Compaction , 1981, IEEE Transactions on Computers.

[187]  David A. Padua,et al.  Advanced compiler optimizations for supercomputers , 1986, CACM.

[188]  Walid Abu-Sufah,et al.  Improving the performance of virtual memory computers. , 1979 .

[189]  Robert B. Murray,et al.  Compiling for the CRISP Microprocessor , 1987, COMPCON.

[190]  Alexandru Nicolau,et al.  Parallel processing: a smart compiler and a dumb machine , 1984, SIGP.

[191]  Gary A. Kildall,et al.  A unified approach to global program optimization , 1973, POPL.

[192]  Leslie Lamport,et al.  The parallel execution of DO loops , 1974, CACM.

[193]  Mark N. Wegman,et al.  Constant propagation with conditional branches , 1985, POPL.

[194]  Vivek Sarkar,et al.  A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness , 1994, CASCON.

[195]  Michael Wolfe,et al.  More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).

[196]  Hassan Aït-Kaci Warren's Abstract Machine , 1991, ICLP.

[197]  David L. Presberg The Paralyzer: Ivtran's Parallelism Analyzer and Synthesizer , 1975 .

[198]  Milind Girkar,et al.  Partitioning programs for parallel execution , 1988, ICS '88.

[199]  Jong-Deok Choi,et al.  Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects , 1993, POPL '93.

[200]  Ken Kennedy,et al.  Compiling Fortran D for MIMD distributed-memory machines , 1992, CACM.

[201]  Peiyi Tang,et al.  Dynamic Processor Self-Scheduling for General Parallel Nested Loops , 1987, IEEE Trans. Computers.

[202]  D. H. Bartley,et al.  Revised4 report on the algorithmic language scheme , 1991, LIPO.

[203]  K.M. Dixit New CPU benchmark suites from SPEC , 1992, Digest of Papers COMPCON Spring 1992.

[204]  Duncan H. Lawrie,et al.  On the Performance Enhancement of Paging Systems Through Program Analysis and Transformations , 1981, IEEE Transactions on Computers.

[205]  Gururaj S. Rao,et al.  Design of the IBM System/390 computer family for numerically intensive applications: An overview for engineers and scientists , 1992, IBM J. Res. Dev..

[206]  A. P. Yershóv ALPHA—An Automatic Programming System of High Efficiency , 1966, JACM.

[207]  Dennis Gannon,et al.  Applying AI Techniques to Program Optimization for Parallel Computers , 1987 .

[208]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[209]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[210]  Donald E. Knuth,et al.  An empirical study of FORTRAN programs , 1971, Softw. Pract. Exp..

[211]  Keith D. Cooper,et al.  Unexpected side effects of inline substitution: a case study , 1992, LOPL.

[212]  Monica S. Lam,et al.  Jade: a high-level, machine-independent language for parallel programming , 1993, Computer.

[213]  Ken Kennedy,et al.  Interprocedural transformations for parallel code generation , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[214]  David B. Loveman,et al.  Program Improvement by Source-to-Source Transformation , 1977, J. ACM.

[215]  Erwin Tomash,et al.  SOS: The next 11111 years , 1976, Computer.

[216]  John Cocke,et al.  Register Allocation Via Coloring , 1981, Comput. Lang..

[217]  Sidney B. Gasser Program optimization , 1972, SICOSIM3.