Software pipelining

Exploiting parallelism at the instruction level is an important way to improve performance. Because the time spent in loop execution dominates total execution time, a large body of optimizations focuses on decreasing the time to execute each iteration. Software pipelining is a technique that restructures a loop so that iterations execute in overlapped fashion, which increases parallelism and yields a faster execution rate. Let {ABC}^n represent a loop containing operations A, B, C that is executed n times. Although the operations of a single iteration can be parallelized, more parallelism may be achieved if the entire loop is considered rather than a single iteration. The software pipelining transformation exploits the fact that a loop {ABC}^n is equivalent to A{BCA}^(n−1)BC. Although the operations contained in the loop body do not change, the operations in the new body come from different iterations of the original loop. Various algorithms for software pipelining exist. A comparison of the alternative methods is presented, the relationships between the methods are explored, and possibilities for improvement are highlighted.
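
The equivalence between {ABC}^n and A{BCA}^(n−1)BC can be checked with a small sketch. The function names and the labels "prologue", "kernel", and "epilogue" are illustrative assumptions and do not come from the abstract. Each operation is recorded as an (operation, iteration) pair, so the trace shows which original iteration every operation belongs to. The sketch demonstrates only that the sequential order of operations is preserved; it does not model the parallel scheduling that a real software pipeliner would then apply inside the new loop body.

```python
def original_loop(n):
    """Trace of {ABC}^n: every iteration i runs A, B, C in order."""
    trace = []
    for i in range(n):
        trace += [("A", i), ("B", i), ("C", i)]
    return trace


def pipelined_loop(n):
    """Trace of A{BCA}^(n-1)BC, the restructured form of the same loop."""
    if n == 0:
        return []  # assumption: an empty loop is guarded and executes nothing
    trace = [("A", 0)]  # prologue: A of the first iteration
    for i in range(n - 1):
        # kernel: B and C of iteration i, then A of iteration i + 1
        trace += [("B", i), ("C", i), ("A", i + 1)]
    trace += [("B", n - 1), ("C", n - 1)]  # epilogue: finish the last iteration
    return trace


if __name__ == "__main__":
    for n in range(6):
        assert original_loop(n) == pipelined_loop(n)
    print(pipelined_loop(3))
```

In the restructured loop, each pass through the body mixes operations from iterations i and i + 1. This is the property the transformation relies on: a scheduler may overlap A of the next iteration with B and C of the current one, provided the dependences between them allow it.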
