Paper information - Software pipelining

Software pipelining

Exploiting parallelism at the instruction level is an important way to improve performance. Because the time spent in loop execution dominates total execution time, a large body of optimizations focuses on decreasing the time to execute each iteration. Software pipelining is a technique that restructures a loop so that a faster execution rate is realized: iterations are executed in overlapped fashion to increase parallelism. Let $\{ABC\}^n$ represent a loop containing operations $A$, $B$, $C$ that is executed $n$ times. Although the operations of a single iteration can be parallelized, more parallelism may be achieved if the entire loop is considered rather than a single iteration. The software pipelining transformation utilizes the fact that a loop $\{ABC\}^n$ is equivalent to $A\{BCA\}^{n-1}BC$. Although the operations contained in the loop body do not change, the operations in the new body come from different iterations of the original loop. Various algorithms for software pipelining exist. A comparison of the alternative methods for software pipelining is presented. The relationships between the methods are explored and possibilities for improvement are highlighted.
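
The equivalence between $\{ABC\}^n$ and $A\{BCA\}^{n-1}BC$ can be checked with a small trace. The following is only a minimal sketch, not any of the algorithms the paper compares: the function names `original_loop` and `pipelined_loop` are hypothetical, each operation is recorded as an (operation, iteration) pair, and the loop is assumed to run at least once.

```python
def original_loop(n):
    """Trace of {ABC}^n: iteration i runs A, B, C in that order."""
    trace = []
    for i in range(n):
        trace += [("A", i), ("B", i), ("C", i)]
    return trace


def pipelined_loop(n):
    """Trace of A {BCA}^(n-1) BC, assuming n >= 1."""
    trace = [("A", 0)]                      # prologue: A of iteration 0
    for i in range(n - 1):                  # new loop body, run n-1 times
        # B and C belong to iteration i, A already belongs to iteration i+1
        trace += [("B", i), ("C", i), ("A", i + 1)]
    trace += [("B", n - 1), ("C", n - 1)]   # epilogue: finish the last iteration
    return trace


if __name__ == "__main__":
    # Both forms perform the same operations in the same sequential order.
    print(original_loop(3) == pipelined_loop(3))
```

The sequential order of operations is unchanged, which is why the two forms are equivalent. What changes is the grouping: the new body holds B and C of one iteration together with A of the next, so a scheduler may be able to overlap them when the dependences between those operations allow it.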
