Treegion scheduling for wide issue processors

Instruction scheduling is one of the most important phases of compilation for high-performance processors. A compiler typically divides a program into multiple regions of code and then schedules each region. Many past efforts have focused on linear regions such as traces and superblocks. The linearity of these regions can limit speculation, leading to under-utilization of processor resources, especially on wide-issue machines. A type of non-linear region called a treegion is presented in this paper. The formation and scheduling of treegions takes into account multiple execution paths, and the larger scope of treegions allows more speculation, leading to higher utilization and better performance. Multiple scheduling heuristics for treegions are compared against scheduling for several types of linear regions. Empirical results illustrate that instruction scheduling using treegions treegion scheduling-holds promise. Treegion scheduling using the global weight heuristic outperforms the next highest performing region-superblocks by up to 20%.

[1]  Wen-mei W. Hwu,et al.  Speculative hedge: regulating compile-time speculation against profile variations , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[2]  Todd M. Austin,et al.  Zero-cycle loads: microarchitecture support for reducing load latency , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[3]  Vinod Kathail,et al.  Critical path reduction for scalar programs , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[4]  Joseph A. Fisher,et al.  Trace Scheduling: A Technique for Global Microcode Compaction , 1981, IEEE Transactions on Computers.

[5]  Roger A. Bringmann Enhancing instruction level parallelism through compiler-controlled speculation , 1995 .

[6]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[7]  原田 秀逸 私の computer 環境 , 1998 .

[8]  Thomas M. Conte,et al.  Accurate and practical profile-driven compilation using the profile buffer , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[9]  Wen-mei W. Hwu,et al.  Trace Selection For Compiling Large C Application Programs To Microcode , 1988, [1988] Proceedings of the 21st Annual Workshop on Microprogramming and Microarchitecture - MICRO '21.

[10]  Edward S. Davidson,et al.  Highly concurrent scalar processing , 1986, ISCA 1986.

[11]  Rajiv Gupta,et al.  Region Scheduling: An Approach for Detecting and Redistributing Parallelism , 1990, IEEE Trans. Software Eng..

[12]  Richard Johnson,et al.  Analysis techniques for predicated code , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[13]  Thomas M. Conte,et al.  Treegion Scheduling for Highly Parallel Processors , 1997, Euro-Par.

[14]  Ken Kennedy,et al.  Conversion of control dependence to data dependence , 1983, POPL '83.

[15]  Ron Cytron,et al.  What's In a Name? -or- The Value of Renaming for Parallelism Detection and Storage Allocation , 1987, ICPP.

[16]  Vinod Kathail,et al.  Critical path reduction for scalar programs , 1995, MICRO 1995.

[17]  B. Ramakrishna Rau,et al.  The Cydra 5 departmental supercomputer: design philosophies, decisions, and trade-offs , 1989, Computer.

[18]  Wen-mei W. Hwu,et al.  Trace selection for compiling large C application programs to microcode , 1988, MICRO 1988.

[19]  Alexandru Nicolau,et al.  Percolation Scheduling: A Parallel Compilation Technique , 1985 .