Efficient resource management during instruction scheduling for the EPIC architectures

Effective and efficient modelling and management of hardware resources have always been critical to generating highly efficient code in optimizing compilers. The instruction templates and dispersal rules of the EPIC architecture add new complexity to managing resource constraints in the instruction scheduler. We extended a finite state automaton (FSA) approach to efficiently manage all key resource constraints of an EPIC architecture on the fly during instruction scheduling. We have fully integrated the FSA-based resource management into the instruction scheduler of the Open Research Compiler for the EPIC architecture. Compared with an instruction scheduler with decoupled resource management, our integrated approach shows up to 12% speedup on some SPECint2000 benchmarks and 4.5% speedup on average across all SPECint2000 benchmarks on an Itanium machine. At the same time, our approach reduces instruction scheduling time by 4% on average.
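To illustrate the general idea of answering resource queries with a finite state automaton, the following minimal sketch precomputes an automaton for a hypothetical toy machine and then checks whether a group of instructions fits in one cycle by following table lookups. The unit counts, the instruction classes (`mem`, `int`, `alu`), and all function names are assumptions made for illustration; the sketch does not model instruction templates or dispersal rules and is not the implementation described in the paper.

```python
from collections import deque

# Hypothetical toy machine (an assumption, not the Itanium model):
# number of functional units of each kind available per cycle.
UNITS = {"M": 2, "I": 2}
# Hypothetical instruction classes and the unit kinds each may issue on.
CLASSES = {"mem": ("M",), "int": ("I",), "alu": ("M", "I")}
ORDER = tuple(UNITS)  # fixed unit order used to index usage tuples


def step(state, cls):
    """Issue one instruction of class `cls` from `state`.

    A state is a frozenset of usage tuples, one per feasible unit
    assignment so far. Returns the successor state, or None if the
    instruction cannot be issued in the current cycle.
    """
    successors = set()
    for usage in state:
        for unit in CLASSES[cls]:
            k = ORDER.index(unit)
            if usage[k] < UNITS[unit]:
                successors.add(usage[:k] + (usage[k] + 1,) + usage[k + 1:])
    return frozenset(successors) or None


def build_fsa():
    """Enumerate all reachable states once and tabulate the transitions."""
    start = frozenset({(0,) * len(ORDER)})
    table, seen, queue = {}, {start}, deque([start])
    while queue:
        state = queue.popleft()
        for cls in CLASSES:
            nxt = step(state, cls)
            table[(state, cls)] = nxt
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, table


def fits_in_cycle(start, table, group):
    """Check a candidate group using only precomputed table lookups."""
    state = start
    for cls in group:
        state = table[(state, cls)]
        if state is None:
            return False
    return True
```

In this sketch the scheduler-side query is a sequence of dictionary lookups, which is the property that makes an automaton attractive for on-the-fly checks: the cost of exploring unit assignments is paid once, when the table is built.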
