Compiler-directed ILP extraction for clustered VLIW/EPIC machines: predication, speculation and modulo scheduling

Compiler-directed ILP extraction techniques are critical to effectively exploiting the significant processing capacity of contemporary VLIW/EPIC machines. In this paper we propose a novel ILP extraction algorithm for clustered EPIC machines that integrates three powerful techniques: predication, speculation and modulo scheduling. In addition, our framework schedules and binds operations, generating actual VLIW code. To the best of our knowledge, no other algorithm in the literature on predicated code optimizations jointly considers speculation and modulo scheduling in the context of clustered EPIC machines. Our experimental results show that, by jointly considering these extraction techniques in a resource-aware context, the proposed algorithm can take full advantage of the resources available on the clustered machine and significantly improve performance.
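To illustrate why predication and modulo scheduling interact under resource constraints, the sketch below computes the standard lower bound on the initiation interval used in iterative modulo scheduling, MII = max(ResMII, RecMII). It is not the algorithm proposed here. It assumes that after if-conversion the operations of both branch paths are issued every iteration under guarding predicates, so they all count toward resource usage. It also assumes that resources are simply pooled across clusters. The function names and the example loop are hypothetical. A real clustered scheduler must also account for inter-cluster copy operations and binding, which can push the achieved II above this bound.

```python
from math import ceil
from typing import Dict, List, Tuple


def res_mii(op_counts: Dict[str, int], units: Dict[str, int]) -> int:
    """Resource-constrained lower bound on the initiation interval (II).

    op_counts: operations per iteration needing each resource class
               (after if-conversion, both predicated paths are counted).
    units:     units of that class, summed over all clusters
               (assumption: ignores inter-cluster communication cost).
    """
    return max((ceil(n / units[r]) for r, n in op_counts.items()), default=1)


def rec_mii(recurrences: List[Tuple[int, int]]) -> int:
    """Recurrence-constrained lower bound on the II.

    Each recurrence is (total latency around the dependence cycle,
    total iteration distance around the cycle); distance must be >= 1.
    """
    return max((ceil(lat / dist) for lat, dist in recurrences), default=1)


def mii(op_counts: Dict[str, int], units: Dict[str, int],
        recurrences: List[Tuple[int, int]]) -> int:
    """Minimum initiation interval: no modulo schedule can use a smaller II."""
    return max(res_mii(op_counts, units), rec_mii(recurrences))


if __name__ == "__main__":
    # Hypothetical loop: 2 common ALU ops, 3 on the 'then' path and 2 on the
    # 'else' path. If-conversion issues all of them, giving 7 ALU ops.
    ops = {"alu": 2 + 3 + 2, "mem": 2}
    # Hypothetical machine: 2 clusters, each with 2 ALUs and 1 memory unit.
    machine = {"alu": 4, "mem": 2}
    # One loop-carried recurrence: latency 3 across an iteration distance of 1.
    print(mii(ops, machine, [(3, 1)]))
```

In this example ResMII is 2 (7 ALU operations on 4 ALUs), but the recurrence forces an II of 3. Techniques such as speculation, which can shorten dependence chains, become attractive precisely when RecMII rather than ResMII dominates.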
