An Overview of the Intel® IA-64 Compiler

The IA-64 architecture combines a rich set of features designed to overcome the limitations of traditional architectures and to provide performance scalability for the future. These features expose new opportunities for the compiler to optimize applications. We have incorporated into the Intel IA-64 compiler the key technologies needed to exploit these opportunities and to boost application performance on IA-64 hardware. In this paper, we provide an overview of the Intel IA-64 compiler, discuss and illustrate several optimization techniques, and explain how these optimizations help harness the power of IA-64 for higher application performance.
