OS and compiler considerations in the design of the IA-64 architecture

Increasing demands for processor performance have outstripped the pace of process and frequency improvements, pushing designers to find ways of increasing the amount of work that can be processed in parallel. Traditional RISC architectures use hardware approaches to obtain more instruction-level parallelism, with the compiler and the operating system (OS) having only indirect visibility into the mechanisms used. The IA-64 architecture [14] was specifically designed to enable systems that create and exploit high levels of instruction-level parallelism by explicitly encoding a program's parallelism in the instruction set [25]. This paper provides a qualitative summary of the IA-64 architecture features that support control speculation, data speculation, and register stacking. It focuses on the functional synergy between these architectural elements rather than on their individual performance merits, and emphasizes how they were designed for cooperation between processor hardware, compilers, and the OS.

[1]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[2]  Fred C. Chow.  Minimizing register usage penalty at procedure calls , 1988, PLDI '88.

[3]  Anoop Gupta,et al.  Two Techniques to Enhance the Performance of Memory Consistency Models , 1991, ICPP.

[4]  Monica S. Lam,et al.  Limits of control flow on parallelism , 1992, ISCA '92.

[5]  Scott Mahlke,et al.  Sentinel scheduling: a model for compiler-controlled speculative execution , 1993 .

[6]  Predictability of load/store instruction latencies , 1993, Proceedings of the 26th Annual International Symposium on Microarchitecture.

[7]  Dirk Grunwald,et al.  Quantifying Behavioral Differences Between C and C++ Programs , 1994 .

[8]  Alain Deutsch,et al.  Interprocedural may-alias analysis for pointers: beyond k-limiting , 1994, PLDI '94.

[9]  Cathy May,et al.  The PowerPC Architecture: A Specification for a New Family of RISC Processors , 1994 .

[10]  Abraham Silberschatz,et al.  Operating System Concepts, 5th Edition , 1994 .

[11]  Andreas Krall,et al.  Delayed Exceptions - Speculative Execution of Trapping Instructions , 1994, CC.

[12]  Scott A. Mahlke,et al.  Dynamic memory disambiguation using the memory conflict buffer , 1994, ASPLOS VI.

[13]  W. Wilson Ho,et al.  Optimizing the Performance of Dynamically-Linked Programs , 1995, USENIX.

[14]  Scott A. Mahlke,et al.  Three Architectural Models for Compiler-Controlled Speculative Execution , 1995, IEEE Trans. Computers.

[15]  Gerry Kane,et al.  PA-RISC 2.0 Architecture , 1995 .

[16]  David B. Papworth Tuning the Pentium Pro microarchitecture , 1996, IEEE Micro.

[17]  Rahul Razdan,et al.  The Alpha 21264: a 500 MHz out-of-order execution microprocessor , 1997, Proceedings IEEE COMPCON 97. Digest of Papers.

[18]  Erik R. Altman,et al.  Daisy: Dynamic Compilation For 100% Architectural Compatibility , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[19]  Alvin R. Lebeck,et al.  Load latency tolerance in dynamically scheduled processors , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[20]  B. Ramakrishna Rau,et al.  EPIC: Explicitly Parallel Instruction Computing , 2000, Computer.