Paper information - Compiler optimizations for the PA-8000

Compiler optimizations for the PA-8000

Compiler optimizations play a key role in unlocking the performance of the PA-8000 (L. Gwennap, 1994), an innovative dynamically scheduled machine and the first implementation of the 64-bit PA 2.0 member of the HP PA-RISC architecture family. This wide superscalar, long out-of-order machine provides significant execution bandwidth and automatically hides latency at run time. Despite these ample hardware resources, however, many of the optimizing transformations that proved effective for the PA-8000 did so by augmenting its ability to exploit the available bandwidth and to hide latency. Legacy code benefits from the PA-8000's sophisticated hardware, but recompilation of old binaries can be vital to realizing the machine's full potential, given the impact of new compilers on achieving peak performance.

Anne M. Holler

[1] Kemal Ebcioglu, et al. VLIW compilation techniques in a superscalar environment, 1994, PLDI '94.

[2] Anne M. Holler. Optimization for a superscalar out-of-order machine, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[3] Anoop Gupta, et al. Design and evaluation of a compiler algorithm for prefetching, 1992, ASPLOS V.

[4] Carl Burch. PA-8000: a case study of static and dynamic branch prediction, 1997, Proceedings International Conference on Computer Design VLSI in Computers and Processors.

[5] Fred C. Chow. Minimizing register usage penalty at procedure calls, 1988, PLDI '88.

[6] Ken Kennedy, et al. Conversion of control dependence to data dependence, 1983, POPL '83.

[7] Mike Johnson, et al. Superscalar microprocessor design, 1991, Prentice Hall series in innovative technology.

[8] David L. Kuck, et al. The Structure of Computers and Computations, 1978.

[9] Rudolf Eigenmann, et al. Symbolic range propagation, 1995, Proceedings of 9th International Parallel Processing Symposium.

[10] Wei-Chung Hsu, et al. Instruction scheduling for the HP PA-8000, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[11] Ken Kennedy, et al. Scalar replacement in the presence of conditional control flow, 1994, Softw. Pract. Exp.

[12] Mark Scott Johnson, et al. Effectiveness of a machine-level, global optimizer, 1986, SIGPLAN '86.

[13] Stephen Richardson, et al. Interprocedural analysis vs. procedure integration, 1989, Inf. Process. Lett.

[14] Scott A. Mahlke, et al. Using profile information to assist classic code optimizations, 1991, Softw. Pract. Exp.

[15] Deborah S. Coutant. Retargetable high-level alias analysis, 1986, POPL '86.

[16] Susan J. Eggers, et al. Balanced scheduling: instruction scheduling when memory latency is uncertain, 1993, PLDI '93.

[17] Jack J. Dongarra, et al. Unrolling loops in fortran, 1979, Softw. Pract. Exp.

[18] William H. Harrison, et al. Compiler Analysis of the Value Ranges for Variables, 1977, IEEE Transactions on Software Engineering.

[19] Doug Hunt, et al. Advanced performance features of the 64-bit PA-8000, 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.

[20] Jack W. Davidson, et al. Subprogram Inlining: A Study of its Effects on Program Execution Time, 1992, IEEE Trans. Software Eng.

[21] Joseph A. Fisher, et al. Trace Scheduling: A Technique for Global Microcode Compaction, 1981, IEEE Transactions on Computers.

[22] David F. Bacon, et al. Compiler transformations for high-performance computing, 1994, CSUR.

[23] Wei-Chung Hsu, et al. Data Prefetching On The HP PA-8000, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[24] Suneel Jain, et al. An efficient approach to data flow analysis in a multiple pass global optimizer, 1988, PLDI '88.

[25] James C. Dehnert, et al. Overlapped loop support in the Cydra 5, 1989, ASPLOS 1989.

[26] Wei Li, et al. Compiling for NUMA Parallel Machines, 1993.

[27] Gerry Kane, et al. PA-RISC 2.0 Architecture, 1995.

[28] Jack W. Davidson, et al. Memory access coalescing: a technique for eliminating redundant memory accesses, 1994, PLDI '94.