Compiler optimizations for the PA-8000

Compiler optimizations play a key role in unlocking the performance of the PA-8000 (L. Gwennap, 1994), an innovative dynamically scheduled machine that is the first implementation of the 64-bit PA 2.0 member of the HP PA-RISC architecture family. This wide superscalar, long out-of-order machine provides significant execution bandwidth and automatically hides latency at run time. Despite these ample hardware resources, however, many of the optimizing transformations that proved effective for the PA-8000 served to augment its ability to exploit the available bandwidth and to hide latency. While legacy code benefits from the PA-8000's sophisticated hardware, recompilation of old binaries can be vital to realizing the machine's full potential, given the impact of new compilers on achieving peak performance.
