Paper information - An evaluation of Cray X-MP performance on vectorizable Livermore FORTRAN kernels

This paper studies the impact of the architecture features of the Cray-1 and the Cray X-MP and related compiler optimizations on machine performance. We develop a methodology for evaluating the effectiveness of the Cray Fortran compilers in coping with the architecture features and limitations of the Cray-1 and the Cray X-MP. As examples, the effects of vector register reservation and vector index misalignment on the performance of Livermore Fortran Kernels (LFKs) are presented. The causes of the performance differences of two Cray Fortran compilers, CFT1.14 and CFT77.13, on the vectorized LFKs are described and some areas for further improvement are suggested.

Edward S. Davidson | J. H. Tang
