The authors discuss high-performance methods for solving first-order linear recurrences on a vector computer, investigate automatic transformations, and develop compiling techniques for first-order linear recurrence problems. The results show that the improved vector code generated by the vectorizing compiler runs at a rate of 150 MFLOPS (million floating-point operations per second) for moderate loop lengths (>1000) and at over 200 MFLOPS for long loop lengths (>10000) on the HITAC S-820 supercomputer. Overall performance improvements of 69% in the 14 Lawrence Livermore Loops and 25% in the 24 Lawrence Livermore Loops, as measured by the harmonic mean, are attained.
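
For readers unfamiliar with the problem, a first-order linear recurrence has the form x[i] = a[i] * x[i-1] + b[i], where each element depends on the one before it, so the straightforward loop cannot be vectorized directly. The abstract does not say which transformation the compiler applies, so the sketch below uses recursive doubling, one well-known technique, purely as an illustration. The function names and the plain-list representation are assumptions made for this example, not the authors' implementation.

```python
def recurrence_sequential(a, b, x0):
    """Reference loop: x[i] = a[i] * x[i-1] + b[i], with x[-1] = x0."""
    x = []
    prev = x0
    for ai, bi in zip(a, b):
        prev = ai * prev + bi  # loop-carried dependence on prev
        x.append(prev)
    return x


def recurrence_doubling(a, b, x0):
    """Recursive doubling (illustrative, not the paper's method).

    Each step i is the affine map x -> A[i] * x + B[i]. Composing step i
    with the already-composed map at i - d doubles the span that each
    entry covers, so about log2(n) passes are needed. Within one pass
    every element is independent, which is what makes the pass
    vectorizable.
    """
    n = len(a)
    A = list(a)
    B = list(b)
    d = 1
    while d < n:
        # Build new lists from the old A and B so that reads and writes
        # within a pass do not interfere with each other.
        new_A = A[:d] + [A[i] * A[i - d] for i in range(d, n)]
        new_B = B[:d] + [A[i] * B[i - d] + B[i] for i in range(d, n)]
        A, B = new_A, new_B
        d *= 2
    # A[i] and B[i] now map x0 directly to x[i].
    return [A[i] * x0 + B[i] for i in range(n)]


a = [2, 3, 1, 2, 1]
b = [1, 0, 4, 1, 3]
print(recurrence_sequential(a, b, 1))
print(recurrence_doubling(a, b, 1))
```

The trade-off is that recursive doubling performs more arithmetic in total (on the order of n log n operations instead of n) in exchange for passes with no loop-carried dependence, so it tends to pay off only when vector hardware can absorb the extra work.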