Squeezing more CPU performance out of a Cray-2 by vector block scheduling

Compile-time scheduling of vector activities on the Cray-2 is studied using a simplified model of the vector instruction stream. An approach based on the authors' experience with array-processor microcode scheduling is shown to be practical. It calls for a pass of loop scheduling followed by a pass of resource allocation. Actual benchmarks of the resulting code are shown, exhibiting speedups as large as 50% over the current CFT77 compiler. The results also give a novel perspective on vector chaining vs. nonchaining processor architectures.