Prefetching on the Cray-T3E

In many parallel applications, network latency causes a dramatic loss in processor utilization. This paper examines software-controlled access pipelining (SCAP) as a technique for hiding network latency. An analytic model of SCAP describes its basic operation and predicts its performance. The predictions are validated with benchmarks on the Cray-T3E. The benchmarks show a vectorized version of SCAP (V-SCAP) to be at least as fast as the highly optimized shared memory system functions. On the Cray-T3E, SCAP improves performance over blocking execution by 35% to 900%, while V-SCAP improves it by a factor of 2.1 to 62. Compared with HPF, SCAP achieves a speed-up ranging from 48% to a factor of 9.2, depending on the data access pattern. It also performs well on irregular access patterns, which are not supported by the standard library.
