Data prefetching in multiprocessor vector cache memories

This paper reports the cache performance of a set of vectorized numerical programs from the Perfect Club benchmarks. Using a low-cost, trace-driven simulation technique, we show how a non-prefetching vector cache can result in unpredictable performance and how this unpredictability makes it difficult to find a good block size. We describe two simple prefetch schemes to reduce the influence of long-stride vector accesses and misses due to block invalidations in multiprocessor vector caches. These two schemes are shown to have better performance than a non-prefetching cache.
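As a rough illustration of why vector stride interacts with block size, the following minimal sketch counts misses for a single strided vector sweep through a direct-mapped cache. It is not the simulator or either of the prefetch schemes from the paper: the function name `miss_count`, the cache parameters, and the optional stride-directed prefetch are all assumptions made for this example.

```python
def miss_count(n_elems, stride, block_words, cache_words=1024, prefetch=False):
    """Count misses for one strided vector sweep in a direct-mapped cache.

    Addresses are word addresses. All parameters are illustrative assumptions,
    not values taken from the paper.
    """
    n_sets = cache_words // block_words
    tags = [None] * n_sets  # one resident block number per direct-mapped set

    def present(block):
        return tags[block % n_sets] == block

    def fill(block):
        tags[block % n_sets] = block

    misses = 0
    for i in range(n_elems):
        addr = i * stride
        block = addr // block_words
        if not present(block):
            misses += 1
            fill(block)
        if prefetch:
            # Hypothetical stride-directed prefetch: bring in the block that
            # holds the next vector element before it is referenced.
            fill((addr + stride) // block_words)
    return misses


if __name__ == "__main__":
    for stride in (1, 8, 1024):
        for block_words in (8, 32):
            plain = miss_count(64, stride, block_words)
            pref = miss_count(64, stride, block_words, prefetch=True)
            print(f"stride={stride:5d} block={block_words:3d} "
                  f"misses={plain:3d} with_prefetch={pref:3d}")
```

In this toy model, a 64-element unit-stride sweep with 8-word blocks misses 8 times, while the same sweep at stride 8 misses on every element, because each fetched block then supplies only one useful word. The model ignores timing, bus traffic, and coherence invalidations, so it only hints at the effects the paper measures.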
