Reducing the Impact of the Memory Wall for I/O Using Cache Injection

Cache injection addresses the continuing disparity between processor and memory speeds by placing data into a processor's cache directly from the I/O bus. This disparity adversely affects the performance of memory-bound applications, including certain scientific computations, encryption, image processing, and some graphics applications. Cache injection can reduce both memory latency and memory pressure for I/O. Its performance depends on several factors, including the timeliness of data use, the amount of data transferred, and the application's data usage patterns. We show that cache injection provides significant advantages over data prefetching, reducing pressure on the memory controller by up to 96%. Despite these benefits, cache injection may degrade application performance when data is injected too early, before the application is ready to consume it. To overcome this limitation, we propose injection policies that determine when and where to inject data, based on OS, compiler, and application information.
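
To make the idea of an injection policy concrete, the following C sketch decides when and where to inject an incoming I/O payload from hints that could plausibly come from the OS (scheduling state), the compiler (estimated time until use), and the application (payload size). All names, fields, and thresholds here are hypothetical illustrations, not the paper's implementation.

/* Minimal sketch of a cache-injection policy decision. All hint fields and
 * thresholds are illustrative assumptions, not the paper's mechanism. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Where an incoming I/O payload could be placed. */
typedef enum { TARGET_MEMORY, TARGET_L3, TARGET_L2 } inject_target_t;

/* Hints a policy might draw on: OS scheduling information, compiler
 * analysis, and application annotations. */
typedef struct {
    size_t payload_bytes;      /* size of the incoming I/O payload            */
    size_t consumer_l2_bytes;  /* capacity of the consumer core's L2 cache    */
    bool   consumer_running;   /* OS: is the consuming thread scheduled?      */
    long   cycles_until_use;   /* compiler/app estimate of time until access  */
} inject_hints_t;

/* Decide when and where to inject: inject only if the data will be used
 * soon (avoiding early injection that evicts useful lines), and pick a
 * cache level large enough that the payload does not thrash it. */
static inject_target_t choose_injection_target(const inject_hints_t *h)
{
    const long SOON_CYCLES = 100000;          /* illustrative threshold       */

    if (!h->consumer_running || h->cycles_until_use > SOON_CYCLES)
        return TARGET_MEMORY;                 /* too early: let DMA go to DRAM */
    if (h->payload_bytes * 4 <= h->consumer_l2_bytes)
        return TARGET_L2;                     /* small payload: inject close   */
    return TARGET_L3;                         /* large payload: shared cache   */
}

int main(void)
{
    inject_hints_t hints = { 4096, 512 * 1024, true, 2000 };
    static const char *names[] = { "memory", "L3", "L2" };
    printf("inject into %s\n", names[choose_injection_target(&hints)]);
    return 0;
}

In a real system this decision would be made at DMA time in the memory controller or cache hierarchy rather than in software, but the structure is the same: inject only when use is imminent, and into a level that can hold the payload without displacing the working set.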
