Using time skewing to eliminate idle time due to memory bandwidth and network limitations

Time skewing is a compile-time optimization that can provide arbitrarily high cache hit rates for a class of iterative calculations, given a sufficient number of time steps and sufficient cache memory. Thus, it can eliminate processor idle time caused by inadequate main memory bandwidth. In this article, we give a generalization of time skewing for multiprocessor architectures, and discuss time skewing for multilevel caches. Our generalization for multiprocessors lets us eliminate processor idle time caused by any combination of inadequate main memory bandwidth, limited network bandwidth, and high network latency, given a sufficiently large problem and sufficient cache. As in the uniprocessor case, the cache requirement grows with the machine balance rather than the problem size. Our techniques for using multilevel caches reduce the L1 cache requirement, which would otherwise be unacceptably high for some architectures when using arrays of high dimension.
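To make the basic idea concrete, the following is a minimal sketch of uniprocessor time skewing on a one-dimensional three-point averaging stencil with fixed boundary values. It is an illustration under stated assumptions, not the article's algorithm: the stencil, the function names `stencil_naive` and `stencil_time_skewed`, the `tile` parameter, and the two-buffer storage scheme are all chosen here for the example, and the multiprocessor and multilevel-cache extensions are not shown. The sketch only demonstrates the reordering of iterations; Python itself will not exhibit the cache behavior a compiled loop nest would.

```python
def stencil_naive(a, steps):
    """Reference order: sweep the whole array once per time step."""
    cur, nxt = list(a), list(a)  # both copies carry the fixed boundary values
    for _ in range(steps):
        for i in range(1, len(a) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur, nxt = nxt, cur
    return cur


def stencil_time_skewed(a, steps, tile):
    """Same computation, iterated tile by tile in the skewed coordinate s = i + t.

    Assumption for this sketch: buf[t % 2] holds the values at time step t.
    Every value needed at (t, i) comes from (t - 1, i - 1 .. i + 1), whose
    skewed coordinates are all <= i + t, so visiting tiles in increasing s
    (and t in increasing order inside a tile) respects all dependences.
    """
    n = len(a)
    buf = [list(a), list(a)]
    # Interior points i = 1 .. n-2 at times t = 1 .. steps give s = 2 .. n-2+steps.
    for s0 in range(2, n - 2 + steps + 1, tile):
        for t in range(1, steps + 1):
            lo = max(1, s0 - t)                     # first i of this tile at time t
            hi = min(n - 2, s0 + tile - 1 - t)      # last i of this tile at time t
            src, dst = buf[(t - 1) % 2], buf[t % 2]
            for i in range(lo, hi + 1):
                dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return buf[steps % 2]


if __name__ == "__main__":
    data = [float(x * x % 7) for x in range(20)]
    same = stencil_time_skewed(data, steps=6, tile=4) == stencil_naive(data, steps=6)
    print("results match:", same)
```

Each tile touches only a window of roughly `tile + steps` array elements while performing about `tile * steps` updates, which is why, in a compiled setting, the working set can be sized to the cache independently of the total array length.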
