A Characterization of Temporal Locality and Its Portability across Memory Hierarchies

This paper formulates and investigates the question of whether a given algorithm can be coded in a way that is efficiently portable across machines with different hierarchical memory systems. Such machines are modeled as a(x)-HRAMs (Hierarchical RAMs), where the time to access a location x is a(x). The width decomposition framework is proposed to provide a machine-independent characterization of the temporal locality of a computation by a suitable set of space reuse parameters. Using this framework, it is shown that, when the schedule, i.e. the order in which operations are executed, is fixed, efficient portability is achievable. We propose (a) the decomposition-tree memory manager, which achieves time within a logarithmic factor of optimal on all HRAMs, and (b) the reoccurrence-width memory manager, which achieves time within a constant factor of optimal for the important class of uniform HRAMs. We also show that, when the schedule is considered as a degree of freedom of the implementation, there are computations whose optimal schedule varies with the access function. In particular, we exhibit some computations for which any schedule is bound to be a polynomial factor slower than optimal on at least one of two sufficiently different machines. On the positive side, we show that relatively few schedules are sufficient to provide a near-optimal solution on a wide class of HRAMs.
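To make the cost model concrete, the following minimal sketch charges a(x) for each access to address x and sums over an address trace. It is an illustration only, not the paper's memory managers: the function names (`hram_time`, `a_log`, `a_sqrt`), the two access functions, and the example traces are all assumptions introduced here.

```python
import math
from typing import Callable, Iterable


def hram_time(trace: Iterable[int], a: Callable[[int], float]) -> float:
    """Total access time of an address trace when touching address x costs a(x)."""
    return sum(a(x) for x in trace)


# Two hypothetical access functions (assumed for illustration, not from the paper).
def a_log(x: int) -> float:
    # Slowly growing access cost.
    return math.log2(x + 2)


def a_sqrt(x: int) -> float:
    # Faster growing access cost.
    return math.sqrt(x + 1)


# The same eight accesses to two variables, under two hypothetical placements:
# close to address 0, or far away from it.
near = [0, 1] * 4
far = [1000, 1001] * 4

for name, a in (("log", a_log), ("sqrt", a_sqrt)):
    t_near = hram_time(near, a)
    t_far = hram_time(far, a)
    print(f"a = {name}: near = {t_near:.2f}, far = {t_far:.2f}, ratio = {t_far / t_near:.2f}")
```

Under both access functions the near placement is cheaper, but the penalty for the far placement depends on a(x). This dependence is why the choice of addresses for reused values, which is the job of a memory manager, matters differently on different machines.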
