Computational power of pipelined memory hierarchies

We define a model of computation, the Pipelined Hierarchical Random Access Machine with access function a(x), denoted the a(x)-PH-RAM. In this model, a processor interacts with a memory that can accept requests at a constant rate and satisfies each request to location x within a(x) units of time. We investigate memory management strategies that lead to time-efficient implementations of arbitrary computations on a PH-RAM. We begin by developing the pipelined decomposition-tree memory management strategy, which can be tuned to the memory access function. Specifically, for a linear or sublinear access function a(x), we define the concept of latency-hiding depth d_a(x) and show how any computation of N operations can be implemented on an a(x)-PH-RAM in time T(N) = O(N d_a(N)). In particular, T(N) = O(N log N) if a(x) = O(x), T(N) = O(N log log N) if a(x) = O(x^β) with 0 < β < 1, and T(N) = O(N log* N) if a(x) = O(log x). We develop lower bound techniques that allow us to establish existential lower bounds on PH-RAMs. In particular, we exhibit computations for which T(N) = Ω(N log N / log log N) when a(x) = Ω(x), T(N) = Ω(N log log N) when a(x) = Ω(x^β) with 0 < β < 1, and T(N) = Ω(N log* N) when a(x) = Ω(log x).
The stated lower bounds show that the pipelined decomposition-tree strategy is existentially optimal in the latter two cases, but they indicate the potential for a modest, O(log log N) improvement for linear access functions. To realize this potential, a superpipelined decomposition-tree memory manager is proposed, which achieves T(N) = O(N log N / log log N). The pipelined decomposition-tree strategy can also be tuned to the computation, in order to exploit its temporal locality as characterized by the width parameters [9]. When the latter are suitably bounded, T(N) = O(N) on any PH-RAM with linear or sublinear access function. Finally, we discuss how performance could benefit from parallelism in the data-dependence dag of the computation or from architectural enhancements, such as block-transfer primitives, and formulate various questions that deserve further investigation.
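To give a concrete feel for how slowly the stated upper bounds grow, the sketch below evaluates the three growth terms N log N, N log log N, and N log* N for the three classes of access functions. This is an illustrative helper only, not an algorithm from the paper; the function names and the `access` labels are assumptions, and all constant factors hidden by the O-notation are dropped.

```python
import math

def log_star(n, base=2.0):
    # Iterated logarithm log* n: the number of times the logarithm
    # must be applied before the value drops to at most 1.
    count = 0
    while n > 1.0:
        n = math.log(n, base)
        count += 1
    return count

def time_bound(N, access):
    # Growth term of the upper bound T(N) achieved by the pipelined
    # decomposition-tree strategy, per the abstract (labels are
    # illustrative, not from the paper):
    #   'linear'      a(x) = O(x)      -> T(N) = O(N log N)
    #   'polynomial'  a(x) = O(x^beta) -> T(N) = O(N log log N)
    #   'logarithmic' a(x) = O(log x)  -> T(N) = O(N log* N)
    if access == 'linear':
        return N * math.log2(N)
    if access == 'polynomial':
        return N * math.log2(math.log2(N))
    if access == 'logarithmic':
        return N * log_star(N)
    raise ValueError(f"unknown access class: {access}")
```

Even for N = 2^16, log* N is only 4, which is why the N log* N bound for logarithmic access functions is nearly linear in practice.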

[1] Steven A. Przybylski et al. Cache and memory hierarchy design: a performance-directed approach, 1990.

[2] Charles E. Leiserson et al. Cache-Oblivious Algorithms, 2003, CIAC.

[3] Stephen A. Cook et al. Time-bounded random access machines, 1972, J. Comput. Syst. Sci.

[4] John E. Savage et al. Models of computation: exploring the power of computing, 1998.

[5] V. Milutinovic et al. Enhancing and Exploiting the Locality, 1999, IEEE Trans. Computers.

[6] Nancy M. Amato et al. Predicting performance on SMPs. A case study: the SGI Power Challenge, 2000, Proceedings 14th International Parallel and Distributed Processing Symposium (IPDPS 2000).

[7] Gianfranco Bilardi et al. A Characterization of Temporal Locality and Its Portability across Memory Hierarchies, 2001, ICALP.

[8] Alok Aggarwal et al. Hierarchical memory with block transfer, 1987, 28th Annual Symposium on Foundations of Computer Science (SFCS 1987).

[9] Michael Wolfe et al. High performance compilers for parallel computing, 1995.

[10] F. P. Preparata et al. Processor-Time Tradeoffs under Bounded-Speed Message Propagation: Part I, Upper Bounds, 1995, Theory of Computing Systems.

[11] Gianfranco Bilardi et al. An approach towards an analytical characterization of locality and its portability, 2001, Innovative Architecture for Future Generation High-Performance Processors and Systems.

[12] Irving L. Traiger et al. Evaluation Techniques for Storage Hierarchies, 1970, IBM Syst. J.

[13] Franco P. Preparata et al. Horizons of Parallel Computation, 1992, J. Parallel Distributed Comput.

[14] Andrea Pietracaprina et al. On the Space and Access Complexity of Computation DAGs, 2000, WG.

[15] Bowen Alpern et al. A model for hierarchical memory, 1987, STOC.

[16] Jeffrey Scott Vitter et al. External memory algorithms, 1998, ESA.

[17] John A. Fotheringham et al. Dynamic storage allocation in the Atlas computer, including an automatic use of a backing store, 1961, Commun. ACM.