Seamless Integration of Parallelism and Memory Hierarchy

We prove an analogue of Brent's lemma for BSP-like parallel machines featuring a hierarchical structure for both the interconnection and the memory. Specifically, for these machines we present a uniform scheme to simulate any computation designed for v processors on a v′-processor configuration with v′ ≤ v and the same overall memory size. For a wide class of computations the simulation exhibits optimal O(v/v′) slowdown. The simulation strategy aims at translating communication locality into temporal locality. As an important special case (v′ = 1), our simulation can be employed to obtain efficient hierarchy-conscious sequential algorithms from efficient fine-grained ones.
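
To illustrate only the work-accounting side of a Brent-style simulation, here is a minimal sketch. It assumes a plain BSP guest program, given as a list of superstep functions with a hypothetical `(pid, state, inbox) -> (new_state, outbox)` interface, and a naive block mapping in which each of the v′ hosts (named `v_host` here) runs v/v′ guests one after another. It charges one unit per guest step and ignores communication and memory-hierarchy costs entirely. It is therefore not the paper's locality-preserving scheme. It only shows why host time grows by a factor of v/v′.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical guest superstep: (pid, state, inbox) -> (new_state, outbox),
# where outbox maps a destination pid to the messages sent to it.
Superstep = Callable[[int, int, List[int]], Tuple[int, Dict[int, List[int]]]]


def simulate(supersteps: List[Superstep], v: int, v_host: int,
             init: List[int]) -> Tuple[List[int], int]:
    """Naively run a v-processor BSP program on v_host host processors.

    Host h simulates the contiguous block of guests
    h*(v//v_host) .. (h+1)*(v//v_host)-1 sequentially. Host time is counted
    as one unit per guest step executed by the busiest host in each
    superstep (communication and memory costs are ignored).
    """
    assert 1 <= v_host <= v and v % v_host == 0, "assume v_host divides v"
    block = v // v_host
    states = list(init)
    inboxes: List[List[int]] = [[] for _ in range(v)]
    host_time = 0
    for step in supersteps:
        # Messages sent in this superstep are delivered only at its end (BSP semantics).
        next_inboxes: List[List[int]] = [[] for _ in range(v)]
        work = [0] * v_host
        for h in range(v_host):
            for pid in range(h * block, (h + 1) * block):
                states[pid], outbox = step(pid, states[pid], inboxes[pid])
                for dest, msgs in outbox.items():
                    next_inboxes[dest].extend(msgs)
                work[h] += 1
        host_time += max(work)
        inboxes = next_inboxes
    return states, host_time


def send_right(v: int) -> Superstep:
    """Example superstep: each guest sends its value to its right neighbour on a ring."""
    return lambda pid, s, inbox: (s, {(pid + 1) % v: [s]})


def add_inbox(pid: int, s: int, inbox: List[int]) -> Tuple[int, Dict[int, List[int]]]:
    """Example superstep: each guest adds the values it received to its state."""
    return s + sum(inbox), {}
```

For example, `simulate([send_right(8), add_inbox], 8, 2, list(range(8)))` gives host time 8. Running the same program with `v_host=8` gives host time 2, so the slowdown is 8/2 = 4 = v/v′. The block mapping keeps neighbouring guests on the same host, which loosely echoes the idea of exploiting communication locality. The actual scheme in the paper is not reproduced here.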
