Paper Information - Seamless Integration of Parallelism and Memory Hierarchy

Seamless Integration of Parallelism and Memory Hierarchy

We prove an analogue of Brent's lemma for BSP-like parallel machines featuring a hierarchical structure for both the interconnection and the memory. Specifically, for these machines we present a uniform scheme to simulate any computation designed for v processors on a v'-processor configuration with v' ≤ v and the same overall memory size. For a wide class of computations the simulation exhibits optimal O(v/v') slowdown. The simulation strategy aims at translating communication locality into temporal locality. As an important special case (v' = 1), our simulation can be employed to obtain efficient hierarchy-conscious sequential algorithms from efficient fine-grained ones.
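To make the v/v' slowdown concrete, the following is a minimal sketch of the classical, flat Brent-style argument only: v virtual processors are packed in contiguous blocks onto v' hosts, and each host runs its block sequentially within every superstep. It is not the paper's simulation scheme; it ignores communication costs and the memory hierarchy entirely, and the function name `simulate_supersteps` and the per-superstep cost table `local_work` are hypothetical names introduced for illustration.

```python
from math import ceil


def simulate_supersteps(v, v_prime, local_work):
    """Charge the time of running v virtual processors on v_prime hosts.

    local_work[s][i] is an assumed local cost of virtual processor i in
    superstep s. Virtual processor i is mapped to host i // block, where
    block = ceil(v / v_prime). Returns (guest_time, host_time).
    """
    if not 1 <= v_prime <= v:
        raise ValueError("need 1 <= v_prime <= v")
    block = ceil(v / v_prime)
    guest_time = 0
    host_time = 0
    for costs in local_work:
        if len(costs) != v:
            raise ValueError("each superstep needs one cost per virtual processor")
        # On the v-processor guest, a superstep lasts as long as its slowest processor.
        guest_time += max(costs)
        # On the host, each physical processor executes its block one after another.
        host_loads = [sum(costs[h * block:(h + 1) * block]) for h in range(v_prime)]
        host_time += max(host_loads)
    return guest_time, host_time


# Three supersteps of unit work on v = 8 virtual processors.
work = [[1] * 8 for _ in range(3)]
guest, host = simulate_supersteps(8, 2, work)
slowdown = host / guest  # equals v / v' = 4 for this uniform workload
```

With v' = 1 the same function charges the full sequential cost, which mirrors the special case mentioned in the abstract, although the hierarchy-conscious part of that result is outside what this sketch models.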
