Paper Information - Network-Oblivious Algorithms

Network-Oblivious Algorithms

The design of algorithms that can run unchanged yet efficiently on a variety of machines characterized by different degrees of parallelism and communication capabilities is a highly desirable goal. We propose a framework for network-obliviousness based on a model of computation where the only parameter is the problem's input size. Algorithms are then evaluated on a model with two parameters, capturing parallelism and granularity of communication. We show that, for a wide class of network-oblivious algorithms, optimality in the latter model implies optimality in a block-variant of the decomposable BSP model, which effectively describes a wide and significant class of parallel platforms. We illustrate our framework by providing optimal network-oblivious algorithms for a few key problems, and also establish some negative results.
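The two-parameter evaluation described above can be illustrated with a small sketch. It is not taken from the paper: it assumes that an algorithm is specified as a sequence of supersteps on `v` virtual processors, that virtual processors are folded onto `p` physical processors in contiguous chunks, and that messages between the same pair of physical processors travel in blocks of up to `B` messages. The helper names (`fold`, `superstep_block_degree`, `communication_cost`) are hypothetical, and the cost measure (maximum number of blocks sent or received by any processor, summed over supersteps) is one plausible reading of "parallelism and granularity of communication".

```python
from collections import Counter


def fold(vp: int, v: int, p: int) -> int:
    """Map virtual processor vp (0..v-1) to a physical processor (0..p-1).

    Assumption: contiguous chunks of v/p virtual processors per physical one.
    """
    return vp * p // v


def superstep_block_degree(messages, v: int, p: int, B: int) -> int:
    """Max number of blocks any physical processor sends or receives.

    `messages` is an iterable of (src, dst) virtual-processor pairs.
    Messages between co-located virtual processors cost nothing.
    """
    per_pair = Counter()
    for src, dst in messages:
        ps, pd = fold(src, v, p), fold(dst, v, p)
        if ps != pd:
            per_pair[(ps, pd)] += 1

    sent, received = Counter(), Counter()
    for (ps, pd), count in per_pair.items():
        blocks = -(-count // B)  # ceiling division: blocks of up to B messages
        sent[ps] += blocks
        received[pd] += blocks

    return max(max(sent.values(), default=0),
               max(received.values(), default=0))


def communication_cost(supersteps, v: int, p: int, B: int) -> int:
    """Sum of per-superstep block degrees (assumed cost measure)."""
    return sum(superstep_block_degree(m, v, p, B) for m in supersteps)


# One superstep: each of 8 virtual processors sends to the one 4 positions away.
shift = [(i, (i + 4) % 8) for i in range(8)]
for p, B in [(1, 1), (2, 1), (2, 4), (8, 4)]:
    print(p, B, communication_cost([shift], v=8, p=p, B=B))
```

The same specification, which mentions only the input size, is evaluated here for several `(p, B)` pairs without being rewritten. That is the sense in which the abstract separates the specification model from the evaluation model.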
