Space-round tradeoffs for MapReduce computations

This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce that captures the functional flavor of the paradigm by allowing for a flexible use of parallelism. The model diverges from the traditional processor-centric view: its parameters embody only global and local memory constraints, thus favoring a more data-centric view. Second, we apply the model to matrix multiplication, presenting upper and lower bounds for both the dense and the sparse case; these bounds highlight interesting tradeoffs between space and round complexity. Finally, building on the matrix multiplication results, we derive further space-round tradeoffs for matrix inversion and matching.
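To make the idea of a space-round tradeoff concrete, the following toy simulation may help. It is only an illustrative sketch and not the algorithm analyzed in this work: the helper names `run_round` and `multiply`, the parameters `block` and `rounds`, and the choice of splitting the inner dimension across rounds are all assumptions introduced here. Each reducer is responsible for one `block` x `block` tile of the product, and the largest number of values any reducer receives serves as a crude proxy for local memory.

```python
from collections import defaultdict


def run_round(pairs, mapper, reducer):
    """Simulate one round: map every pair, group by key, reduce each group.

    Returns the output pairs and the largest group size seen, which stands
    in for the local memory needed by a single reducer (an assumption of
    this sketch, not a definition taken from the paper).
    """
    groups = defaultdict(list)
    for key, value in pairs:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    output, max_local = [], 0
    for key, values in groups.items():
        max_local = max(max_local, len(values))
        output.extend(reducer(key, values))
    return output, max_local


def multiply(A, B, block, rounds):
    """Multiply square matrices A and B in `rounds` simulated rounds.

    Round r handles only the r-th slice of the inner index k and adds its
    contribution to the partial result carried over from earlier rounds.
    """
    n = len(A)
    assert n % block == 0 and n % rounds == 0
    nb, width = n // block, n // rounds
    partial, peak = [], 0  # partial holds pairs (("C", i, j), value)

    def mapper(key, value):
        tag, x, y = key
        if tag == "A":    # A[i][k] is needed by every tile in block-row i
            for J in range(nb):
                yield (x // block, J), (tag, x, y, value)
        elif tag == "B":  # B[k][j] is needed by every tile in block-column j
            for I in range(nb):
                yield (I, y // block), (tag, x, y, value)
        else:             # a partial C entry goes back to its own tile
            yield (x // block, y // block), (tag, x, y, value)

    def reducer(key, values):
        a, b, c = {}, {}, defaultdict(int)
        for tag, x, y, value in values:
            if tag == "A":
                a[(x, y)] = value
            elif tag == "B":
                b[(x, y)] = value
            else:
                c[(x, y)] = value
        for (i, k), a_ik in a.items():
            for (k2, j), b_kj in b.items():
                if k == k2:
                    c[(i, j)] += a_ik * b_kj
        return [(("C", i, j), v) for (i, j), v in c.items()]

    for r in range(rounds):
        lo, hi = r * width, (r + 1) * width
        pairs = list(partial)
        pairs += [(("A", i, k), A[i][k]) for i in range(n) for k in range(lo, hi)]
        pairs += [(("B", k, j), B[k][j]) for k in range(lo, hi) for j in range(n)]
        partial, local = run_round(pairs, mapper, reducer)
        peak = max(peak, local)

    C = [[0] * n for _ in range(n)]
    for (_, i, j), v in partial:
        C[i][j] = v
    return C, peak
```

Calling `multiply(A, B, block=2, rounds=r)` on a 4 x 4 input with `r` set to 1, 2, and 4 returns the same product each time, while the reported peak group size shrinks as `r` grows. In this toy setup, spending more rounds buys a smaller per-reducer footprint, which is the flavor of tradeoff the abstract refers to.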
