Graph Expansion and Communication Costs of Algorithms

We show that the communication complexity of an algorithm is closely related to the expansion properties of its computation graph. We demonstrate this on Strassen’s fast matrix multiplication algorithm and obtain the first lower bound on its communication cost. This bound is optimal.
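For readers unfamiliar with the algorithm named above, the following is a minimal sketch of Strassen's recursion in Python. It is not taken from the paper. It assumes square matrices stored as lists of lists with a power-of-two dimension, and the helper names (`add`, `sub`, `split`, `strassen`) are illustrative. It shows only the arithmetic structure, seven recursive block products instead of eight, and does not model communication cost.

```python
def add(X, Y):
    """Entrywise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Entrywise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split a square matrix of even dimension into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B):
    """Multiply two n x n matrices, n a power of two (assumption)."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Seven recursive products replace the eight of the classical method.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))

    # Recombine the products into the four quadrants of C = A * B.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

Calling `strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]])` should agree with the classical product of the two matrices. A practical implementation would typically switch to the classical method below some block size, which this sketch omits for brevity.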
