Communication Lower Bounds for Tensor Contraction Algorithms

Contractions of nonsymmetric tensors are reducible to matrix multiplication; however, ‘fully symmetric contractions’, in which the tensors are symmetric and the result is symmetrized, can be done with fewer operations. The ‘direct evaluation algorithm’ for fully symmetric contractions exploits the equivalence of terms in the contraction equation to obtain a lower computation cost than that of nonsymmetric contractions. The ‘symmetry preserving algorithm’ lowers the cost even further via an algebraic reorganization of the contraction equation. We derive vertical (between memory and cache) and horizontal (interprocessor) communication lower bounds for both of these algorithms. We demonstrate that, for some fully symmetric contractions, any load-balanced parallel schedule of the direct evaluation algorithm requires asymptotically more horizontal communication than matrix multiplication requires for nonsymmetric contractions of the same size. Instances of such fully symmetric contractions arise in quantum chemistry calculations. Further, we prove that, for some fully symmetric contractions, any schedule of the symmetry preserving algorithm requires asymptotically more vertical and horizontal communication than the direct evaluation algorithm. However, for the instances of fully symmetric contractions that arise in quantum chemistry calculations, our lower bounds are asymptotically the same for both algorithms.
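The kind of algebraic reorganization mentioned above can be illustrated, as a minimal sketch, on an assumed simplest case: a symmetric matrix A times a vector b. This toy is not the general algorithm analyzed here, and the function names naive_symv and symmetry_preserving_symv are hypothetical. The trick is that Z_ij = A_ij (b_i + b_j) is symmetric in i and j, so only the products with i < j need to be formed. The unwanted b_i terms are then subtracted using row sums of A, which needs only n further multiplications. The multiplication count falls from n² to n(n+1)/2, at the price of extra additions.

```python
def naive_symv(A, b):
    """Compute c = A b term by term; uses n*n multiplications."""
    n = len(b)
    mults = 0
    c = [0.0] * n
    for i in range(n):
        for j in range(n):
            c[i] += A[i][j] * b[j]
            mults += 1
    return c, mults


def symmetry_preserving_symv(A, b):
    """Compute c = A b for symmetric A via the reorganization
    c_i = sum_{j != i} Z_ij + (A_ii - sum_{j != i} A_ij) * b_i,
    where Z_ij = A_ij * (b_i + b_j) is symmetric, so only i < j is formed."""
    n = len(b)
    mults = 0
    c = [0.0] * n
    off_diag_row_sums = [0.0] * n  # sum_{j != i} A_ij, built with additions only
    for i in range(n):
        for j in range(i + 1, n):
            z = A[i][j] * (b[i] + b[j])  # one product serves both Z_ij and Z_ji
            mults += 1
            c[i] += z
            c[j] += z
            off_diag_row_sums[i] += A[i][j]
            off_diag_row_sums[j] += A[i][j]
    for i in range(n):
        # Cancel the extra b_i * A_ij terms and add the diagonal term A_ii * b_i.
        c[i] += (A[i][i] - off_diag_row_sums[i]) * b[i]
        mults += 1
    return c, mults


A = [[2.0, 1.0, 3.0],
     [1.0, 4.0, 5.0],
     [3.0, 5.0, 6.0]]
b = [1.0, 2.0, 3.0]
print(naive_symv(A, b))                # ([13.0, 24.0, 31.0], 9)
print(symmetry_preserving_symv(A, b))  # ([13.0, 24.0, 31.0], 6)
```

Both functions return the same vector. The second performs 6 multiplications instead of 9 for n = 3, and the savings approach a factor of two as n grows.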
