Exploiting Symmetry in Tensors for High Performance

Symmetric tensor operations arise in a wide variety of computations. However, the benefits of exploiting symmetry to reduce storage and computation are in conflict with the desire for simple memory access patterns. In this paper, we propose Blocked Compact Symmetric Storage, in which we view the tensor by blocks and store only the unique blocks of a symmetric tensor. We propose an algorithm-by-blocks, an approach already shown to be of benefit for matrix computations, that exploits this storage format. A detailed analysis shows that, relative to storing and computing with tensors without taking advantage of symmetry, storage requirements are reduced by a factor of O(m!) and computational requirements by a factor of O(m), where m is the order of the tensor. An implementation demonstrates that the complexity introduced by storing and computing with tensors by blocks is manageable, and preliminary results demonstrate that computational time is indeed reduced. The paper concludes with a discussion of how these insights point to opportunities for generalizing recent advances in the domain of linear algebra libraries to the field of multi-linear computation.
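
To make the storage idea concrete, here is a minimal Python sketch (not the paper's implementation; all names are illustrative) of block-compact storage for a fully symmetric tensor: it keeps only one representative per set of blocks that coincide under symmetry, recovers any other block by permuting the axes of its stored representative, and counts blocks to exhibit the O(m!) storage saving quoted above.

```python
# Minimal sketch, assuming a fully symmetric order-m tensor whose m modes
# are each partitioned into b equal blocks. Names are illustrative, not
# from the paper's implementation.
from itertools import combinations_with_replacement, permutations
from math import comb, factorial

import numpy as np


def unique_block_indices(b, m):
    # Full symmetry means T[perm(alpha)] = T[alpha], so a block index and
    # any permutation of it address equal (transposed) blocks; the
    # non-decreasing index tuples enumerate the unique blocks.
    return list(combinations_with_replacement(range(b), m))


def pack(T, b):
    # Store only the unique blocks of a symmetric tensor T of shape
    # (n, ..., n), with n divisible by b, keyed by sorted block index.
    m, s = T.ndim, T.shape[0] // b
    return {idx: T[tuple(slice(i * s, (i + 1) * s) for i in idx)].copy()
            for idx in unique_block_indices(b, m)}


def get_block(blocks, idx):
    # Recover an arbitrary block from its stored representative: by
    # symmetry, block idx is an axis permutation of block sorted(idx).
    key = tuple(sorted(idx))
    axes = np.argsort(idx, kind="stable")  # maps key's axes back to idx's
    return np.transpose(blocks[key], axes)


if __name__ == "__main__":
    n, b, m = 8, 4, 3
    rng = np.random.default_rng(0)
    T = rng.standard_normal((n,) * m)
    # Symmetrize by averaging over all axis permutations.
    T = sum(np.transpose(T, p) for p in permutations(range(m))) / factorial(m)

    blocks = pack(T, b)
    idx = (2, 0, 1)  # not stored explicitly, but recoverable:
    ref = T[tuple(slice(i * (n // b), (i + 1) * (n // b)) for i in idx)]
    assert np.allclose(get_block(blocks, idx), ref)

    # Storage saving: b**m blocks shrink to C(b+m-1, m) unique ones, a
    # ratio that approaches m! as b grows -- the O(m!) factor above.
    print(b ** m, comb(b + m - 1, m), b ** m / comb(b + m - 1, m))
```

The ratio printed for b = 4, m = 3 is 64/20 = 3.2, already a sizeable fraction of m! = 6; as the number of blocks per mode grows, the saving approaches the full factor of m!.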
