The dense multiple-vector tensor-vector product: An initial study

The product of a dense tensor with a vector in every mode except one, called a tensor-vector product, is a key operation in several algorithms for computing the canonical tensor decomposition. In these applications, it is even more common to compute a tensor-vector product with the same tensor and r concurrently available sets of vectors, an operation we refer to as a multiple-vector tensor-vector product (MTVP). Current techniques for implementing these operations rely on explicitly reordering the elements of the tensor in order to leverage available matrix libraries. This approach has two significant disadvantages: reordering the data can be expensive when only a small number of concurrent sets of vectors is available in the MTVP, and the reordered copy requires excessive additional memory. In this work, we consider two techniques that resolve these issues. Successive contractions are proposed to eliminate explicit data reordering, while blocking tackles the excessive memory consumption. Numerical experiments on a wide variety of tensor shapes indicate the effectiveness of these optimizations, clearly illustrating that the additional memory consumption can be limited to tolerable amounts, generally without sacrificing fast execution. For several fourth-order tensors, the additional memory requirements were three orders of magnitude smaller than those of competing implementations, while throughputs upward of 75% of the peak performance of the computer system can be attained for large values of r.
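To make the operation concrete, the following sketch contrasts a column-by-column MTVP with the successive-contraction approach described above, for a third-order tensor contracted in every mode except the first. This is an illustrative NumPy sketch, not the paper's implementation: the names (T, U, W) and sizes are assumptions, and the paper's actual contribution is a blocked, cache-aware realization of this idea rather than calls to `einsum`.

```python
import numpy as np

# Hypothetical sizes; I x J x K tensor, r concurrent sets of vectors.
I, J, K, r = 4, 5, 6, 3
rng = np.random.default_rng(0)
T = rng.standard_normal((I, J, K))   # dense third-order tensor
U = rng.standard_normal((J, r))      # r vectors for mode 2, one per column
W = rng.standard_normal((K, r))      # r vectors for mode 3, one per column

# Reference: one tensor-vector product per set of vectors.
# T @ W[:, s] contracts mode 3, the result @ U[:, s] contracts mode 2.
ref = np.stack([T @ W[:, s] @ U[:, s] for s in range(r)], axis=1)  # (I, r)

# Successive contractions: contract one mode at a time for all r sets,
# without materializing a reordered (unfolded) copy of T.
tmp = np.einsum('ijk,ks->ijs', T, W)   # contract mode 3 -> (I, J, r)
out = np.einsum('ijs,js->is', tmp, U)  # contract mode 2 -> (I, r)

assert np.allclose(out, ref)
```

The intermediate `tmp` is the only extra storage, and the blocking technique in the paper further limits even that by processing the tensor in pieces.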
