On Optimizing Distributed Tucker Decomposition for Dense Tensors

The Tucker decomposition expresses a given tensor as the product of a small core tensor and a set of factor matrices. Our objective is to develop an efficient distributed implementation for the case of dense tensors. The implementation is based on the HOOI (Higher-Order Orthogonal Iteration) procedure, in which the tensor-times-matrix product forms the core routine. Prior work has proposed heuristics for reducing the computational load and communication volume incurred by this routine. We study these two metrics in a formal and systematic manner, and design strategies that are optimal under both. Our experimental evaluation on a large benchmark of tensors shows that the optimal strategies significantly reduce load and volume compared to prior heuristics, and provide up to 7x speed-up in the overall running time.
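To make the role of the tensor-times-matrix (TTM) product concrete, the following is a minimal single-node sketch of HOOI in NumPy. It is not the distributed implementation described above: the function names (`ttm`, `unfold`, `hooi`, `reconstruct`), the truncated-SVD initialization, and the fixed iteration count are all assumptions made for illustration.

```python
import numpy as np


def ttm(tensor, matrix, mode):
    """Tensor-times-matrix product along `mode`.

    `matrix` has shape (K, L), where L == tensor.shape[mode]; the result
    has size K along `mode` and is unchanged along all other modes.
    """
    moved = np.moveaxis(tensor, mode, -1)   # bring the target mode last
    out = moved @ matrix.T                  # contract it against the matrix
    return np.moveaxis(out, -1, mode)       # restore the original mode order


def unfold(tensor, mode):
    """Matricize `tensor` so that `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def leading_left_singular_vectors(matrix, rank):
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank]


def hooi(tensor, ranks, n_iter=10):
    """Return (core, factors) for a Tucker approximation with the given ranks."""
    n_modes = tensor.ndim
    # Assumed initialization: truncated SVD of each mode unfolding.
    factors = [leading_left_singular_vectors(unfold(tensor, n), ranks[n])
               for n in range(n_modes)]
    for _ in range(n_iter):
        for n in range(n_modes):
            # Chain of TTMs along every mode except n: this is the
            # routine that dominates the cost of each HOOI sweep.
            y = tensor
            for m in range(n_modes):
                if m != n:
                    y = ttm(y, factors[m].T, m)
            factors[n] = leading_left_singular_vectors(unfold(y, n), ranks[n])
    # The core is the tensor projected onto all factor subspaces.
    core = tensor
    for m in range(n_modes):
        core = ttm(core, factors[m].T, m)
    return core, factors


def reconstruct(core, factors):
    """Multiply the core by every factor matrix to rebuild the tensor."""
    out = core
    for m, factor in enumerate(factors):
        out = ttm(out, factor, m)
    return out
```

In this sketch the inner loop applies the TTMs in increasing mode order. The order does not change the result, but it does change the size of the intermediate tensors and hence the work performed, which is one reason the load of the TTM chain is worth optimizing.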
