On optimizing distributed non-negative Tucker decomposition

The Tucker decomposition generalizes the singular value decomposition (SVD) to high-dimensional tensors. It factorizes a given N-dimensional tensor as the product of a small core tensor and a set of N factor matrices. Non-negative Tucker decomposition (NTD) is a variant that constrains the entries of the core and the factor matrices to be non-negative. Generalizing a classical algorithm from non-negative matrix factorization, Mørup et al. [19] designed a procedure for NTD via the multiplicative weight update paradigm. Building on this procedure, we present a distributed implementation of NTD for sparse tensors. We develop three algorithms for executing the procedure efficiently. The first is a baseline that adapts strategies from prior work on the Tucker decomposition. The other two are improved algorithms optimized around properties unique to the NTD procedure. We present an experimental evaluation on a benchmark of large real-life tensors on a system with 32 to 512 MPI ranks. The study shows that the optimized algorithms outperform the baseline by a factor of up to 6x in execution time. The distributed implementation scales well, with a speedup of up to 12x against an ideal of 16x.
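To make the NTD procedure concrete, the following is a minimal single-process sketch of multiplicative updates for a small dense tensor. It assumes a least-squares (Frobenius) objective and the standard NMF-style multiplicative rule extended to the Tucker model. It is not the distributed sparse implementation described above, and the helper names (`mode_product`, `unfold`, `multi_product`, `ntd`) are hypothetical, introduced only for illustration.

```python
import numpy as np

def mode_product(T, M, mode):
    """Mode-`mode` product: contract axis `mode` of T with the columns of M."""
    out = np.tensordot(M, T, axes=(1, mode))  # result axis from M comes first
    return np.moveaxis(out, 0, mode)          # move it back to position `mode`

def unfold(T, mode):
    """Matricize T so that axis `mode` indexes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def multi_product(T, mats, skip=None):
    """Apply mats[m] along every mode m of T, optionally skipping one mode."""
    for m, M in enumerate(mats):
        if m != skip:
            T = mode_product(T, M, m)
    return T

def ntd(X, ranks, iters=200, eps=1e-12, seed=0):
    """Non-negative Tucker sketch: X ~ core x_1 A_1 ... x_N A_N, all entries >= 0."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((I, R)) for I, R in zip(X.shape, ranks)]
    core = rng.random(tuple(ranks))
    errors = []
    for _ in range(iters):
        # Factor updates: A_n <- A_n * (X_(n) Z^T) / (A_n Z Z^T),
        # where Z is the unfolded core multiplied by all other factors.
        for n in range(X.ndim):
            Z = unfold(multi_product(core, factors, skip=n), n)
            numer = unfold(X, n) @ Z.T
            denom = factors[n] @ (Z @ Z.T) + eps
            factors[n] *= numer / denom
        # Core update: G <- G * (X x_n A_n^T) / (G x_n A_n^T A_n).
        numer = multi_product(X, [A.T for A in factors])
        denom = multi_product(core, [A.T @ A for A in factors]) + eps
        core *= numer / denom
        errors.append(np.linalg.norm(X - multi_product(core, factors)))
    return core, factors, errors

# Usage: recover a synthetic non-negative tensor of known multilinear rank.
rng = np.random.default_rng(1)
true_core = rng.random((2, 3, 2))
true_factors = [rng.random((I, R)) for I, R in zip((6, 7, 5), (2, 3, 2))]
X = multi_product(true_core, true_factors)
core, factors, errors = ntd(X, (2, 3, 2))
print(f"first error {errors[0]:.4f}, last error {errors[-1]:.4f}")
```

Because every update multiplies by a ratio of non-negative quantities, a non-negative initialization stays non-negative without any explicit projection step. The small `eps` in the denominators is an assumption of this sketch to guard against division by zero.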

[1]  George Karypis, et al.  Accelerating the Tucker Decomposition with Compressed Sparse Tensors, 2017, Euro-Par.

[2]  George Karypis, et al.  Tensor-matrix products with a compressed sparse tensor, 2015, IA3@SC.

[3]  Andrzej Cichocki, et al.  Efficient Nonnegative Tucker Decompositions: Algorithms and Uniqueness, 2014, IEEE Transactions on Image Processing.

[4]  Yogish Sabharwal, et al.  On Optimizing Distributed Tucker Decomposition for Sparse Tensors, 2018, ICS.

[5]  George Karypis, et al.  A Medium-Grained Algorithm for Sparse Tensor Factorization, 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[6]  Bora Uçar, et al.  Scalable sparse tensor decompositions in distributed memory systems, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[7]  Lars Karlsson, et al.  Parallel algorithms for tensor completion in the CP format, 2016, Parallel Comput.

[8]  Joos Vandewalle, et al.  On the Best Rank-1 and Rank-(R1, R2, ..., RN) Approximation of Higher-Order Tensors, 2000, SIAM J. Matrix Anal. Appl.

[9]  Raf Vandebril, et al.  A New Truncation Strategy for the Higher-Order Singular Value Decomposition, 2012, SIAM J. Sci. Comput.

[10]  Richard A. Harshman, et al.  Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis, 1970.

[11]  Tamara G. Kolda, et al.  Efficient MATLAB Computations with Sparse and Factored Tensors, 2007, SIAM J. Sci. Comput.

[12]  Bora Uçar, et al.  High Performance Parallel Algorithms for the Tucker Decomposition of Sparse Tensors, 2016, 2016 45th International Conference on Parallel Processing (ICPP).

[13]  George Karypis, et al.  Sparse Tensor Factorization on Many-Core Processors with High-Bandwidth Memory, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[14]  Tamara G. Kolda, et al.  Tensor Decompositions and Applications, 2009, SIAM Rev.

[15]  Christos Faloutsos, et al.  GigaTensor: scaling tensor analysis up by 100 times - algorithms and discoveries, 2012, KDD.

[16]  H. Sebastian Seung, et al.  Learning the parts of objects by non-negative matrix factorization, 1999, Nature.

[17]  Tamara G. Kolda, et al.  Parallel Tensor Compression for Large-Scale Scientific Data, 2015, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[18]  Yogish Sabharwal, et al.  On Optimizing Distributed Tucker Decomposition for Dense Tensors, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[19]  Andrzej Cichocki, et al.  Nonnegative Matrix and Tensor Factorization, 2007.

[20]  Benoît Meister, et al.  Efficient and scalable computations with sparse tensors, 2012, 2012 IEEE Conference on High Performance Extreme Computing.

[21]  Tamara G. Kolda, et al.  Scalable Tensor Decompositions for Multi-aspect Data Mining, 2008, 2008 Eighth IEEE International Conference on Data Mining.

[22]  J. Chang, et al.  Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition, 1970.

[23]  Ramakrishnan Kannan, et al.  Parallel Nonnegative CP Decomposition of Dense Tensors, 2018, 2018 IEEE 25th International Conference on High Performance Computing (HiPC).

[24]  Lars Kai Hansen, et al.  Algorithms for Sparse Nonnegative Tucker Decompositions, 2008, Neural Computation.

[25]  Joos Vandewalle, et al.  A Multilinear Singular Value Decomposition, 2000, SIAM J. Matrix Anal. Appl.

[26]  L. Tucker, et al.  Some mathematical notes on three-mode factor analysis, 1966, Psychometrika.