Stochastic Gradients for Large-Scale Tensor Decomposition

Tensor decomposition is a well-known tool for multiway data analysis. This work proposes using stochastic gradients for efficient generalized canonical polyadic (GCP) tensor decomposition of large-scale tensors. GCP tensor decomposition is a recently proposed version of tensor decomposition that allows for a variety of loss functions such as Bernoulli loss for binary data or Huber loss for robust estimation. The stochastic gradient is formed from randomly sampled elements of the tensor and is efficient because it can be computed using the sparse matricized-tensor-times-Khatri-Rao product (MTTKRP) tensor kernel. For dense tensors, we simply use uniform sampling. For sparse tensors, we propose two types of stratified sampling that give precedence to sampling nonzeros. Numerical results demonstrate the advantages of the proposed approach and its scalability to large-scale problems.
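
As a minimal sketch of the dense case, assume a 3-way tensor, the Bernoulli loss with a logit link, and uniform sampling with replacement. The GCP objective sums an elementwise loss f(x_i, m_i) over all N entries, so its gradient with respect to each factor matrix is an MTTKRP with the tensor of elementwise derivatives ∂f/∂m_i. Keeping only s sampled entries of that derivative tensor, each scaled by N/s, gives an unbiased gradient estimate that needs only a sparse MTTKRP over s entries. The function names (`gcp_sgrad_uniform`, `sparse_mttkrp_grads`, and so on) are hypothetical and chosen for illustration; they are not taken from any particular software package.

```python
import numpy as np


def bernoulli_logit_dfdm(x, m):
    """Derivative in m of f(x, m) = log(1 + exp(m)) - x*m, i.e. sigmoid(m) - x."""
    # 0.5 * (1 + tanh(m / 2)) equals sigmoid(m) without overflow for large |m|.
    return 0.5 * (1.0 + np.tanh(0.5 * m)) - x


def model_values(factors, idx):
    """CP model entries m_i = sum_r A[i1, r] * B[i2, r] * C[i3, r] for each row of idx (s x 3)."""
    A, B, C = factors
    return np.sum(A[idx[:, 0]] * B[idx[:, 1]] * C[idx[:, 2]], axis=1)


def sparse_mttkrp_grads(factors, idx, y):
    """MTTKRP in every mode with the sparse tensor Y holding values y at coordinates idx.

    Returns (G_A, G_B, G_C), where G_A is the mode-1 unfolding of Y times the
    Khatri-Rao product of C and B (similarly for the other modes). Contributions
    are accumulated entry by entry, so Y is never formed as a dense array.
    """
    A, B, C = factors
    a, b, c = A[idx[:, 0]], B[idx[:, 1]], C[idx[:, 2]]
    GA, GB, GC = np.zeros_like(A), np.zeros_like(B), np.zeros_like(C)
    # np.add.at sums repeated row indices; a plain fancy-index += would drop duplicates.
    np.add.at(GA, idx[:, 0], y[:, None] * b * c)
    np.add.at(GB, idx[:, 1], y[:, None] * a * c)
    np.add.at(GC, idx[:, 2], y[:, None] * a * b)
    return GA, GB, GC


def gcp_sgrad_uniform(X, factors, s, dfdm, rng):
    """Stochastic GCP gradient for a dense 3-way X from s entries drawn uniformly with replacement.

    Each sampled derivative is scaled by N / s, so the expected value of the
    estimate equals the full gradient.
    """
    N = X.size
    flat = rng.integers(0, N, size=s)
    idx = np.stack(np.unravel_index(flat, X.shape), axis=1)
    y = (N / s) * dfdm(X[tuple(idx.T)], model_values(factors, idx))
    return sparse_mttkrp_grads(factors, idx, y)


# Example: one stochastic gradient of a random binary 4 x 5 x 6 tensor with a rank-3 model.
rng = np.random.default_rng(0)
shape, R = (4, 5, 6), 3
X = (rng.random(shape) < 0.3).astype(float)
factors = [0.3 * rng.standard_normal((n, R)) for n in shape]
GA, GB, GC = gcp_sgrad_uniform(X, factors, s=50, dfdm=bernoulli_logit_dfdm, rng=rng)
print(GA.shape, GB.shape, GC.shape)  # (4, 3) (5, 3) (6, 3)
```

Passing every entry with unit weights to `sparse_mttkrp_grads` reproduces the full gradient exactly; the sampled version trades that exactness for a cost proportional to s rather than N.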

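For a sparse tensor, uniform sampling over all entries would mostly hit zeros. The following is a hypothetical illustration of stratified sampling, not necessarily the exact scheme proposed in the paper, assuming a 3-way tensor stored in coordinate (COO) form and the Bernoulli loss with a logit link. It draws p samples from the nnz nonzeros and q samples from the N - nnz zeros, and weights each stratum by its size divided by its sample count so that the combined estimate remains unbiased. The zeros are drawn by rejection sampling (propose a uniform index and discard it if it is a nonzero), which assumes the tensor is sparse enough that most proposals are accepted. The COO arrays (`subs`, `vals`) and all function names are assumptions made for the sketch.

```python
import numpy as np


def bernoulli_logit_dfdm(x, m):
    """Derivative in m of f(x, m) = log(1 + exp(m)) - x*m, i.e. sigmoid(m) - x."""
    return 0.5 * (1.0 + np.tanh(0.5 * m)) - x


def model_values(factors, idx):
    """CP model entries m_i = sum_r A[i1, r] * B[i2, r] * C[i3, r] for each row of idx."""
    A, B, C = factors
    return np.sum(A[idx[:, 0]] * B[idx[:, 1]] * C[idx[:, 2]], axis=1)


def sparse_mttkrp_grads(factors, idx, y):
    """MTTKRP in every mode with the sparse tensor holding values y at coordinates idx."""
    A, B, C = factors
    a, b, c = A[idx[:, 0]], B[idx[:, 1]], C[idx[:, 2]]
    GA, GB, GC = np.zeros_like(A), np.zeros_like(B), np.zeros_like(C)
    np.add.at(GA, idx[:, 0], y[:, None] * b * c)  # repeated indices are summed
    np.add.at(GB, idx[:, 1], y[:, None] * a * c)
    np.add.at(GC, idx[:, 2], y[:, None] * a * b)
    return GA, GB, GC


def sample_zeros(shape, nz_lin, q, rng):
    """Draw q linear indices uniformly, with replacement, from the zero entries.

    Rejection sampling: propose uniform indices over the whole tensor and discard
    any that coincide with a nonzero. Assumes the tensor has at least one zero.
    """
    N = int(np.prod(shape))
    out = np.empty(0, dtype=np.int64)
    while out.size < q:
        cand = rng.integers(0, N, size=2 * (q - out.size))
        out = np.concatenate([out, cand[~np.isin(cand, nz_lin)]])
    return out[:q]


def gcp_sgrad_stratified(shape, subs, vals, factors, p, q, dfdm, rng):
    """Stratified stochastic GCP gradient for a sparse 3-way tensor in COO form.

    subs (nnz x 3) and vals (nnz,) list the nonzeros. p >= 1 samples come from the
    nonzeros and q >= 1 from the zeros; each stratum is weighted by its size
    divided by its sample count.
    """
    N, nnz = int(np.prod(shape)), vals.size
    nz_lin = np.ravel_multi_index(tuple(subs.T), shape)
    # Stratum 1: nonzeros, sampled uniformly with replacement.
    k = rng.integers(0, nnz, size=p)
    idx_nz = subs[k]
    y_nz = (nnz / p) * dfdm(vals[k], model_values(factors, idx_nz))
    # Stratum 2: zeros, where the data value is x = 0.
    z = sample_zeros(shape, nz_lin, q, rng)
    idx_z = np.stack(np.unravel_index(z, shape), axis=1)
    y_z = ((N - nnz) / q) * dfdm(np.zeros(q), model_values(factors, idx_z))
    idx = np.concatenate([idx_nz, idx_z])
    y = np.concatenate([y_nz, y_z])
    return sparse_mttkrp_grads(factors, idx, y)


# Example: 20 nonzeros (all ones) in a 5 x 6 x 7 tensor with a rank-2 model.
rng = np.random.default_rng(0)
shape = (5, 6, 7)
lin = rng.choice(int(np.prod(shape)), size=20, replace=False)
subs = np.stack(np.unravel_index(lin, shape), axis=1)
vals = np.ones(20)
factors = [0.3 * rng.standard_normal((n, 2)) for n in shape]
grads = gcp_sgrad_stratified(shape, subs, vals, factors, p=10, q=30,
                             dfdm=bernoulli_logit_dfdm, rng=rng)
```

The rejection loop never terminates if the tensor has no zeros and slows down as the density approaches one; for such tensors, uniform sampling over all entries avoids the problem.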