Block-Randomized Stochastic Proximal Gradient for Low-Rank Tensor Factorization

This work considers the problem of computing the canonical polyadic decomposition (CPD) of large tensors. Prior works mostly leverage data sparsity to handle this problem, and are therefore not well suited to the dense tensors that often arise in applications such as medical imaging, computer vision, and remote sensing. Stochastic optimization is known for its low memory cost and low per-iteration complexity when handling dense data. However, existing stochastic CPD algorithms are not flexible enough to incorporate the variety of constraints and regularization terms that are of interest in signal and data analytics, and the convergence properties of many such algorithms are unclear. In this work, we propose a stochastic optimization framework for large-scale CPD with constraints and regularization terms. The framework operates in a doubly randomized fashion and can be regarded as a judicious combination of randomized block coordinate descent (BCD) and stochastic proximal gradient (SPG). The algorithm enjoys lightweight updates and a small memory footprint. The framework also offers considerable flexibility: many frequently used regularizers and constraints can be readily handled. The approach is supported by convergence analysis. Numerical results on large-scale dense tensors are presented to showcase the effectiveness of the proposed approach.
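To make the doubly randomized update concrete, the following is a minimal sketch, not the authors' implementation, of one way such an iteration can look for a dense 3-way numpy tensor: each step picks one mode (block) at random, samples a batch of fibers of that mode, forms a stochastic gradient of the corresponding least-squares subproblem, and applies a proximal step to enforce the constraint. The function and helper names (bras_cpd_sketch, prox_nonneg), the batch size, and the diminishing step-size schedule are illustrative assumptions; nonnegativity is used as a representative constraint handled by the proximal operator.

```python
import numpy as np

def prox_nonneg(M):
    """Proximal operator of the nonnegativity indicator: project onto M >= 0."""
    return np.maximum(M, 0.0)

def bras_cpd_sketch(X, R, n_iters=2000, batch=64, step0=0.05, seed=0):
    """Sketch of a block-randomized stochastic proximal gradient CPD
    for a dense 3-way tensor X, with rank R and nonnegative factors."""
    rng = np.random.default_rng(seed)
    dims = X.shape
    # Random nonnegative initialization of the three factor matrices.
    A = [rng.random((d, R)) for d in dims]

    for t in range(n_iters):
        # Randomization 1: pick one mode (block) uniformly at random.
        n = int(rng.integers(3))
        p, q = [m for m in range(3) if m != n]          # the two other modes

        # Randomization 2: sample a batch of mode-n fibers, i.e. index
        # pairs over the other two modes.
        ip = rng.integers(dims[p], size=batch)
        iq = rng.integers(dims[q], size=batch)

        # Rows of the Khatri-Rao product restricted to the sampled fibers:
        # H[f, :] = A[p][ip[f], :] * A[q][iq[f], :]  (Hadamard product of rows).
        H = A[p][ip, :] * A[q][iq, :]                   # shape (batch, R)

        # Corresponding sampled fibers of the data tensor, shape (I_n, batch).
        idx = [None, None, None]
        idx[n] = slice(None)
        idx[p], idx[q] = ip, iq
        Xf = X[tuple(idx)]
        if n != 0:                       # numpy returns (batch, I_n) here
            Xf = Xf.T

        # Stochastic gradient of (1/2)||Xf - A_n H^T||_F^2 over the batch.
        G = (A[n] @ H.T - Xf) @ H / batch

        # Proximal gradient step with a diminishing step size.
        alpha = step0 / (t + 1) ** 0.5
        A[n] = prox_nonneg(A[n] - alpha * G)

    return A
```

Each iteration touches only one factor matrix and a small batch of sampled fibers, which is what keeps the per-iteration cost and memory footprint low; swapping prox_nonneg for another proximal operator is how other constraints or regularizers would be handled in this sketch.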
