Stochastic Mirror Descent for Low-Rank Tensor Decomposition Under Non-Euclidean Losses

This work considers low-rank canonical polyadic decomposition (CPD) under a class of non-Euclidean loss functions that frequently arise in statistical machine learning and signal processing. These losses are natural choices for certain types of tensor data, e.g., count and binary tensors, for which the least squares loss is considered unnatural. Compared to the least squares loss, non-Euclidean losses are generally more challenging to handle. Non-Euclidean CPD has attracted considerable interest, and a number of prior works exist; however, pressing computational and theoretical challenges, such as scalability and convergence issues, remain. This work offers a unified stochastic algorithmic framework for large-scale CPD under a variety of non-Euclidean loss functions. Our key contribution is a flexible stochastic mirror descent framework built upon a tensor fiber sampling strategy. Leveraging the sampling scheme and the multilinear algebraic structure of low-rank tensors, the proposed lightweight algorithm ensures global convergence to a stationary point under reasonable conditions. Numerical results show that the framework attains promising non-Euclidean CPD performance while offering substantial computational savings over state-of-the-art methods.
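To make the main idea concrete, below is a minimal sketch of fiber-sampled stochastic mirror descent for a third-order nonnegative CPD under the generalized Kullback-Leibler (KL) divergence, one member of the loss class considered here. This is an illustration, not the paper's exact algorithm: the function names (`khatri_rao`, `smd_mode_update`), the step size, and the batch size are assumptions, and the entropic mirror map (which yields exponentiated-gradient updates) is one concrete instantiation of the mirror step; the framework itself covers a broader family of losses and Bregman divergences.

```python
import numpy as np

rng = np.random.default_rng(0)

def khatri_rao(U, V):
    """Row-wise Khatri-Rao product: row (u, v) is U[u] * V[v], v fastest."""
    n, R = U.shape
    m, _ = V.shape
    return (U[:, None, :] * V[None, :, :]).reshape(n * m, R)

def smd_mode_update(Xn, A, H, fibers, eta, eps=1e-12):
    """One stochastic mirror descent step on one factor.

    Xn     : mode-n unfolding of the data tensor (fibers are its columns)
    A      : current mode-n factor, shape (I_n, R), entrywise positive
    H      : Khatri-Rao product of the remaining factors
    fibers : indices of the sampled mode-n fibers
    eta    : step size (the unbiasedness constant is absorbed into it)
    """
    Hs = H[fibers, :]                    # rows matching the sampled fibers
    Ms = A @ Hs.T + eps                  # low-rank model on sampled fibers only
    G = (1.0 - Xn[:, fibers] / Ms) @ Hs  # generalized-KL stochastic gradient
    # Mirror step under the (unnormalized) negative-entropy mirror map:
    # the exponentiated-gradient update keeps the factor entrywise positive.
    return A * np.exp(-eta * G)

# Toy run on a synthetic count tensor (hypothetical sizes and settings).
I, J, K, R = 30, 30, 30, 5
A0, B0, C0 = (rng.gamma(2.0, 1.0, size=(d, R)) for d in (I, J, K))
X = rng.poisson(np.einsum('ir,jr,kr->ijk', A0, B0, C0))

A, B, C = (rng.random((d, R)) + 0.1 for d in (I, J, K))
batch, eta = 64, 1e-3
for it in range(500):
    S = rng.choice(J * K, size=batch, replace=False)   # mode-1 fibers
    A = smd_mode_update(X.reshape(I, -1), A, khatri_rao(B, C), S, eta)
    S = rng.choice(I * K, size=batch, replace=False)   # mode-2 fibers
    B = smd_mode_update(np.moveaxis(X, 1, 0).reshape(J, -1), B,
                        khatri_rao(A, C), S, eta)
    S = rng.choice(I * J, size=batch, replace=False)   # mode-3 fibers
    C = smd_mode_update(np.moveaxis(X, 2, 0).reshape(K, -1), C,
                        khatri_rao(A, B), S, eta)
```

The sketch shows where the computational savings come from: each update touches only the sampled fibers and the matching rows of the Khatri-Rao product, so no full unfolding-times-Khatri-Rao product is ever formed.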
