A Second-Order Method for Fitting the Canonical Polyadic Decomposition With Non-Least-Squares Cost

The canonical polyadic decomposition (CPD) can be used to extract meaningful components from a tensor. Most existing optimization methods for fitting the CPD use the least-squares distance between the tensor and its CPD as the cost function. While the minimizer of this cost function coincides with the maximum likelihood estimator for data corrupted by additive i.i.d. Gaussian noise, better-suited cost functions exist for other noise distributions. For such cost functions, first-order, gradient-based optimization methods have been proposed. However, (approximate) second-order methods, which additionally use information from the Hessian of the cost function to achieve faster convergence, are still largely unexplored. In this paper, we generalize the Gauss–Newton nonlinear least-squares algorithm to twice-differentiable entry-wise cost functions. The low-rank structure of the problem is exploited to keep the computational cost low. As a special case, $\beta$-divergence cost functions are examined. We show that quadratic convergence can be obtained close to the solution at a reasonable extra cost in memory and computation time, making the proposed method particularly useful when high accuracy of the decomposition is desired.
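To make the connection between the entry-wise cost and the second-order step concrete, the sketch below states the well-known $\beta$-divergence family and the standard Gauss–Newton-type Hessian approximation for a separable cost. The symbols ($t$, $m$, $z$, $J$, $D$, $g$) are illustrative notation introduced here, not taken from the paper itself.

% The beta-divergence between a data entry t and its model value m; beta = 2 gives
% least squares, beta = 1 Kullback–Leibler, and beta = 0 Itakura–Saito:
\[
d_\beta(t \,\|\, m) =
\begin{cases}
\dfrac{t^\beta + (\beta - 1)\, m^\beta - \beta\, t\, m^{\beta - 1}}{\beta(\beta - 1)}, & \beta \notin \{0, 1\},\\[1.5ex]
t \ln \dfrac{t}{m} - t + m, & \beta = 1,\\[1.5ex]
\dfrac{t}{m} - \ln \dfrac{t}{m} - 1, & \beta = 0.
\end{cases}
\]
% For a separable cost f(z) = \sum_i d(t_i \| m_i(z)), where the model entries m_i(z) are the
% entries of the CPD evaluated at the factor variables z, a generalized Gauss–Newton step
% keeps only first-order information about the model:
\[
\nabla f = J^{\mathsf{T}} g, \qquad
\nabla^2 f \approx J^{\mathsf{T}} D J, \qquad
\bigl(J^{\mathsf{T}} D J\bigr)\, p = -J^{\mathsf{T}} g,
\]
% with J = \partial m / \partial z the Jacobian of the multilinear CPD model,
% g_i = \partial d(t_i \| m_i) / \partial m_i, and D = \mathrm{diag}\bigl(\partial^2 d(t_i \| m_i) / \partial m_i^2\bigr).
% For least squares, D is the identity and g is the residual, so the classical Gauss–Newton
% step is recovered; exploiting the low-rank structure of J is what keeps the cost of solving for p low.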
