Secrets of Matrix Factorization: Approximations, Numerics, Manifold Optimization and Random Restarts

Matrix factorization (or low-rank matrix completion) with missing data is a key computation in many computer vision and machine learning tasks, and is also related to a broader class of nonlinear optimization problems such as bundle adjustment. The problem has received much attention recently, with renewed interest in variable-projection approaches, yielding dramatic improvements in reliability and speed. However, on a wide class of problems, no single approach dominates, and because the various approaches have been derived in a multitude of different ways, it has been difficult to unify them. This paper provides a unified derivation of a number of recent approaches, so that similarities and differences are easily observed. We also present a simple meta-algorithm which wraps any existing algorithm, yielding a 100% success rate on many standard datasets. Given 100% success, the focus of evaluation must turn to speed, as 100% success is trivially achieved if we do not care about speed. Again, our unification allows a number of generic improvements applicable to all members of the family to be isolated, yielding a unified algorithm that outperforms our re-implementation of existing algorithms, which in some cases already outperform the original authors' publicly available code.
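The wrapping idea can be illustrated with a minimal sketch. The abstract does not specify the meta-algorithm beyond saying that it wraps any existing algorithm, so everything below is an assumption made for illustration: the inner solver is plain regularized alternating least squares (a stand-in, not one of the variable-projection methods discussed), the wrapper simply keeps the lowest-cost result over a fixed number of random starts, and the names `als_factorize`, `with_restarts`, `n_starts` and `reg` are hypothetical.

```python
import numpy as np

def als_factorize(M, W, rank, U0, iters=50, reg=1e-6):
    """Stand-in inner solver: alternating least squares on observed entries.

    M: (m, n) data matrix; W: (m, n) boolean mask, True where M is observed.
    Returns U (m, rank), V (n, rank) and the squared error on observed entries.
    """
    m, n = M.shape
    U = U0.copy()
    V = np.zeros((n, rank))
    damping = reg * np.eye(rank)  # small ridge term keeps the solves well-posed
    for _ in range(iters):
        # Fix U, solve a small least-squares problem for each column factor.
        for j in range(n):
            rows = W[:, j]
            A = U[rows]
            V[j] = np.linalg.solve(A.T @ A + damping, A.T @ M[rows, j])
        # Fix V, solve a small least-squares problem for each row factor.
        for i in range(m):
            cols = W[i]
            B = V[cols]
            U[i] = np.linalg.solve(B.T @ B + damping, B.T @ M[i, cols])
    resid = (U @ V.T - M)[W]  # residual on observed entries only
    return U, V, float(resid @ resid)

def with_restarts(solver, M, W, rank, n_starts=10, seed=0):
    """Hypothetical wrapper: run `solver` from random starts, keep the best.

    The fixed restart budget is an assumption of this sketch, not a stopping
    rule taken from the abstract.
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        U0 = rng.standard_normal((M.shape[0], rank))
        U, V, cost = solver(M, W, rank, U0)
        if best is None or cost < best[2]:
            best = (U, V, cost)
    return best
```

Because the wrapper only needs a callable that returns a cost, any solver with the same signature could be substituted for `als_factorize`. This also makes the abstract's point about speed concrete: with an unlimited restart budget success is cheap to claim, so the meaningful comparison is how quickly each inner solver reaches the best optimum.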
