A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix Learning

Matrix learning is at the core of many machine learning problems. A number of real-world applications, such as collaborative filtering and text mining, can be formulated as low-rank matrix completion problems, which recover an incomplete matrix under low-rank assumptions. To ensure that the matrix solution has a low rank, a recent trend is to use nonconvex regularizers that adaptively penalize the singular values. They offer good recovery performance and have nice theoretical properties, but are computationally expensive due to repeated access to individual singular values. In this paper, based on the key insight that adaptive shrinkage on the singular values improves empirical performance, we propose a new nonconvex low-rank regularizer, the "nuclear norm minus Frobenius norm" regularizer, which is scalable, adaptive and sound. We first show that it provably satisfies the adaptive shrinkage property. Further, we discover a factored form that bypasses the computation of singular values and allows fast optimization by general optimization algorithms. Stable recovery and convergence are guaranteed. Extensive low-rank matrix completion experiments on a number of synthetic and real-world data sets show that the proposed method obtains state-of-the-art recovery performance while being the fastest among existing low-rank matrix learning methods.
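As a rough illustration of the regularizer named in the abstract, the sketch below evaluates "nuclear norm minus Frobenius norm" for a dense matrix directly via SVD. This is not the authors' implementation: the paper's exact definition may include a weighting on the Frobenius term, and its factored form avoids the SVD entirely; here both norms are unweighted and computed from the singular values for clarity.

```python
# Minimal sketch (assumption: unweighted form r(X) = ||X||_* - ||X||_F).
import numpy as np

def nuclear_minus_frobenius(X):
    """Return ||X||_* - ||X||_F for a dense matrix X."""
    sigma = np.linalg.svd(X, compute_uv=False)   # singular values of X
    # ||X||_* = sum of singular values, ||X||_F = 2-norm of the singular values.
    return sigma.sum() - np.linalg.norm(sigma)

# Example: a nearly rank-2 matrix; the regularizer is small when the
# spectrum is concentrated in a few large singular values.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 80))
X += 0.01 * rng.standard_normal(X.shape)
print(nuclear_minus_frobenius(X))
```

For the factored form mentioned in the abstract, the standard variational characterization ||X||_* = min over X = UVᵀ of (||U||_F² + ||V||_F²)/2 suggests how the nuclear-norm term can be handled without singular values, though the paper's precise factorization is not reproduced here.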
