Stochastic Second-Order Method for Large-Scale Nonconvex Sparse Learning Models

Sparse learning models have shown promising performance in high-dimensional machine learning applications. The main challenge with these models is how to optimize them efficiently. Most existing methods relax the problem to a convex one, which incurs large estimation bias. Sparse learning with a nonconvex constraint has therefore attracted much attention due to its better statistical performance, but it is difficult to optimize because of the non-convexity. In this paper, we propose a linearly convergent stochastic second-order method to optimize this nonconvex problem for large-scale datasets. The proposed method incorporates second-order information to improve the convergence speed. Theoretical analysis shows that the method enjoys a linear convergence rate and is guaranteed to converge to the underlying true model parameter. Experimental results verify the efficiency and correctness of the proposed method.
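The abstract does not spell out the algorithm, so the following is only a minimal NumPy sketch of the general idea it describes: a stochastic quasi-Newton (L-BFGS-style) search direction computed from mini-batch gradients, combined with hard thresholding to enforce the nonconvex sparsity constraint. The least-squares objective, the function names (`hard_threshold`, `stochastic_newton_ht`), and all step-size, batch, and memory parameters are illustrative assumptions, not the paper's actual method.

```python
# Illustrative sketch (assumptions noted above), NOT the paper's algorithm:
# stochastic quasi-Newton steps with hard thresholding for
#   min_w 0.5*||Xw - y||^2   s.t.   ||w||_0 <= k
import numpy as np

def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w, zero the rest."""
    out = w.copy()
    out[np.argsort(np.abs(w))[:-k]] = 0.0
    return out

def minibatch_grad(X, y, w, batch):
    """Gradient of 0.5*||X_B w - y_B||^2 / |B| on a mini-batch B."""
    Xb, yb = X[batch], y[batch]
    return Xb.T @ (Xb @ w - yb) / len(batch)

def two_loop(grad, s_list, y_list):
    """Standard L-BFGS two-loop recursion: approximate H^{-1} grad."""
    q = grad.copy()
    alphas = []
    for s, yv in zip(reversed(s_list), reversed(y_list)):
        a = (s @ q) / (yv @ s)
        alphas.append(a)
        q -= a * yv
    if s_list:  # initial Hessian scaling from the most recent pair
        s, yv = s_list[-1], y_list[-1]
        q *= (s @ yv) / (yv @ yv)
    for (s, yv), a in zip(zip(s_list, y_list), reversed(alphas)):
        b = (yv @ q) / (yv @ s)
        q += (a - b) * s
    return q

def stochastic_newton_ht(X, y, k, eta=0.5, T=200, batch_size=64, memory=10, seed=0):
    """Stochastic quasi-Newton iterations, each followed by hard thresholding."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    s_list, y_list = [], []
    for _ in range(T):
        batch = rng.choice(n, size=batch_size, replace=False)
        g = minibatch_grad(X, y, w, batch)
        direction = two_loop(g, s_list, y_list)          # second-order direction
        w_new = hard_threshold(w - eta * direction, k)   # project onto ||w||_0 <= k
        # Curvature pair on the same mini-batch; skip near-degenerate pairs.
        s_vec, y_vec = w_new - w, minibatch_grad(X, y, w_new, batch) - g
        if s_vec @ y_vec > 1e-10:
            s_list.append(s_vec); y_list.append(y_vec)
            if len(s_list) > memory:
                s_list.pop(0); y_list.pop(0)
        w = w_new
    return w
```

Thresholding after every quasi-Newton step keeps each iterate exactly k-sparse, in the spirit of iterative hard thresholding methods; how the paper actually interleaves the curvature updates, variance reduction, and support selection would follow its own analysis rather than this sketch.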
