Stochastic DCA for minimizing a large sum of DC functions with application to Multi-class Logistic Regression

We consider the problem of minimizing a large sum of DC (Difference of Convex) functions, which appears in several areas, especially stochastic optimization and machine learning. We propose two algorithms based on DCA (DC Algorithm): stochastic DCA and inexact stochastic DCA. We prove that both algorithms converge to a critical point with probability one. Furthermore, we apply stochastic DCA to an important problem in multi-task learning, namely group variable selection in multi-class logistic regression. The resulting stochastic DCA is very inexpensive: all computations are explicit. Numerical experiments on several benchmark and synthetic datasets illustrate the efficiency of our algorithms and their superiority over existing methods with respect to classification accuracy, solution sparsity, and running time.
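To make the idea of a stochastic DCA concrete, here is a minimal Python sketch on a toy problem. It is not the algorithm from the paper, and everything in it is an assumption made for illustration: the objective is an average of least-squares losses f_i rather than the multi-class logistic regression model, each f_i is split as g_i - h_i with g_i(x) = (rho/2)||x||^2 and h_i(x) = (rho/2)||x||^2 - f_i(x), and at each iteration the gradients of h_i are refreshed only on a random mini-batch while the others are kept from earlier iterations. The names `stochastic_dca`, `grad_h_i`, and `objective` are hypothetical. With this split the convex subproblem has a closed-form solution, which loosely mirrors the "all computations are explicit" property mentioned above.

```python
import random


def objective(x, A, b):
    """Average least-squares loss (1/n) * sum_i 0.5 * (a_i . x - b_i)^2."""
    total = 0.0
    for a, bi in zip(A, b):
        r = sum(aj * xj for aj, xj in zip(a, x)) - bi
        total += 0.5 * r * r
    return total / len(A)


def grad_h_i(x, a, bi, rho):
    """Gradient of h_i(x) = (rho/2)||x||^2 - f_i(x) for the i-th loss f_i."""
    r = sum(aj * xj for aj, xj in zip(a, x)) - bi
    return [rho * xj - r * aj for aj, xj in zip(a, x)]


def stochastic_dca(A, b, n_iters=2000, batch_size=5, seed=0):
    """Toy stochastic DCA loop (illustrative assumption, not the paper's code)."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    # Assumed DC split: rho >= max_i ||a_i||^2 keeps every h_i convex.
    rho = max(sum(aj * aj for aj in a) for a in A)
    x = [0.0] * d
    # Initial full pass: store one gradient of h_i per component and their sum.
    v = [grad_h_i(x, A[i], b[i], rho) for i in range(n)]
    v_sum = [sum(v[i][j] for i in range(n)) for j in range(d)]
    for _ in range(n_iters):
        # Convex subproblem: argmin_x (rho/2)||x||^2 - <v_mean, x> = v_mean / rho.
        x = [v_sum[j] / (n * rho) for j in range(d)]
        # Refresh the stored gradients on a random mini-batch only.
        for i in rng.sample(range(n), batch_size):
            new_v = grad_h_i(x, A[i], b[i], rho)
            for j in range(d):
                v_sum[j] += new_v[j] - v[i][j]
            v[i] = new_v
    return x


# Usage on synthetic, noiseless data (all values here are made up).
data_rng = random.Random(1)
A = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(40)]
x_true = [1.0, -2.0, 0.5]
b = [sum(aj * xj for aj, xj in zip(a, x_true)) for a in A]
x_hat = stochastic_dca(A, b)
print(objective([0.0] * 3, A, b), objective(x_hat, A, b))
```

The quadratic choice of g_i is only one possible decomposition; it is used here because it works for any smooth loss with a known curvature bound and keeps each iteration at the cost of a mini-batch of gradient evaluations.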
