Distributed nonconvex optimization for sparse representation

We consider a nonconvex constrained Lagrangian formulation of a fundamental bi-criteria optimization problem for variable selection in statistical learning. The two criteria are a smooth (possibly) nonconvex loss function, measuring the fitness of the model to data, and a difference-of-convex (DC) regularization, employed to promote some extra structure on the solution, such as sparsity. This general class of nonconvex problems arises in many big-data applications, from statistical machine learning to physical sciences and engineering. We develop the first unified distributed algorithmic framework for these problems and establish its asymptotic convergence to d-stationary solutions. Two key features of the method are: i) it can be implemented on arbitrary networks (digraphs) with (possibly) time-varying connectivity; and ii) it does not require the restrictive assumption that the (sub)gradient of the objective function is bounded, which significantly enlarges the class of statistical learning problems that can be solved with convergence guarantees.
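To make the problem class concrete, the following is a minimal centralized sketch, not the distributed algorithm developed in the paper. It assumes a least-squares loss and the DC regularizer g(x) = ||x||_1 - ||x||_2 as one illustrative choice; the function names, the step size, and the iteration count are hypothetical. At each iteration the concave part -||x||_2 is linearized at the current point and the resulting convex subproblem is handled with one proximal-gradient (soft-thresholding) step.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1, applied elementwise.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def dc_penalty(x):
    # Illustrative DC regularizer: convex ||x||_1 minus convex ||x||_2.
    return np.sum(np.abs(x)) - np.linalg.norm(x)

def objective(A, b, x, lam):
    # Smooth loss plus weighted DC regularizer (assumed form, for illustration).
    return 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * dc_penalty(x)

def dc_prox_gradient(A, b, lam, steps=500):
    # Centralized sketch: linearize the concave part, then take a prox step.
    n = A.shape[1]
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L for the least-squares gradient
    for _ in range(steps):
        grad = A.T @ (A @ x - b)
        nrm = np.linalg.norm(x)
        # A subgradient of ||x||_2; the zero vector is a valid choice at the origin.
        sub = x / nrm if nrm > 0 else np.zeros(n)
        x = soft_threshold(x - step * (grad - lam * sub), step * lam)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 10))
    x_true = np.zeros(10)
    x_true[[1, 7]] = [2.0, -1.5]
    b = A @ x_true
    x_hat = dc_prox_gradient(A, b, lam=0.1)
    print("objective at zero:", objective(A, b, np.zeros(10), 0.1))
    print("objective at x_hat:", objective(A, b, x_hat, 0.1))
```

Note that this penalty vanishes on vectors with a single nonzero entry, which is one way to see why it favors sparse solutions. Because the step size is 1/L, each iteration minimizes a convex upper bound of the objective, so the objective value does not increase along the iterates.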
