Coordinate Descent Algorithms with Coupling Constraints: Lessons Learned

Coordinate descent methods are enjoying renewed interest due to their simplicity and success in many machine learning applications. Motivated by recent theoretical results on random coordinate descent with linear coupling constraints, we develop a software architecture for this class of algorithms. Such an architecture has to (1) maintain solution feasibility, (2) be applicable to different execution environments, whether local or distributed, and (3) decouple problem-specific logic from the execution environment. We demonstrate that, due to the nature of these algorithms, these requirements raise issues that are absent from many other classes of machine learning algorithms and can therefore be overlooked when designing a generic machine learning system.
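To make requirement (1) concrete, the sketch below shows the core mechanism such algorithms rely on: when the constraint couples coordinates linearly (here, a single constraint sum(x) = c), updating a randomly chosen *pair* of coordinates along the direction e_i - e_j leaves the constraint satisfied at every iterate. The objective, variable names, and step-size formula are illustrative assumptions (a simple separable quadratic with an exact line search), not the paper's implementation.

```python
import random

def objective(x, b):
    # Illustrative separable quadratic: 0.5 * ||x - b||^2.
    return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))

def pairwise_coordinate_descent(b, c, iters=2000, seed=0):
    """Random pairwise coordinate descent for
         min 0.5 * ||x - b||^2   s.t.   sum(x) = c.

    Each step moves along d = e_i - e_j, so sum(x) never changes:
    feasibility is maintained by construction, with no projection step.
    """
    n = len(b)
    rng = random.Random(seed)
    x = [c / n] * n  # feasible starting point
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)
        # Exact line search along e_i - e_j for this quadratic:
        # minimize t -> f(x + t*(e_i - e_j)), giving the closed form below.
        t = -((x[i] - b[i]) - (x[j] - b[j])) / 2.0
        x[i] += t
        x[j] -= t  # the two changes cancel, so sum(x) = c still holds
    return x
```

For this objective the constrained minimizer has the closed form x* = b + ((c - sum(b)) / n) * 1, which makes it easy to check that the pairwise updates converge to the right point while every iterate stays feasible.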
