Stochastic Modified Equations for Continuous Limit of Stochastic ADMM

Stochastic versions of the alternating direction method of multipliers (ADMM) and its variants (linearized ADMM, gradient-based ADMM) play a key role in modern large-scale machine learning problems. One example is the regularized empirical risk minimization problem. In this work, we put different variants of stochastic ADMM into a unified form, which includes standard, linearized, and gradient-based ADMM with relaxation, and study their dynamics via a continuous-time model. We adapt the mathematical framework of the stochastic modified equation (SME) and show that the dynamics of stochastic ADMM are approximated, in the sense of weak approximation, by a class of stochastic differential equations with small noise parameters. The continuous-time analysis uncovers analytical insights into the behavior of the discrete-time algorithm that are nontrivial to obtain otherwise. For example, we can characterize the fluctuation of the solution paths precisely and determine the optimal stopping time that minimizes the variance of the solution paths.
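To make the notion of a small-noise stochastic differential equation concrete, the following is a minimal sketch, not the SME derived in this work. It assumes a generic scalar equation dX = b(X) dt + sqrt(eps) * sigma(X) dW and simulates it with the Euler-Maruyama scheme, using the small parameter eps as the time step as well. The functions `euler_maruyama` and `mean_and_variance`, and the toy choice b(x) = -x, sigma(x) = 1, are hypothetical and chosen only for illustration.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, eps, t_end, n_paths, seed=0):
    """Simulate dX = drift(X) dt + sqrt(eps) * diffusion(X) dW.

    The time step equals eps (an assumption of this sketch).
    Returns the value of each sample path at time t_end.
    """
    rng = random.Random(seed)
    n_steps = int(round(t_end / eps))
    finals = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            # Brownian increment over one step: N(0, eps)
            dw = rng.gauss(0.0, math.sqrt(eps))
            x = x + drift(x) * eps + math.sqrt(eps) * diffusion(x) * dw
        finals.append(x)
    return finals


def mean_and_variance(samples):
    """Sample mean and (population) variance of a list of floats."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, v


# Toy example (hypothetical): drift pulls toward the minimizer 0 of x^2 / 2.
eps = 0.01
finals = euler_maruyama(
    drift=lambda x: -x,
    diffusion=lambda x: 1.0,
    x0=1.0,
    eps=eps,
    t_end=5.0,
    n_paths=2000,
)
mean, var = mean_and_variance(finals)
print(f"mean at t_end: {mean:.4f}, variance at t_end: {var:.5f}")
```

For this toy equation the path variance settles near eps / 2, which illustrates how a continuous-time model exposes the size of the fluctuations around the minimizer as a function of the noise parameter.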
