Conservative Stochastic Optimization With Expectation Constraints

This paper considers stochastic convex optimization problems in which the objective and constraint functions are expectations over data indices or environmental variables, in addition to deterministic convex constraints on the domain of the variables. Since the underlying data distribution is unknown a priori, a closed-form solution is generally not available and classical deterministic optimization paradigms do not apply. State-of-the-art approaches, such as those based on the saddle point framework, ensure that both the optimality gap and the constraint violation decay as $\mathcal{O}(T^{-\frac{1}{2}})$, where $T$ is the number of stochastic gradients. In this work, we propose a novel conservative stochastic optimization algorithm (CSOA) that achieves zero average constraint violation and an $\mathcal{O}(T^{-\frac{1}{2}})$ optimality gap.

Further, we consider the scenario where carrying out a projection step onto the convex domain constraints at every iteration is not viable. Traditionally, the projection operation is avoided by using the conditional gradient or Frank-Wolfe (FW) variant of the algorithm. State-of-the-art stochastic FW variants achieve an optimality gap of $\mathcal{O}(T^{-\frac{1}{3}})$ after $T$ iterations, but these algorithms have not been applied to problems with functional expectation constraints. In this work, we propose the FW-CSOA algorithm, which is not only projection-free but also achieves zero average constraint violation, with the optimality gap decaying as $\mathcal{O}(T^{-\frac{1}{4}})$. The efficacy of the proposed algorithms is tested on two relevant problems: fair classification and structured matrix completion.
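The conservative idea can be illustrated with a short sketch. The following is a minimal, illustrative primal-dual recursion on a toy problem, not the paper's exact CSOA algorithm: it tightens the expectation constraint $g(x) \leq 0$ to $g(x) + \epsilon \leq 0$ and alternates a projected stochastic gradient step on the Lagrangian in the primal variable with a dual ascent step on the tightened constraint. The toy objective and constraint, the $\mathcal{O}(T^{-1/2})$ step-size and tightening schedules, and all variable names are assumptions made for illustration.

```python
# A minimal sketch of a conservative primal-dual stochastic update
# (illustrative, not the paper's exact CSOA recursion). The toy problem,
# step sizes, and the tightening constant `eps` are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 20000
theta_bar = np.ones(d)        # objective: min E||x - theta||^2, theta ~ N(theta_bar, I)
a, b = np.ones(d), 2.0        # constraint: E[<a, x>] - b <= 0
radius = 10.0                 # domain X = Euclidean ball (projection is cheap here)

eta = 1.0 / np.sqrt(T)        # step size ~ O(T^{-1/2}) (assumed schedule)
eps = 1.0 / np.sqrt(T)        # conservative tightening: enforce g(x) + eps <= 0

x = np.zeros(d)
lam = 0.0                     # dual variable for the expectation constraint
x_avg = np.zeros(d)

for t in range(T):
    theta = theta_bar + rng.standard_normal(d)     # sample for the objective
    noise = rng.standard_normal()                  # sample for the constraint
    grad_f = 2.0 * (x - theta)                     # stochastic gradient of f
    g_hat = a @ x - b + noise                      # stochastic constraint value
    grad_g = a                                     # gradient of g (deterministic here)

    # primal step: projected stochastic gradient of the Lagrangian
    x = x - eta * (grad_f + lam * grad_g)
    nrm = np.linalg.norm(x)
    if nrm > radius:
        x *= radius / nrm                          # projection onto X

    # dual ascent on the *tightened* constraint g(x) + eps <= 0
    lam = max(0.0, lam + eta * (g_hat + eps))

    x_avg += (x - x_avg) / (t + 1)                 # running average of iterates

print("averaged iterate:", x_avg)
print("constraint value E[<a,x>]-b:", a @ x_avg - b)   # <= 0 on average
```

Intuitively, because the dual variable reacts to the tightened value $g(x) + \epsilon$ rather than $g(x)$ itself, the averaged iterate over-satisfies the true constraint, which is the mechanism behind the zero average constraint violation.

When projections onto the domain are expensive, a Frank-Wolfe variant replaces the projection with a linear minimization oracle (LMO). The sketch below, which reuses the toy setup and the `eps`, `T`, and `rng` values from the previous block, combines the same conservative dual step with a momentum-based gradient-tracking estimate of the Lagrangian gradient; the schedules for the momentum weight and the FW step size are again assumptions, not the tuned rates from the paper.

```python
# A minimal sketch of a projection-free (Frank-Wolfe) variant with
# momentum-based gradient tracking on the same toy problem; the mixing
# and step schedules below are assumptions, not the paper's tuned rates.
def lmo_ball(grad, radius=10.0):
    """Linear minimization oracle over the Euclidean ball:
    argmin_{||v|| <= radius} <grad, v> = -radius * grad / ||grad||."""
    nrm = np.linalg.norm(grad)
    return np.zeros_like(grad) if nrm == 0 else -radius * grad / nrm

x, lam = np.zeros(d), 0.0
d_track = np.zeros(d)              # tracked Lagrangian gradient estimate

for t in range(1, T + 1):
    theta = theta_bar + rng.standard_normal(d)
    noise = rng.standard_normal()
    grad_L = 2.0 * (x - theta) + lam * a          # stochastic Lagrangian gradient
    rho = 1.0 / t ** 0.5                          # momentum weight (assumed schedule)
    d_track = (1 - rho) * d_track + rho * grad_L  # gradient tracking / momentum

    v = lmo_ball(d_track)                         # LMO call replaces the projection
    gamma = 1.0 / t ** 0.75                       # FW step size (assumed schedule)
    x = x + gamma * (v - x)                       # convex combination stays inside X

    g_hat = a @ x - b + noise
    lam = max(0.0, lam + (1.0 / np.sqrt(T)) * (g_hat + eps))  # tightened dual step
```

Because each update is a convex combination of the current iterate and an LMO output, the iterates remain feasible with respect to the domain without ever computing a projection; the price is the slower decay of the optimality gap noted above.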
