Conservative Stochastic Optimization with Expectation Constraints

This paper considers stochastic convex optimization problems in which the objective and constraint functions involve expectations with respect to the data indices or environmental variables, in addition to deterministic convex constraints on the domain of the variables. Although this setting is general and arises in many machine learning applications, online and efficient approaches for solving such problems have not been widely studied. Since the underlying data distribution is unknown a priori, a closed-form solution is generally unavailable, and classical deterministic optimization paradigms do not apply. State-of-the-art approaches, such as those based on the saddle point framework, ensure that both the optimality gap and the constraint violation decay as $O\left(T^{-\frac{1}{2}}\right)$, where $T$ is the number of stochastic gradients. The domain constraints are assumed to be simple and are handled via projection at every iteration. In this work, we propose a novel conservative stochastic optimization algorithm (CSOA) that achieves zero constraint violation and an $O\left(T^{-\frac{1}{2}}\right)$ optimality gap. Furthermore, in scenarios where computing the projection is expensive, the projection operation can be avoided by using the conditional gradient or Frank-Wolfe (FW) variant of the algorithm. State-of-the-art stochastic FW variants achieve an optimality gap of $O\left(T^{-\frac{1}{3}}\right)$ after $T$ iterations, but these algorithms have not been applied to problems with functional expectation constraints. We therefore propose the FW-CSOA algorithm, which is not only projection-free but also achieves zero constraint violation with an $O\left(T^{-\frac{1}{4}}\right)$ decay of the optimality gap. The efficacy of the proposed algorithms is tested on two relevant problems: fair classification and structured matrix completion.
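The core "conservative" idea described above — enforcing a tightened version of the expectation constraint so that the averaged iterate ends up strictly feasible — can be illustrated on a toy problem. The sketch below is an illustrative stochastic primal-dual loop, not the paper's exact CSOA algorithm; the step sizes, tightening schedule $\epsilon_t = t^{-1/2}$, and the toy objective/constraint are all assumptions chosen for demonstration.

```python
import numpy as np

def conservative_primal_dual(T=20000, seed=0):
    """Toy conservative stochastic primal-dual sketch (not the paper's exact CSOA).

    Problem: minimize E[(x - a)^2] with a ~ N(2, 0.1^2),
    subject to the expectation constraint E[x - b] <= 0 with b ~ N(1, 0.1^2),
    over the simple domain X = [-5, 5]. The unconstrained minimizer is x = 2,
    while the constraint forces x <= 1, so the solution is x* = 1.

    The conservative twist: the dual update enforces g(x) + eps_t <= 0 with a
    shrinking tightening eps_t, biasing the iterates toward the feasible side.
    """
    rng = np.random.default_rng(seed)
    x, lam = 0.0, 0.0          # primal variable and dual multiplier
    x_sum = 0.0
    for t in range(1, T + 1):
        eta = 0.5 / np.sqrt(t)          # diminishing step size (assumed schedule)
        eps = 1.0 / np.sqrt(t)          # conservatism, decays to zero
        a = rng.normal(2.0, 0.1)        # stochastic sample for the objective
        b = rng.normal(1.0, 0.1)        # stochastic sample for the constraint
        g = x - b                       # stochastic constraint value g(x)
        grad_x = 2.0 * (x - a) + lam    # stochastic gradient of the Lagrangian
        x = float(np.clip(x - eta * grad_x, -5.0, 5.0))  # projected primal step
        lam = max(0.0, lam + eta * (g + eps))            # dual ascent on tightened constraint
        x_sum += x
    return x_sum / T                    # averaged iterate

x_bar = conservative_primal_dual()
```

Without the tightening (`eps = 0`), the averaged iterate typically satisfies the constraint only up to an $O(T^{-1/2})$ violation; the shrinking `eps` is what pushes the average to the feasible side, at no cost to the convergence rate.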
