Safe Online Convex Optimization with Unknown Linear Safety Constraints

We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints. The goal is to select a sequence of actions that minimizes the regret without violating the safety constraints at any time step (with high probability). The parameters that specify the linear safety constraints are unknown to the algorithm, which has access only to noisy observations of the constraints for the chosen actions. To address this problem, we propose an algorithm called Safe Online Projected Gradient Descent (SO-PGD). We show that, under the assumption that a safe baseline action is available, the SO-PGD algorithm achieves a regret of $O(T^{2/3})$. Although many algorithms for online convex optimization (OCO) with safety constraints are available in the literature, they allow constraint violations during learning/optimization, and their focus has been on characterizing the cumulative constraint violations. To the best of our knowledge, ours is the first work to provide an algorithm with provable regret guarantees that does not violate the linear safety constraints (with high probability) at any time step.
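The abstract does not spell out the SO-PGD update, so the following is only a hypothetical sketch of the setting it describes, not the authors' algorithm. It assumes a single unknown linear constraint a^T x <= b, noisy observations y_t = a^T x_t + noise, and a known safe baseline action `x_safe`. The unknown vector a is estimated by ridge regression; a confidence radius `beta` (an assumed parameter) defines a pessimistic version of the constraint; and after each gradient step the candidate is pulled back toward `x_safe` by bisection until the pessimistic constraint holds. This pull-back stands in for an exact Euclidean projection. All function and parameter names (`ridge_estimate`, `pessimistic_value`, `safe_step`, `beta`) are illustrative.

```python
import numpy as np


def ridge_estimate(X, y, reg=1.0):
    """Regularized least-squares estimate of an unknown constraint vector a
    from noisy observations y_t = a @ x_t + noise. Returns (a_hat, V)."""
    d = X.shape[1]
    V = reg * np.eye(d) + X.T @ X          # regularized Gram matrix
    a_hat = np.linalg.solve(V, X.T @ y)    # ridge-regression solution
    return a_hat, V


def pessimistic_value(x, a_hat, V_inv, beta):
    """Largest value of a @ x over the ellipsoid {a : ||a - a_hat||_V <= beta},
    which equals a_hat @ x + beta * ||x||_{V^{-1}}."""
    return a_hat @ x + beta * np.sqrt(x @ V_inv @ x)


def safe_step(x, grad, eta, x_safe, a_hat, V, b, beta, iters=50):
    """Hypothetical safe update: take a gradient step, then shrink the candidate
    toward the baseline x_safe until the pessimistic constraint holds."""
    V_inv = np.linalg.inv(V)
    z = x - eta * grad                     # unconstrained gradient step
    if pessimistic_value(z, a_hat, V_inv, beta) <= b:
        return z
    # Bisection on the segment x_safe + t * (z - x_safe), t in [0, 1].
    # The pessimistic constraint set is convex; lo = 0 returns x_safe itself,
    # which is safe by assumption even if it fails the pessimistic test.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pessimistic_value(x_safe + mid * (z - x_safe), a_hat, V_inv, beta) <= b:
            lo = mid
        else:
            hi = mid
    return x_safe + lo * (z - x_safe)
```

In a toy run one might fit `a_hat` from a batch of noisy observations and then call `safe_step` repeatedly on the gradient of a quadratic loss whose minimizer lies outside the safe set. The iterates approach the pessimistic boundary from inside. Shrinking toward the baseline, rather than projecting, avoids a quadratic-program solve per step, at the cost of possibly stopping short of the true projection.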
