Online Optimization with Feedback Delay and Nonlinear Switching Cost

We study a variant of online optimization in which the learner receives k-round delayed feedback about the hitting cost and incurs a multi-step nonlinear switching cost, i.e., the switching cost depends on multiple previous actions in a nonlinear manner. Our main result shows that a novel Iterative Regularized Online Balanced Descent (iROBD) algorithm has a constant, dimension-free competitive ratio that is O(L^{2k}), where L is the Lipschitz constant of the switching cost. Additionally, we provide lower bounds showing that the Lipschitz condition is required and that the dependencies on k and L are tight. Finally, via reductions, we show that this setting is closely related to online control problems with delay, nonlinear dynamics, and adversarial disturbances, where iROBD directly offers constant-competitive online policies.
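To make the setting concrete, the following is a minimal sketch of how the total cost and the delayed information set could be computed. It is not the iROBD algorithm. The squared-distance form of the switching cost, the scalar actions, the zero padding for rounds before the start, the convention that the round-t learner has seen the hitting costs of rounds 0 through t - k, and the names total_cost, visible_costs, and delta are all assumptions made for illustration; none of them is specified in the abstract.

```python
import math


def total_cost(actions, hitting_costs, delta, p):
    """Sum hitting costs and multi-step switching costs over all rounds.

    Assumed switching cost at round t (illustrative only):
        0.5 * (y_t - delta([y_{t-1}, ..., y_{t-p}])) ** 2
    where delta is a possibly nonlinear function of the last p actions.
    Actions are scalars; rounds before round 0 are padded with 0.0.
    """
    total = 0.0
    for t, y in enumerate(actions):
        # Most recent action first: y_{t-1}, y_{t-2}, ..., y_{t-p}.
        history = [actions[t - i] if t - i >= 0 else 0.0 for i in range(1, p + 1)]
        total += hitting_costs[t](y) + 0.5 * (y - delta(history)) ** 2
    return total


def visible_costs(hitting_costs, t, k):
    """Hitting costs available when choosing the round-t action.

    Assumed convention: with a k-round delay, the learner at round t
    (0-indexed) has seen the costs of rounds 0 .. t - k, so k = 0
    means the current hitting cost is known before acting.
    """
    return hitting_costs[: max(0, t - k + 1)]


# Example: two rounds, a 1-Lipschitz nonlinear delta that uses one past action.
costs = [lambda y: (y - 1.0) ** 2, lambda y: (y - 2.0) ** 2]
example_total = total_cost([1.0, 2.0], costs, lambda h: math.tanh(h[0]), p=1)
```

In this sketch, a competitive ratio would compare total_cost for the online actions against total_cost for the best action sequence chosen in hindsight, with the online learner restricted at each round to what visible_costs returns.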
