Tracking Adversarial Targets

We study linear control problems with quadratic losses and adversarially chosen tracking targets. We present an efficient algorithm for this problem and show that, under standard conditions on the linear system, its regret with respect to an optimal linear policy grows as O(log^2 T), where T is the number of rounds of the game. We also study a problem with adversarially chosen transition dynamics; we present an exponentially-weighted average algorithm for this problem, and we give regret bounds that grow as O(√T).
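As a rough illustration of the exponentially-weighted average idea mentioned above, the following Python sketch runs the generic forecaster over a finite set of candidate policies ("experts"). It is not the algorithm of this paper: the function name `exp_weights`, the finite expert set, the losses scaled to [0, 1], and the learning rate `eta` are all assumptions made for illustration.

```python
import math


def exp_weights(loss_rounds, eta):
    """Generic exponentially weighted average forecaster (illustrative only).

    loss_rounds: list of rounds; each round is a list with one loss in [0, 1]
                 per expert (hypothetical stand-ins for candidate policies).
    eta:         learning rate (assumed given).
    Returns (expected loss of the forecaster, loss of the best single expert).
    """
    n = len(loss_rounds[0])
    cum = [0.0] * n          # cumulative loss of each expert so far
    total = 0.0              # forecaster's cumulative expected loss
    for losses in loss_rounds:
        # Weights are exp(-eta * cumulative loss); subtracting the minimum
        # keeps the exponentials numerically stable without changing p.
        m = min(cum)
        w = [math.exp(-eta * (c - m)) for c in cum]
        z = sum(w)
        p = [x / z for x in w]
        # Expected loss when an expert is drawn from p this round.
        total += sum(pi * li for pi, li in zip(p, losses))
        # Losses are revealed only after the distribution p is fixed.
        cum = [c + l for c, l in zip(cum, losses)]
    return total, min(cum)


if __name__ == "__main__":
    T, N = 100, 3
    rounds = [[((t * (i + 2)) % 7) / 6.0 for i in range(N)] for t in range(T)]
    eta = math.sqrt(8.0 * math.log(N) / T)
    alg_loss, best_loss = exp_weights(rounds, eta)
    print("regret:", alg_loss - best_loss)
```

For this generic setting with N experts and losses in [0, 1], choosing `eta = sqrt(8 ln N / T)` gives the standard regret bound of sqrt(T ln N / 2), which matches the O(√T) growth rate quoted above. The paper's setting with adversarial transition dynamics is more involved than this finite-expert sketch.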
