Dynamic Programming Subject to Total Variation Distance Ambiguity

This paper addresses the optimality of stochastic control strategies via dynamic programming, subject to total variation distance ambiguity on the conditional distribution of the controlled process. We formulate the stochastic control problem using minimax theory: the control seeks to minimize the payoff, while the conditional distribution, chosen from the total variation distance set, seeks to maximize it. First, we investigate the maximization of a linear functional on the space of probability measures on abstract spaces, over those probability measures that lie within a given total variation distance of a nominal probability measure, and we give the maximizing probability measure in closed form. Second, we use the solution of this maximization to solve the minimax stochastic control problem with deterministic control strategies, under both Markovian and non-Markovian assumptions on the conditional distributions of the controlled process. The results of this part include (1) minimax optimization subject to total va...
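
The two steps described above can be illustrated on a finite state space. The following is only a minimal sketch under stated assumptions, not the paper's general construction on abstract spaces: it assumes a finite alphabet, takes the total variation distance to be the sum of absolute differences between two probability vectors (so the radius lies in [0, 2]), and uses a standard finite-horizon robust recursion for the Markovian case. The names `tv_worst_case` and `minimax_dp` and all parameters are hypothetical.

```python
import numpy as np


def tv_worst_case(ell, mu, radius):
    """Maximize sum(ell * nu) over probability vectors nu with sum|nu - mu| <= radius.

    Assumption: finite alphabet; radius is measured on the [0, 2] scale.
    """
    ell = np.asarray(ell, dtype=float)
    nu = np.asarray(mu, dtype=float).copy()
    top = int(np.argmax(ell))
    # At most radius/2 of mass can be moved, and never more than the mass
    # currently sitting away from the highest-payoff state.
    budget = min(radius / 2.0, 1.0 - nu[top])
    nu[top] += budget
    # Remove the same amount of mass, starting from the lowest-payoff states.
    for i in np.argsort(ell, kind="stable"):
        if budget <= 0.0:
            break
        if i == top:
            continue
        take = min(nu[i], budget)
        nu[i] -= take
        budget -= take
    return nu, float(ell @ nu)


def minimax_dp(cost, Q, terminal, radius, horizon):
    """Backward recursion: the control minimizes, the transition law maximizes.

    cost[x, u]  : running cost (hypothetical layout)
    Q[x, u, y]  : nominal transition probabilities
    terminal[x] : terminal cost
    Returns the value function at time 0 and one decision rule per stage.
    """
    cost = np.asarray(cost, dtype=float)
    Q = np.asarray(Q, dtype=float)
    V = np.asarray(terminal, dtype=float)
    n_x, n_u = cost.shape
    policy = []
    for _ in range(horizon):
        # Worst-case expected cost-to-go for every (state, control) pair.
        q = np.array([[cost[x, u] + tv_worst_case(V, Q[x, u], radius)[1]
                       for u in range(n_u)] for x in range(n_x)])
        policy.append(q.argmin(axis=1))
        V = q.min(axis=1)
    policy.reverse()  # policy[t] is the decision rule used at stage t
    return V, policy


if __name__ == "__main__":
    nu, value = tv_worst_case([1.0, 3.0, 2.0], [0.5, 0.3, 0.2], radius=0.4)
    print("worst-case measure:", nu, "value:", value)
```

With `radius=0` the recursion reduces to ordinary dynamic programming under the nominal kernel; with `radius=2` the maximizer places all mass on the worst next state, so the recursion becomes a pure worst-case (deterministic minimax) recursion. Intermediate radii interpolate between the two.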
