Regularity and stability of feedback relaxed controls

This paper proposes a relaxed control regularization with general exploration rewards to design robust feedback controls for multi-dimensional continuous-time stochastic exit time problems. We establish that the regularized control problem admits a Hölder continuous feedback control, and show that both the value function and the feedback control of the regularized problem are Lipschitz stable with respect to parameter perturbations. Moreover, we prove that a pre-computed feedback relaxed control performs robustly in a perturbed system, and derive a first-order sensitivity equation for both the value function and the optimal feedback relaxed control. Finally, we establish first-order monotone convergence of the value functions of the relaxed control problems as the exploration parameter vanishes, which enables us to construct a pure exploitation strategy for the original control problem from the feedback relaxed controls.
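As a concrete illustration (not taken from the paper, which allows general exploration rewards), the following is a minimal sketch of one common form of such a regularized objective, assuming Shannon entropy as the exploration reward; the drift $b$, diffusion $\sigma$, running and exit rewards $f$, $g$, exit time $\tau$, and exploration weight $\lambda > 0$ are illustrative symbols:
\[
  V^{\lambda}(x) \;=\; \sup_{\rho}\, \mathbb{E}\!\left[\int_0^{\tau}\!\left(\int_A f(X_s,a)\,\rho_s(\mathrm{d}a) \;-\; \lambda\!\int_A \ln\frac{\mathrm{d}\rho_s}{\mathrm{d}a}(a)\,\rho_s(\mathrm{d}a)\right)\mathrm{d}s \;+\; g(X_{\tau})\right],
\]
subject to the relaxed dynamics
\[
  \mathrm{d}X_s \;=\; \int_A b(X_s,a)\,\rho_s(\mathrm{d}a)\,\mathrm{d}s \;+\; \sigma(X_s)\,\mathrm{d}W_s, \qquad X_0 = x,
\]
where the control $\rho_s$ is a probability measure on the action space $A$ (rather than a single action) and $\tau$ is the first exit time of $X$ from the domain. Sending $\lambda \downarrow 0$ removes the exploration reward, which is the vanishing-exploration limit referred to above.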
