Gradient Flows for Regularized Stochastic Control Problems

This work is motivated by the desire to extend the theoretical underpinnings for the convergence of stochastic-gradient-type algorithms widely used in the reinforcement learning community to solve control problems. The paper studies stochastic control problems regularized by relative entropy, where the action space is a space of measures. This setting covers relaxed control problems and problems of finding Markovian controls in which the control function is replaced by an idealized infinitely wide neural network, and it extends to the search for causal optimal transport maps. By exploiting the Pontryagin optimality principle, we construct a gradient flow for the measure-valued control process along which the cost functional is guaranteed to decrease. We show that, under appropriate conditions, this gradient flow has an invariant measure which is the optimal control for the regularized stochastic control problem. If the problem is sufficiently convex, the gradient flow converges exponentially fast.
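
To fix ideas, here is a minimal sketch of the type of objective described above, written for a relaxed control $\nu = (\nu_t)_{t \in [0,T]}$, where each $\nu_t$ is a probability measure on the action space $A$. The coefficients $b$, $f$, $g$, the reference measure $\mu$, and the unit diffusion are generic placeholders chosen for illustration, not the paper's exact setting:

\[
\mathrm{d}X_t = \int_A b(t, X_t, a)\,\nu_t(\mathrm{d}a)\,\mathrm{d}t + \mathrm{d}W_t,
\]
\[
J^{\sigma}(\nu) = \mathbb{E}\!\left[\int_0^T \!\!\int_A f(t, X_t, a)\,\nu_t(\mathrm{d}a)\,\mathrm{d}t + g(X_T)\right] + \sigma\,\mathrm{KL}(\nu \,\|\, \mu),
\]

where $\sigma > 0$ is the regularization strength. In this kind of setting, Pontryagin-type first-order conditions suggest that an optimal relaxed control has a Gibbs form relative to $\mu$,

\[
\nu^{*}_t(\mathrm{d}a) \propto \exp\!\left(-\tfrac{1}{\sigma}\, H(t, X_t, Y_t, a)\right) \mu(\mathrm{d}a),
\]

with $H$ the Hamiltonian and $Y$ the adjoint (backward) process. The gradient flow is then a dynamics on the space of measures whose invariant measure is exactly such a $\nu^{*}$, which is what makes the decrease of $J^{\sigma}$ along the flow and the convergence statements in the abstract plausible.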
