We investigate new approaches to dynamic-programming-based optimal control of continuous time-and-space systems. We use neural networks to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation, a first-order, nonlinear, partial differential equation. We derive a gradient descent rule for integrating this equation inside the domain, given the conditions on the boundary. We apply this approach to the "car-on-the-hill" problem, a highly nonlinear control problem with a two-dimensional state space. We discuss the results obtained and point out the low quality of the approximation of the value function and of the derived control. We attribute this poor approximation to the fact that the HJB equation has many generalized solutions other than the value function; our gradient descent method converges to one of these solutions and may therefore fail to find the correct value function. We illustrate this limitation on a simple 1D control problem.