Roughly speaking, final endpoint constraint problems correspond to constraints on the space of admissible controls. That is to say, the effect of the constraint (8.3) is to restrict the admissible set of u(t) to belong to a set U(x(t), t) chosen so that the resulting trajectory satisfies ψ(x(t_f), t_f) = 0. The optimal control problem is to find (if possible) an admissible control u*(·) such that

    J(x_0, u*(·), t_0) ≤ J(x_0, u(·), t_0)   (8.5)

for all admissible controls u(·). The admissible control set may need to be restricted to U(x(t), t) if there is a final endpoint constraint.

In what follows, we will give two different approaches to solving the optimal control problem. The first and more classical approach is the calculus of variations approach. The second is the dynamic programming approach, which leads to the Hamilton-Jacobi-Bellman equation and which we will use extensively in Chapter 9.

8.2 The Calculus of Variations Approach

We will perform the calculus of variations on the cost function (8.4) subject to the constraints of equations (8.1), (8.3). The first step is to convert the constrained problem into an unconstrained one using Lagrange multipliers: λ for the endpoint constraint (8.3) and p(t) ∈ Rⁿ for the constraints imposed by the differential equation (8.1). This defines the modified cost function

    J̃ = φ(x(t_f), t_f) + λᵀψ(x(t_f), t_f) + ∫_{t_0}^{t_f} [ L(x, u, t) + pᵀ(f(x, u, t) − ẋ) ] dt   (8.6)

Inspired by classical mechanics, we define a quantity that will be useful in the calculation to follow, called the Hamiltonian H(x, p, u, t), using what is referred to as a Legendre transformation (details of the connection between optimal control and classical mechanics will be made explicit later in this section):

    H(x, p, u, t) = L(x, u, t) + pᵀf(x, u, t)   (8.7)

The variation of (8.6) is derived by assuming a variation δu(·) in the admissible control space, with attendant variations δx(·), δp(·), δλ, so that

    δJ̃ = (D₁φ + λᵀD₁ψ)δx|_{t_f} + (D₂φ + λᵀD₂ψ)δt|_{t_f} + ψᵀδλ + (H − pᵀẋ)δt|_{t_f}
         + ∫_{t_0}^{t_f} [ D₁H δx + D₃H δu − pᵀδẋ + (D₂H − ẋᵀ)δp ] dt   (8.8)

The notation DᵢH stands for the derivative of H with respect to its i-th argument. Thus, for example,

    D₃H(x, p, u, t) = ∂H/∂u ∈ R^{1×n_i}
    D₁H(x, p, u, t) = ∂H/∂x ∈ R^{1×n}

Integrating ∫ pᵀ(t)δẋ(t) dt by parts yields

    δJ̃ = (D₁φ + λᵀD₁ψ − pᵀ)δx(t_f) + (D₂φ + λᵀD₂ψ + H)δt_f + ψᵀδλ
         + ∫_{t_0}^{t_f} [ (D₁H + ṗᵀ)δx + D₃H δu + (D₂H − ẋᵀ)δp ] dt   (8.9)

An extremum of J̃ is achieved when δJ̃ = 0 for all independent variations δλ, δx, δu, δp. By the artifact of having used Lagrange multipliers, all of these variations are in fact independent. Thus, we simply read off the condition that the coefficient of each variation is zero at each time instant t. These conditions are recorded in the following table; the leftmost column names the appropriate stationarity condition.

Table (8.2): Necessary conditions for optimality

    Description                  Equation                         Variation
    Final state constraint       ψ(x(t_f), t_f) = 0               δλ
    State equation               ẋ = (∂H/∂p)ᵀ                     δp(t)
    Costate equation             ṗ = −(∂H/∂x)ᵀ                    δx(t)
    Input stationarity           ∂H/∂u = 0                        δu(t)
    Transversality conditions    D₁φ − pᵀ = −λᵀD₁ψ at t_f         δx(t_f)
                                 H + D₂φ = −λᵀD₂ψ at t_f         δt_f

The conditions of Table (8.2), together with the boundary condition x(t_0) = x_0 and the constraint on the final state ψ(x(t_f), t_f) = 0, constitute the necessary conditions for optimality.
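To make the table concrete, consider a minimal symbolic sketch; the scalar dynamics and cost here are illustrative assumptions, not an example from the text. It forms the Hamiltonian (8.7) for ẋ = u with running cost L = (x² + u²)/2 and reads off the state equation, costate equation, and input stationarity condition.

```python
# Illustrative scalar problem (assumed, not from the text):
#   dynamics x' = u, running cost L = (x^2 + u^2)/2.
import sympy as sp

x, p, u = sp.symbols('x p u')
f = u                                   # dynamics
L = (x**2 + u**2) / 2                   # running cost
H = L + p * f                           # Hamiltonian (8.7): H = L + p^T f

x_dot = sp.diff(H, p)                   # state equation:    x' =  dH/dp -> u
p_dot = -sp.diff(H, x)                  # costate equation:  p' = -dH/dx -> -x
u_star = sp.solve(sp.diff(H, u), u)[0]  # stationarity dH/du = 0  ->  u* = -p
hess = sp.diff(H, u, 2)                 # = 1 > 0, so u* indeed minimizes H
```

Here input stationarity expresses the optimal control in terms of the costate, u* = −p; what remains is a two-point boundary value problem in x and p.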
The endpoint constraint equations are referred to as the transversality conditions:

    D₁φ(x(t_f), t_f) − pᵀ(t_f) = −λᵀD₁ψ(x(t_f), t_f)
    H(x(t_f), p(t_f), u(t_f), t_f) + D₂φ(x(t_f), t_f) = −λᵀD₂ψ(x(t_f), t_f)   (8.10)

The optimality conditions may be written explicitly as

    ẋ = (∂H/∂p)ᵀ(x, p, u*)
    ṗ = −(∂H/∂x)ᵀ(x, p, u*)   (8.11)

with the input u*(t) at time t chosen to satisfy the stationarity condition

    (∂H/∂u)(x(t), p(t), u*(t)) = 0

and the endpoint constraint ψ(x(t_f), t_f) = 0. The key point of the derivation of the necessary conditions is that the Legendre transformation of the Lagrangian into a Hamiltonian converts a functional minimization problem into a static optimization problem on the function H(x, p, u, t), performed pointwise in time.

The question of when these equations also constitute sufficient conditions for (local) optimality is an important one and needs to be ascertained by taking the second variation of J̃. This is an involved procedure, but the input stationarity condition in Table (8.2) hints at the condition for a given trajectory x*(·), u*(·), p*(·) to be a local minimum: that the Hessian of the Hamiltonian

    D₃²H(x*, p*, u*, t)   (8.12)

be positive definite along the optimal trajectory. A sufficient condition for this is to simply ask that the n_i × n_i Hessian matrix

    D₃²H(x(t), p(t), u*(t), t)   (8.13)

be positive definite. As far as conditions for global minimality are concerned, the stationarity condition again hints at a sufficient condition, namely that

    u*(t) = arg min_u H(x*(t), p*(t), u, t)   (8.14)

The formal proof that the conditions of equation (8.14) give the global optimum was first derived by Pontryagin, Gamkrelidze, Boltyanskii and Mishchenko ([?]), and the result is referred to as Pontryagin's Minimum (sometimes Maximum) Principle. The proof that the Pontryagin principle is a necessary condition for a global minimum is beyond the scope of this book. The key point of the Principle is the requirement that the optimization in (8.14) be performed globally. Generally speaking, there are not many good algorithms for global optimization, except for special classes of functions or by various randomization techniques. Special classes of functions for which there are good sufficient conditions are, for example, those for which the Hamiltonian H(x, p, u, t) is convex in u for all x, p, t; in this case there is a unique local and global minimum of H(x, p, ·, t). Finally, there are instances in which the Hamiltonian H(x, p, u, t) is not a function of u at some values of x, p, t. These cases are referred to as singular extremals and need to be treated with care, since the value of u is left unspecified as far as the optimization is concerned.

8.2.1 Fixed Endpoint Time Problems

In the instance that the final time t_f is fixed, the equations take on a simpler form, since there is no variation δt_f. The boundary condition of equation (8.10) then becomes

    p(t_f) = D₁φᵀ(x(t_f), t_f) + D₁ψᵀ(x(t_f), t_f)λ.   (8.15)

Further, if there is no final state constraint, the boundary condition simplifies even further to

    p(t_f) = D₁φᵀ(x(t_f), t_f).   (8.16)
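Continuing the same illustrative scalar example (an assumption, not the text's), take a fixed final time, terminal cost φ = x(t_f)²/2, and no endpoint constraint ψ. The necessary conditions then form a two-point boundary value problem: u* = −p gives ẋ = −p and ṗ = −x, with x(t_0) = x_0 and, from (8.16), p(t_f) = x(t_f). A sketch using scipy.integrate.solve_bvp:

```python
# Two-point BVP from the necessary conditions for the illustrative problem
#   x' = u, L = (x^2 + u^2)/2, phi = x(tf)^2/2, fixed tf, no psi constraint.
import numpy as np
from scipy.integrate import solve_bvp

t0, tf, x0 = 0.0, 2.0, 1.0

def rhs(t, y):
    x, p = y
    return np.vstack([-p, -x])          # x' = -p (since u* = -p), p' = -x

def bc(ya, yb):
    return np.array([ya[0] - x0,        # initial condition x(t0) = x0
                     yb[1] - yb[0]])    # transversality (8.16): p(tf) = x(tf)

mesh = np.linspace(t0, tf, 50)
sol = solve_bvp(rhs, bc, mesh, np.zeros((2, mesh.size)))
u_opt = -sol.sol(mesh)[1]               # recover the optimal input u* = -p
```

The structure is generic: the state equation carries the initial condition forward from t_0, the costate equation is anchored at t_f by the transversality condition, and the stationarity condition couples the two.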
8.2.2 Time Invariant Systems

In the instance that f(x, u, t) and the running cost L(x, u, t) are not explicit functions of time, the endpoint functions φ and ψ do not depend explicitly on the final time, and the final time t_f is free, the formulas of the table of necessary conditions for optimality can be rewritten as:

    State equation              ẋ = (∂H/∂p)ᵀ = f(x, u*)
    Costate equation            ṗ = −(∂H/∂x)ᵀ = −(D₁fᵀp + D₁Lᵀ)
    Stationarity condition      0 = (∂H/∂u)ᵀ = D₂Lᵀ + D₂fᵀp
    Transversality conditions   D₁φ − pᵀ = −λᵀD₁ψ at t_f
                                H(t_f) = 0

In addition, it may be verified that H(t) ≡ 0 along the optimal trajectory: writing H* for H with the optimal input substituted, the stationarity condition and the absence of explicit time dependence give

    dH*/dt = (∂H*/∂x)ẋ + (∂H*/∂p)ṗ = 0,   (8.17)

so that H is constant, and the transversality condition H(t_f) = 0 forces that constant to be zero.

8.2.3 Connections with Classical Mechanics

Hamilton's principle of least action states that, under certain conditions (for example, that there is no dissipation and there are no nonholonomic constraints), a conservative system moves so as to minimize the time integral of its "action", defined to be the difference between the kinetic and potential energy. Holonomic, or integrable, constraints are dealt with by adding appropriate Lagrange multipliers. If nonholonomic constraints are dealt with in the same manner, we get equations of motion, dubbed vakonomic by Arnold [8], which do not correspond to experimentally observed motions; on the other hand, if there are only holonomic constraints, the equations of motion derived from Hamilton's principle of least action are equivalent to Newton's laws.

To make this more explicit, we define q ∈ Rⁿ to be the vector of generalized coordinates of the system, and denote by U(q) the potential energy of the system and by T(q, q̇) its kinetic energy. Then Hamilton's principle of least action states that the trajectory of the classical system is a solution of the optimal control problem for the system sketched below.
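One standard way to set up this optimal control problem (a sketch of the classical construction, using the sign conventions of (8.7); the details are supplied here for illustration and are not taken from the text) is to treat the generalized velocity as the control input:

    q̇ = u,    minimize over u(·) the action ∫_{t_0}^{t_f} [ T(q, u) − U(q) ] dt,

so that the Hamiltonian (8.7) reads H(q, p, u) = T(q, u) − U(q) + pᵀu. Input stationarity gives pᵀ = −∂T/∂u, and substituting this into the costate equation ṗ = −(∂H/∂q)ᵀ recovers the Euler-Lagrange equations

    d/dt (∂T/∂q̇)ᵀ − (∂T/∂q)ᵀ + (∂U/∂q)ᵀ = 0,

which, when the constraints are holonomic, are equivalent to Newton's laws.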