Stochastic Dynamic Programming

Formally, a stochastic dynamic program has the same components as a deterministic one; the only modification is to the state transition equation. When future events are uncertain, the state does not evolve deterministically; instead, today's state and action induce a distribution over possible states tomorrow. We'll assume that $P(B, x, a)$ is the induced probability that tomorrow's state lies in the set $B$ given that today's state is $x$ and today's action is $a$. The Bellman equation for a stochastic dynamic program is
$$v(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \int v(x')\, P(dx', x, a) \right\}.$$
We have to relate this expression to
$$V(x_0) = \max_{\{x_t, a_t\}_{t=0}^{\infty}} E_0 \sum_{t=0}^{\infty} \beta^t r(x_t, a_t),$$
the value function defined over the sequential problem. This turns out to be more work than in the deterministic case because we need to be clear about how to construct expectations. Three important concepts are buried in here: conditional expectations, expectations over sequences of events that unfold over time, and the law of iterated expectations. To understand these concepts formally we need to take a detour into measure theory.
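Before the detour, it may help to see the stochastic Bellman equation in a setting where the measure-theoretic issues disappear. The following is a minimal value-iteration sketch, assuming finitely many states and actions so that the kernel $P(dx', x, a)$ reduces to a transition matrix; the reward values, discount factor, and transition probabilities are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Value iteration for the stochastic Bellman equation
#   v(x) = max_a { r(x, a) + beta * sum_{x'} v(x') P(x' | x, a) }
# on a small finite state/action space. All numbers are illustrative.

beta = 0.95                      # discount factor (assumed)
n_states, n_actions = 3, 2       # example problem sizes (assumed)

rng = np.random.default_rng(0)

# r[x, a]: one-period reward from taking action a in state x (assumed values)
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# P[a, x, x']: probability of moving from x to x' under action a;
# each row is normalized so it is a proper conditional distribution
P = rng.uniform(size=(n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)

v = np.zeros(n_states)           # initial guess for the value function
for _ in range(1000):
    # Q[x, a] = r(x, a) + beta * E[v(x') | x, a]
    continuation = np.einsum("axy,y->xa", P, v)
    Q = r + beta * continuation
    v_new = Q.max(axis=1)        # Bellman operator: maximize over actions
    if np.max(np.abs(v_new - v)) < 1e-10:   # sup-norm convergence check
        break
    v = v_new

policy = Q.argmax(axis=1)        # an optimal action in each state
print("value function:", v)
print("policy:", policy)
```

Because $\beta < 1$, the Bellman operator is a contraction in the sup norm, so the iteration converges to its unique fixed point regardless of the initial guess. The measure-theoretic machinery in the rest of this section is what lets us make sense of the expectation $E_0$ and the integral against $P(dx', x, a)$ when the state space is not finite.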