Evolutionary Function Approximation

The methods presented in Chapter 3 allow the representation-learning capacity of evolutionary algorithms like NEAT to be harnessed in both off-line and on-line scenarios. However, that capacity is still limited to policy search methods. Hence, Sutton and Barto’s criticism (that policy search methods, unlike temporal difference methods, do not exploit the specific structure of the reinforcement learning problem) still applies. To address this problem, we need methods that can optimize representations not just for policies, but also for value function approximators trained with temporal difference methods.
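To make the intended combination concrete, the following is a minimal illustrative sketch (not NEAT itself, and not the method developed in this chapter): an evolutionary loop over candidate Q-value networks in which each candidate is also trained with temporal difference (Q-learning) updates during its fitness evaluation, so that evolution selects representations on the basis of how well they support TD learning. The toy chain MDP, the one-hidden-layer network whose hidden size stands in for the evolved representation, and all hyperparameters are assumptions introduced purely for illustration.

    import numpy as np

    # Illustrative assumptions: a tiny chain MDP and fixed hyperparameters.
    N_STATES, N_ACTIONS = 6, 2
    GAMMA, ALPHA, EPSILON = 0.95, 0.05, 0.1

    def step(state, action):
        """Hypothetical chain environment: action 1 moves right, 0 moves left."""
        nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if nxt == N_STATES - 1 else 0.0
        return nxt, reward, nxt == N_STATES - 1

    class QNetwork:
        """One-hidden-layer approximator; the hidden size is the evolved 'representation'."""
        def __init__(self, hidden, rng):
            self.hidden = int(hidden)
            self.W1 = rng.normal(0, 0.5, (self.hidden, N_STATES))
            self.W2 = rng.normal(0, 0.5, (N_ACTIONS, self.hidden))

        def q_values(self, state):
            x = np.eye(N_STATES)[state]
            h = np.tanh(self.W1 @ x)
            return self.W2 @ h, h, x

        def td_update(self, state, action, reward, next_state, done):
            """One Q-learning (TD) update: gradient step on the squared TD error."""
            q, h, x = self.q_values(state)
            target = reward if done else reward + GAMMA * np.max(self.q_values(next_state)[0])
            err = target - q[action]
            dh = ALPHA * err * self.W2[action] * (1 - h ** 2)  # backprop through tanh
            self.W2[action] += ALPHA * err * h
            self.W1 += np.outer(dh, x)

    def evaluate(net, episodes, rng):
        """Fitness = reward accumulated while the network is trained by TD updates."""
        total = 0.0
        for _ in range(episodes):
            state, done, steps = 0, False, 0
            while not done and steps < 50:
                q, _, _ = net.q_values(state)
                action = rng.integers(N_ACTIONS) if rng.random() < EPSILON else int(np.argmax(q))
                next_state, reward, done = step(state, action)
                net.td_update(state, action, reward, next_state, done)
                total += reward
                state, steps = next_state, steps + 1
        return total

    def evolve(generations=20, pop_size=10, episodes=20, seed=0):
        """Evolve the representation; unlike NEAT, only the hidden size is mutated here."""
        rng = np.random.default_rng(seed)
        population = [QNetwork(rng.integers(2, 10), rng) for _ in range(pop_size)]
        for _ in range(generations):
            fitness = [evaluate(net, episodes, rng) for net in population]
            order = np.argsort(fitness)[::-1]
            survivors = [population[i] for i in order[: pop_size // 2]]
            children = [QNetwork(max(2, s.hidden + rng.integers(-1, 2)), rng) for s in survivors]
            population = survivors + children
        return population[0]

    if __name__ == "__main__":
        best = evolve()
        print("best hidden size:", best.hidden)

The essential design choice this sketch isolates is that fitness is measured on networks after (and during) TD training, so selection pressure favors representations that learn well with temporal difference methods rather than representations that merely encode good fixed policies.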