Speeding up Q(λ)-Learning

Q(λ)-learning uses TD(λ) methods to accelerate Q-Learning. The worst-case complexity of a single update step in previous online, lookup-table-based Q(λ) implementations is bounded by the size of the state/action space. Our faster algorithm's worst-case complexity is bounded by the number of actions. The algorithm is based on the observation that Q-value updates may be postponed until they are needed.
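To make the postponement idea concrete, the following is a minimal sketch (not the paper's exact pseudocode) of lazy Q(λ) updates with replacing traces: a single global accumulator of discounted TD errors replaces the per-step decay of all eligibility traces, and a Q-value is brought up to date only when it is read, so each step touches only the entries of the current state's actions. All names and details (the class `LazyQLambda`, the accumulator bookkeeping, the omission of the paper's numerical rescaling of the accumulator) are illustrative assumptions.

```python
from collections import defaultdict

class LazyQLambda:
    """Sketch of lazy (postponed) Q(lambda) updates with replacing traces.

    Instead of decaying every eligibility trace each step (cost ~|S|*|A|),
    keep one global accumulator of discounted TD errors and apply the
    pending corrections to a Q-value only when it is actually read.
    Per-step cost is then bounded by the number of actions.  The global
    accumulator would need periodic rescaling in practice (omitted here).
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, lam=0.9):
        self.actions = actions
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = defaultdict(float)   # stored (possibly stale) Q-values
        self.trace = {}               # trace value at the last visit of (s, a)
        self.snap = {}                # (accumulator, power) at the last visit
        self.acc = 0.0                # sum over k of delta_k * (gamma*lam)^k
        self.power = 1.0              # (gamma*lam)^t for the current step t

    def _refresh(self, s, a):
        """Apply all postponed updates to Q(s, a) and return its current value."""
        key = (s, a)
        if key in self.trace:
            acc0, pow0 = self.snap[key]
            # accumulated discounted TD errors since the last visit of (s, a)
            self.q[key] += self.alpha * self.trace[key] * (self.acc - acc0) / pow0
            self.snap[key] = (self.acc, self.power)
        return self.q[key]

    def value(self, s):
        # only the |A| entries of the current state are refreshed -> O(|A|)
        return max(self._refresh(s, a) for a in self.actions)

    def update(self, s, a, r, s_next):
        # TD error computed from up-to-date Q-values of the pairs involved
        delta = r + self.gamma * self.value(s_next) - self._refresh(s, a)
        # advance the global decay power and accumulator: O(1) bookkeeping
        self.power *= self.gamma * self.lam
        self.acc += delta * self.power
        # replacing trace for the visited pair, plus its immediate update
        self.trace[(s, a)] = 1.0
        self.snap[(s, a)] = (self.acc, self.power)
        self.q[(s, a)] += self.alpha * delta
```

A caller would invoke `update(s, a, r, s_next)` once per transition and `value(s)` (or `_refresh`) for action selection; every Q-value read through these methods reflects all postponed corrections, while pairs that are never revisited are simply never touched again.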