Performance bounds for λ policy iteration and application to the game of Tetris
We consider the discrete-time infinite-horizon optimal control problem formalized by Markov decision processes (Puterman, 1994; Bertsekas and Tsitsiklis, 1996). We revisit the work of Bertsekas and...