A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs
暂无分享,去创建一个
[1] John N. Tsitsiklis,et al. The Complexity of Markov Decision Processes , 1987, Math. Oper. Res..
[2] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[3] Yishay Mansour,et al. Reinforcement Learning in POMDPs Without Resets , 2005, IJCAI.
[4] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[5] Samuel W. Hasinoff,et al. Reinforcement Learning for Problems with Hidden State , 2003 .
[6] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon , 1973, Oper. Res..
[7] Shlomo Zilberstein,et al. Reinforcement Learning for Mixed Open-loop and Closed-loop Control , 1996, NIPS.