Predictive representations for policy gradient in POMDPs
[1] Leslie Pack Kaelbling, et al. Reinforcement Learning by Policy Search, 2002.
[2] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[3] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[4] G. Casella, et al. Rao-Blackwellisation of sampling schemes, 1996.
[5] Takaki Makino, et al. On-line discovery of temporal-difference networks, 2008, ICML '08.
[6] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[7] Douglas Aberdeen, et al. Scalable Internal-State Policy-Gradient Methods for POMDPs, 2002, ICML.
[8] Eric Wiewiora, et al. Learning predictive representations from a history, 2005, ICML.
[9] Jonathan Baxter, et al. Scaling Internal-State Policy-Gradient Methods for POMDPs, 2002.
[10] Peter L. Bartlett, et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent, 2000, ICML.
[11] Olivier Buffet, et al. Policy-Gradients for PSRs and POMDPs, 2007, AISTATS.
[12] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[13] Michael R. James, et al. Predictive State Representations: A New Theory for Modeling Dynamical Systems, 2004, UAI.
[14] Kee-Eung Kim, et al. Learning Finite-State Controllers for Partially Observable Environments, 1999, UAI.
[15] Christian R. Shelton, et al. Importance sampling for reinforcement learning with multiple objectives, 2001.