Scaling life-long off-policy learning
[1] Shalabh Bhatnagar et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[2] Sebastian Thrun et al. Probabilistic robotics, 2002, CACM.
[3] John Langford et al. Parallel Online Learning, 2011, ArXiv.
[4] Doina Precup et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[5] Erik Talvitie et al. Learning to Make Predictions In Partially Observable Environments Without a Generative Model, 2011, J. Artif. Intell. Res.
[6] R. Sutton et al. Gradient temporal-difference learning algorithms, 2011.
[7] R. Sutton et al. Off-policy Learning with Recognizers, 2000.
[8] Alexander J. Smola et al. Multitask Learning without Label Correspondences, 2010, NIPS.
[9] Nuttapong Chentanez et al. Intrinsically Motivated Reinforcement Learning, 2004, NIPS.
[10] Stefan Schaal et al. Robot Learning From Demonstration, 1997, ICML.
[11] Byron Boots et al. An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems, 2011, AAAI.
[12] Pierre-Yves Oudeyer et al. Intrinsic Motivation Systems for Autonomous Mental Development, 2007, IEEE Transactions on Evolutionary Computation.
[13] Richard S. Sutton et al. Multi-timescale nexting in a reinforcement learning robot, 2011, Adapt. Behav.
[14] Richard L. Lewis et al. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective, 2010, IEEE Transactions on Autonomous Mental Development.
[15] Jürgen Schmidhuber et al. A possibility for implementing curiosity and boredom in model-building neural controllers, 1991.
[16] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Patrick M. Pilarski et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[18] Richard S. Sutton et al. Temporal-Difference Networks, 2004, NIPS.
[19] R. S. Sutton et al. Dynamic switching and real-time machine learning for improved human control of assistive biomedical robots, 2012, 2012 4th IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics (BioRob).
[20] Leemon C. Baird et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[21] Paul Newman et al. Highly scalable appearance-only SLAM - FAB-MAP 2.0, 2009, Robotics: Science and Systems.
[22] Richard S. Sutton et al. Predictive Representations of State, 2001, NIPS.
[23] Jan Peters et al. Policy Search for Motor Primitives in Robotics, 2022.
[24] J. Zico Kolter et al. The Fixed Points of Off-Policy TD, 2011, NIPS.
[25] Sebastian Thrun et al. Lifelong robot learning, 1993, Robotics Auton. Syst.
[26] T. Başar et al. A New Approach to Linear Filtering and Prediction Problems, 2001.
[27] Richard S. Sutton et al. Temporal Abstraction in Temporal-difference Networks, 2005, NIPS.