Acquiring a broad range of empirical knowledge in real time by temporal-difference learning

Several robot capabilities rely on predictions about the temporally extended consequences of a robot's behaviour. We describe how a robot can both learn and make many such predictions in real time using a standard algorithm. Our experiments show that a mobile robot can learn and make thousands of accurate predictions at 10 Hz. The predictions concern the future of all of the robot's sensors and many internal state variables at multiple time scales, and they all share a single set of features and learning parameters. We demonstrate the generality of the method with an application to a different platform, a robot arm operating at 50 Hz, where the learned predictions can be used to measurably improve the user interface. The temporally extended predictions learned in real time by this method constitute a basic form of knowledge about the dynamics of the robot's interaction with the environment. We also show how the method can be extended to express more general forms of knowledge.
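
The standard algorithm named in the title is temporal-difference learning. Below is a minimal sketch, under stated assumptions, of how many predictions can be learned in parallel with TD(lambda) and linear function approximation over a single shared feature vector; the feature size, number of predictions, per-prediction discount rates, trace decay, and step size are illustrative choices, not the paper's configuration.

```python
import numpy as np

# Minimal sketch: many predictions learned in parallel with TD(lambda) and
# linear function approximation, all sharing one feature vector per time step.
# All sizes and parameter values below are illustrative assumptions.

n_features = 2048      # length of the shared binary feature vector (assumed)
n_predictions = 1000   # number of predictions learned in parallel (assumed)
n_active = 50          # assumed number of active features per time step

W = np.zeros((n_predictions, n_features))   # one weight vector per prediction
E = np.zeros((n_predictions, n_features))   # one eligibility trace per prediction

gamma = np.random.uniform(0.0, 0.99, n_predictions)  # per-prediction time scales (assumed)
lam = 0.9                                             # trace-decay parameter (assumed)
alpha = 0.1 / n_active                                # step size scaled by active features

def td_step(phi, phi_next, cumulants):
    """One TD(lambda) update for every prediction, all sharing the features phi.

    phi, phi_next : feature vectors for the current and next time step
    cumulants     : per-prediction signals being predicted (e.g. current
                    sensor readings), shape (n_predictions,)
    """
    preds = W @ phi            # current predictions for all targets
    preds_next = W @ phi_next  # predictions at the next time step
    delta = cumulants + gamma * preds_next - preds   # TD errors, one per prediction

    # Accumulating traces, each decayed by its prediction's gamma * lambda.
    E *= (gamma * lam)[:, None]
    E += phi

    W += alpha * delta[:, None] * E   # update all weight vectors at once
    return preds
```

In such a sketch, each call consumes one time step of sensorimotor data; running the robot's control loop at 10 Hz would simply mean calling the update once per observation.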
