Acquiring Diverse Predictive Knowledge in Real Time by Temporal-difference Learning

Existing robot algorithms demonstrate several capabilities that are enabled by a robot's knowledge of the temporally extended consequences of its behaviour. This knowledge consists of real-time predictions, which are conventionally computed by iterating a small one-timestep model of the robot's dynamics. Given the utility of such predictions, alternatives are desirable when this conventional approach is not applicable, for example when an adequate model of the one-timestep dynamics is either unavailable or computationally intractable. We describe how a robot can both learn and make many such predictions in real time using a standard reinforcement learning algorithm. Our experiments show that a mobile robot can learn and make thousands of accurate predictions at 10 Hz about the future of all of its sensors and many internal state variables at multiple time scales. The method uses a single set of features and learning parameters that are shared across all the predictions. We demonstrate the generality of these predictions with an application to a different platform, a robot arm operating at 50 Hz. Here, the predictions concern which arm joint the user wants to move next, a situation that is difficult to model analytically, and we show how the learned predictions enable measurable improvements to the user interface. The predictions learned in real time by this method constitute a basic form of knowledge about the robot's interaction with the environment, and extensions of this method can express more general forms of knowledge.
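
The shape of the computation described above can be pictured as many linear TD(lambda) learners running in parallel: each prediction keeps its own weight and eligibility-trace vectors, has its own target signal, and has its own discount rate that sets its timescale, while all predictions share the single feature vector computed at each time step. The sketch below is a minimal illustration of that general technique, not the paper's implementation; the names (TDPredictor, featurize, read_sensors) and the parameter values are assumptions introduced for the example.

    import numpy as np

    class TDPredictor:
        """One real-time prediction learned by linear TD(lambda): an estimate
        of the discounted sum of a target signal, with the discount rate
        setting the prediction's timescale."""

        def __init__(self, num_features, gamma, lam=0.9, alpha=0.1):
            self.w = np.zeros(num_features)   # learned weight vector
            self.e = np.zeros(num_features)   # accumulating eligibility trace
            self.gamma = gamma                # discount rate (timescale)
            self.lam = lam                    # trace-decay parameter
            self.alpha = alpha                # step size

        def predict(self, phi):
            return float(self.w @ phi)

        def update(self, phi, signal, phi_next):
            # TD error: observed signal plus discounted next prediction,
            # minus the current prediction.
            delta = signal + self.gamma * (self.w @ phi_next) - (self.w @ phi)
            self.e = self.gamma * self.lam * self.e + phi
            self.w += self.alpha * delta * self.e

    # A timescale of roughly T time steps corresponds to gamma = 1 - 1/T.
    num_features = 1000                       # e.g., a tile-coded feature vector
    predictors = [TDPredictor(num_features, gamma=1.0 - 1.0 / T)
                  for T in (1, 10, 100)]      # one predictor per signal and timescale

    # Main loop sketch (featurize and read_sensors are hypothetical helpers):
    #   phi = featurize(read_sensors())
    #   while True:
    #       signals = read_sensors()          # target signals, one per predictor
    #       phi_next = featurize(signals)     # shared features for this time step
    #       for p, s in zip(predictors, signals):
    #           p.update(phi, s, phi_next)
    #       phi = phi_next

Because each update is a few vector operations over a sparse, shared feature vector, adding more predictions increases the cost only linearly, which is what makes thousands of predictions per 100 ms time step plausible on modest hardware.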
