Autonomous Learning of High-Level States and Actions in Continuous Environments

How can an agent bootstrap from a low-level representation to autonomously learn high-level states and actions using only domain-general knowledge? In this paper, we assume that the learning agent has a set of continuous variables describing the environment. Methods exist for learning models of the environment, and methods exist for planning; however, for autonomous learning, these methods have been applied almost exclusively in discrete environments. We attack the problem of learning high-level states and actions in continuous environments by using a qualitative representation to bridge the gap between continuous and discrete variable representations. In this approach, the agent begins with a broad discretization and initially can only tell whether the value of each variable is increasing, decreasing, or remaining steady. The agent then simultaneously learns a qualitative representation (discretization) and a set of predictive models of the environment. These models are converted into plans for performing actions, and the agent uses those learned actions to explore the environment further. The method is evaluated using a simulated robot with realistic physics. The robot sits at a table containing a block as well as distractor objects that are out of reach. The agent autonomously explores the environment without being given a task; after learning, it is given various tasks to determine whether it has acquired the states and actions necessary to complete them. The results show that the agent was able to use this method to autonomously learn to perform the tasks.
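To make the initial discretization concrete, here is a minimal sketch in Python of the direction-of-change abstraction described above: a continuous variable is mapped to one of three qualitative values (increasing, steady, decreasing), and its magnitude is mapped to a region between landmark values. The noise threshold `eps`, the `QDir` names, and the hand-supplied landmark list are illustrative assumptions; in the approach itself, landmarks are learned rather than given.

```python
from enum import Enum

class QDir(Enum):
    """Qualitative direction of change of a continuous variable."""
    DECREASING = -1
    STEADY = 0
    INCREASING = 1

def qualitative_direction(prev: float, curr: float, eps: float = 1e-3) -> QDir:
    """Map a change in a continuous variable to a qualitative value.

    `eps` is an assumed noise threshold: changes smaller in magnitude
    than `eps` are treated as STEADY.
    """
    delta = curr - prev
    if delta > eps:
        return QDir.INCREASING
    if delta < -eps:
        return QDir.DECREASING
    return QDir.STEADY

def discretize(value: float, landmarks: list[float]) -> int:
    """Map a continuous magnitude to a qualitative region.

    Landmarks partition the real line into len(landmarks) + 1 regions;
    the returned index identifies the region containing `value`.
    """
    for i, lm in enumerate(sorted(landmarks)):
        if value < lm:
            return i
    return len(landmarks)

# Example: track the qualitative state of one variable over time.
readings = [0.10, 0.10, 0.15, 0.25, 0.25]
directions = [qualitative_direction(a, b) for a, b in zip(readings, readings[1:])]
# -> [STEADY, INCREASING, INCREASING, STEADY]
region = discretize(0.2, landmarks=[0.0, 0.5])  # -> 1 (between the landmarks)
```

An agent starting from only this coarse abstraction would see the world through a handful of qualitative values per variable; refining the landmark set is what lets it carve out the higher-level states that its predictive models and plans are built on.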
