An algebraic approach to abstraction in reinforcement learning

To operate effectively in complex environments, learning agents must be able to ignore irrelevant details. Stated in general terms, this is a very difficult problem. Much of the work in this field is specialized to particular modeling frameworks or classes of problems. We introduce an abstraction framework for Markov decision processes (MDPs) based on homomorphisms relating MDPs. Building on the classical finite-state automata literature, we develop a minimization framework for MDPs that can exploit structure and symmetries to derive smaller, equivalent models of the problem. Since exact equivalence is often too strong a requirement, we extend the framework to approximate and partial homomorphisms and develop bounds on the loss that results from employing these relaxed abstraction criteria. Our MDP minimization results can be readily employed by reinforcement learning (RL) methods. We then extend the abstraction approach to hierarchical RL, specifically using the options framework. We introduce relativized options, a generalization of Markov sub-goal options, that allow us to define options without an absolute frame of reference. We introduce an extension to the options framework, based on relativized options, that allows us to learn simultaneously at multiple levels of the hierarchy. We also derive guarantees regarding the performance of hierarchical systems that employ approximate abstractions, and validate the approach empirically in several test-beds. Relativized options can also be interpreted as behavioral schemas. We demonstrate that such schemas can be profitably employed in a hierarchical RL setting, and we develop algorithms that learn the appropriate parameter binding for a given schema. We empirically demonstrate the validity and utility of these algorithms. Relativized options also allow us to model certain aspects of deictic or indexical representations, and we develop a modification of our parameter-binding algorithm suited to hierarchical RL architectures that employ deictic representations.
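
To make the central construction concrete, here is a minimal sketch (ours, not code from the thesis) of how a candidate MDP homomorphism h(s, a) = (f(s), g_s(a)) induces a reduced "quotient" MDP, following the standard commutativity conditions in the MDP-homomorphism literature; the function name, dictionary layout, and interface below are assumptions made for illustration.

# A minimal sketch, assuming an explicit tabular MDP given as nested dicts.
from collections import defaultdict

def quotient_mdp(P, R, f, g, tol=1e-9):
    """Aggregate an MDP under a candidate homomorphism h(s, a) = (f[s], g[s][a]).

    P: dict, P[s][a] = {s_next: probability}   transition probabilities
    R: dict, R[s][a] = expected reward
    f: dict, f[s]    = abstract state (block containing s)
    g: dict, g[s][a] = abstract action for a, conditioned on s
    Returns (P_red, R_red); raises ValueError if (f, g) violates the
    homomorphism conditions (every pre-image of an abstract state-action
    pair must agree on reward and on block-transition probabilities).
    """
    P_red, R_red = {}, {}
    for s in P:
        for a in P[s]:
            key = (f[s], g[s][a])
            # Probability of landing in each abstract state (block) from (s, a).
            block_probs = defaultdict(float)
            for s_next, prob in P[s][a].items():
                block_probs[f[s_next]] += prob
            if key not in P_red:
                P_red[key] = dict(block_probs)
                R_red[key] = R[s][a]
                continue
            same_reward = abs(R_red[key] - R[s][a]) <= tol
            blocks = set(block_probs) | set(P_red[key])
            same_dynamics = all(
                abs(block_probs[b] - P_red[key].get(b, 0.0)) <= tol for b in blocks
            )
            if not (same_reward and same_dynamics):
                raise ValueError(f"(f, g) is not a homomorphism at {key}")
    return P_red, R_red

Solving the reduced model (P_red, R_red) with any standard dynamic-programming or RL method yields an optimal policy that can be lifted back to the original MDP through f and g; this is the sense in which homomorphisms support minimization.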
