Concurrent Hierarchical Reinforcement Learning

We consider applying hierarchical reinforcement learning techniques to problems in which an agent has several effectors to control simultaneously. We argue that the kind of prior knowledge one typically has about such problems is best expressed using a multithreaded partial program, and present concurrent ALisp, a language for specifying such partial programs. We describe algorithms for learning and acting with concurrent ALisp that can be efficient even when there are exponentially many joint choices at each decision point. Finally, we show results of applying these methods to a complex computer game domain.
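
To make the notions of a "multithreaded partial program" and "exponentially many joint choices" concrete, here is a minimal, hypothetical Python sketch (the paper's actual language is concurrent ALisp, a Lisp-based formalism; none of the names below are its API). Two effector threads each pause at a choice point, and the joint Q-function is assumed to decompose additively across threads, which is one way such a joint choice can be resolved without enumerating the full product space. The thread names, actions, and Q-values are illustrative assumptions only.

```python
import itertools

# Hypothetical choice points: each effector thread of the partial program
# pauses with a small set of candidate actions for the current state.
choice_points = {
    "peasant-1": ["harvest-wood", "harvest-gold", "build-barracks"],
    "peasant-2": ["harvest-wood", "harvest-gold", "defend-base"],
}

# Per-thread Q estimates for the current state; during learning these would
# be updated from experience, here they are fixed for the demonstration.
q_thread = {
    "peasant-1": {"harvest-wood": 1.0, "harvest-gold": 2.5, "build-barracks": 0.5},
    "peasant-2": {"harvest-wood": 1.2, "harvest-gold": 0.8, "defend-base": 2.0},
}

def joint_q(joint_choice):
    """Assumed additive decomposition: joint Q is a sum of per-thread terms."""
    return sum(q_thread[t][a] for t, a in joint_choice.items())

def best_joint_choice_bruteforce():
    """Argmax by enumerating the joint choice space.

    The product space grows exponentially with the number of threads;
    this is shown only to make that blow-up explicit."""
    threads = list(choice_points)
    best, best_val = None, float("-inf")
    for combo in itertools.product(*(choice_points[t] for t in threads)):
        jc = dict(zip(threads, combo))
        val = joint_q(jc)
        if val > best_val:
            best, best_val = jc, val
    return best, best_val

def best_joint_choice_decomposed():
    """Exploit additivity: maximize each thread's component independently."""
    jc = {t: max(acts, key=lambda a, t=t: q_thread[t][a])
          for t, acts in choice_points.items()}
    return jc, joint_q(jc)

if __name__ == "__main__":
    print(best_joint_choice_bruteforce())
    print(best_joint_choice_decomposed())
```

In this fully decoupled toy case the decomposed maximization trivially matches the brute-force result; with coupled per-thread components, more structured coordination over the joint choice would be needed, which is the setting the paper's algorithms address.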
