Tree based hierarchical reinforcement learning

Abstract: In this thesis, the author investigates methods for speeding up automatic control algorithms. Specifically, he provides new abstraction techniques for Reinforcement Learning and Semi-Markov Decision Processes (SMDPs). He also introduces the use of policies as temporally abstract actions. This differs from previous definitions of temporally abstract actions in that his abstract actions have no termination criteria. He provides an approach for processing previously solved problems to extract these policies, and contributes a method for using supplied or extracted policies to guide and speed up the solving of new problems. He treats policy extraction as a supervised learning task and introduces the Lumberjack algorithm, which extracts repeated sub-structure within a decision tree. He then introduces the TTree algorithm, which combines state and temporal abstraction to increase problem-solving speed on new problems. TTree solves SMDPs by using both user- and machine-supplied policies as temporally abstract actions while generating its own tree-based abstract state representation. By combining state and temporal abstraction in this way, TTree is the only known SMDP algorithm that can ignore irrelevant or harmful subregions within a supplied abstract action while still making use of the other parts of that action.
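To make the idea of a policy as a temporally abstract action concrete, the following is a minimal sketch under assumed toy conditions: a 5x5 grid world, a hand-built quadrant partition standing in for the abstract state tree, and hypothetical helper names such as run_abstract_action. It is not the thesis's implementation. The supplied policy carries no termination criterion of its own; execution stops only when the agent leaves the current leaf of the state-abstraction tree, which is what lets such a policy be followed in some abstract states and ignored in others.

```python
# Hypothetical illustration, not the thesis's TTree/Lumberjack code.

GAMMA = 0.95
GRID = 5  # a 5x5 grid world stands in for the SMDP


def step(state, action):
    """Toy deterministic dynamics; actions are 'N', 'S', 'E', 'W'."""
    x, y = state
    dx, dy = {'N': (0, 1), 'S': (0, -1), 'E': (1, 0), 'W': (-1, 0)}[action]
    nx = min(max(x + dx, 0), GRID - 1)
    ny = min(max(y + dy, 0), GRID - 1)
    reward = 1.0 if (nx, ny) == (GRID - 1, GRID - 1) else 0.0
    return (nx, ny), reward


def leaf(state):
    """Hand-built stand-in for the abstract state tree: the grid's
    quadrants. TTree would grow such a tree automatically."""
    x, y = state
    return (x >= GRID // 2, y >= GRID // 2)


def go_east(state):
    """A supplied policy used as a temporally abstract action.
    Note that it has no termination criterion of its own."""
    return 'E'


def run_abstract_action(state, policy, max_steps=100):
    """Execute a policy until the abstract state (tree leaf) changes,
    accumulating discounted reward -- one sampled transition of the
    abstract SMDP that a planner could then solve."""
    start = leaf(state)
    total, discount, t = 0.0, 1.0, 0
    while leaf(state) == start and t < max_steps:
        state, r = step(state, policy(state))
        total += discount * r
        discount *= GAMMA
        t += 1
    return state, total, t


if __name__ == "__main__":
    s = (0, 3)
    s2, reward, duration = run_abstract_action(s, go_east)
    print(f"leaf {leaf(s)} -> {leaf(s2)} in {duration} steps; "
          f"discounted reward {reward:.3f}")
```

An algorithm in the spirit of TTree would gather many such samples per leaf and per abstract action, solve the induced abstract SMDP, and refine the tree where the abstraction proves too coarse; the sketch above only shows the sampling step.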
