Tree based hierarchical reinforcement learning
[1] C. S. Wallace,et al. An Information Measure for Classification , 1968, Comput. J..
[2] Jon Louis Bentley,et al. An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.
[3] James S. Albus,et al. Brains, behavior, and robotics , 1981 .
[4] James A. Storer,et al. Data compression via textual substitution , 1982, JACM.
[5] J. Rissanen. A UNIVERSAL PRIOR FOR INTEGERS AND ESTIMATION BY MINIMUM DESCRIPTION LENGTH , 1983 .
[6] Ronald L. Rivest,et al. Inferring Decision Trees Using the Minimum Description Length Principle , 1989, Inf. Comput..
[7] F. A. Seiler,et al. Numerical Recipes in C: The Art of Scientific Computing , 1989 .
[8] W. T. Miller,et al. CMAC: an associative neural network alternative to backpropagation , 1990, Proc. IEEE.
[9] Leslie Pack Kaelbling,et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.
[10] Hyongsuk Kim,et al. CMAC-based adaptive critic self-learning control , 1991, IEEE Trans. Neural Networks.
[11] Satinder P. Singh, et al. Transfer of Learning Across Compositions of Sequential Tasks, 1991, ML.
[12] Carla E. Brodley,et al. Linear Machine Decision Trees , 1991 .
[13] Craig A. Knoblock. Automatically generating abstractions for problem solving , 1991 .
[14] Satinder Singh. Transfer of learning by composing solutions of elemental sequential tasks , 2004, Machine Learning.
[15] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[16] Randal E. Bryant,et al. Symbolic Boolean manipulation with ordered binary-decision diagrams , 1992, CSUR.
[17] J. Ross Quinlan,et al. C4.5: Programs for Machine Learning , 1992 .
[18] Satinder P. Singh,et al. Reinforcement Learning with a Hierarchy of Abstract Models , 1992, AAAI.
[19] William H. Press,et al. The Art of Scientific Computing Second Edition , 1998 .
[20] C. Atkeson, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time, 1993.
[21] Ronald J. Williams,et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .
[22] Andrew W. Moore,et al. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces , 2004, Machine Learning.
[23] Jing Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, Adapt. Behav..
[24] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[25] Michael J. Pazzani,et al. Exploring the Decision Forest: An Empirical Investigation of Occam's Razor in Decision Tree Induction , 1993, J. Artif. Intell. Res..
[26] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[27] Alberto Maria Segre,et al. Programs for Machine Learning , 1994 .
[28] Simon Kasif,et al. A System for Induction of Oblique Decision Trees , 1994, J. Artif. Intell. Res..
[29] Sebastian Thrun,et al. Finding Structure in Reinforcement Learning , 1994, NIPS.
[30] Ron Kohavi,et al. Wrappers for performance enhancement and oblivious decision graphs , 1995 .
[31] Thomas Dean,et al. Decomposition Techniques for Planning in Stochastic Domains , 1995, IJCAI.
[32] Geoffrey J. Gordon. Online Fitted Reinforcement Learning , 1995 .
[33] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996.
[34] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[35] Pattie Maes,et al. Emergent Hierarchical Control Structures: Learning Reactive/Hierarchical Relationships in Reinforcement Environments , 1996 .
[36] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[37] Andrew W. Moore,et al. Learning Evaluation Functions for Large Acyclic Domains , 1996, ICML.
[38] Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
[39] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[40] Craig G. Nevill-Manning,et al. Inferring Sequential Structure , 1996 .
[41] Ian H. Witten,et al. Identifying Hierarchical Structure in Sequences: A linear-time algorithm , 1997, J. Artif. Intell. Res..
[42] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[43] Milos Hauskrecht,et al. Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.
[44] Manuela M. Veloso,et al. Tree Based Discretization for Continuous State Space Reinforcement Learning , 1998, AAAI/IAAI.
[45] Ronald E. Parr. Hierarchical Control and Learning for Markov Decision Processes, 1998.
[46] Doina Precup,et al. Intra-Option Learning about Temporally Abstract Actions , 1998, ICML.
[47] Bruce L. Digney,et al. Learning hierarchical control structures for multiple tasks and changing environments , 1998 .
[48] Alex M. Andrew, et al. Reinforcement Learning: An Introduction, 1998.
[49] Thomas G. Dietterich. The MAXQ Method for Hierarchical Reinforcement Learning , 1998, ICML.
[50] Andrew W. Moore,et al. Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems , 1999, IJCAI.
[51] Jesse Hoey,et al. SPUDD: Stochastic Planning using Decision Diagrams , 1999, UAI.
[52] Andrew W. Moore,et al. Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs , 1999, IJCAI.
[53] Manuela Veloso,et al. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning , 2000 .
[54] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[55] Bernhard Hengst,et al. Generating Hierarchical Structure in Reinforcement Learning from State Variables , 2000, PRICAI.
[56] Doina Precup,et al. Temporal abstraction in reinforcement learning , 2000, ICML 2000.
[57] Jesse Hoey,et al. APRICODD: Approximate Policy Construction Using Decision Diagrams , 2000, NIPS.
[58] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[59] Andrew G. Barto,et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density , 2001, ICML.
[60] Andrew W. Moore,et al. Direct Policy Search using Paired Statistical Tests , 2001, ICML.
[61] Xin Wang, Thomas G. Dietterich. Stabilizing Value Function Approximation with the BFBP Algorithm, 2001, NIPS.
[62] Andrew G. Barto,et al. Autonomous discovery of temporal abstractions from interaction with an environment , 2002 .
[63] Bernhard Hengst,et al. Discovering Hierarchy in Reinforcement Learning with HEXQ , 2002, ICML.
[65] C. S. Wallace,et al. Coding Decision Trees , 1993, Machine Learning.
[66] J. Ross Quinlan,et al. Induction of Decision Trees , 1986, Machine Learning.
[67] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[68] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.