An object-oriented representation for efficient reinforcement learning
[1] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[2] R. Bellman. Dynamic programming. , 1957, Science.
[3] A. Hordijk,et al. Linear Programming and Markov Decision Chains , 1979 .
[4] Leslie G. Valiant,et al. A theory of the learnable , 1984, STOC '84.
[5] Donald A. Berry,et al. Bandit Problems: Sequential Allocation of Experiments. , 1986 .
[6] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).
[7] C. Watkins. Learning from delayed rewards , 1989 .
[8] Richard S. Sutton,et al. Learning and Sequential Decision Making , 1989 .
[9] Keiji Kanazawa,et al. A model for reasoning about persistence and causation , 1989 .
[10] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[11] Kenji Yamanishi,et al. A learning criterion for stochastic rules , 1990, COLT '90.
[12] Richard S. Sutton,et al. Reinforcement Learning is Direct Adaptive Optimal Control , 1992, 1991 American Control Conference.
[13] Richard S. Sutton,et al. Dyna, an integrated architecture for learning, planning, and reacting , 1990, SIGART Bull..
[14] Saso Dzeroski,et al. PAC-learnability of determinate logic programs , 1992, COLT '92.
[15] C. Atkeson,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[16] Magnus Borga,et al. Hierarchical Reinforcement Learning , 1993 .
[17] Craig Boutilier,et al. Using Abstractions for Decision-Theoretic Planning with Time Constraints , 1994, AAAI.
[18] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[19] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[20] Umesh V. Vazirani,et al. An Introduction to Computational Learning Theory , 1994 .
[21] Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
[22] Robert E. Schapire,et al. Efficient distribution-free learning of probabilistic concepts , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.
[23] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[24] William W. Cohen. Pac-learning Recursive Logic Programs: Negative Results , 1994, J. Artif. Intell. Res..
[25] Leslie Pack Kaelbling,et al. On the Complexity of Solving Markov Decision Problems , 1995, UAI.
[26] William W. Cohen. Pac-Learning Recursive Logic Programs: Efficient Algorithms , 1994, J. Artif. Intell. Res..
[27] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.
[28] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[29] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[30] Robert Givan,et al. Model Minimization in Markov Decision Processes , 1997, AAAI/IAAI.
[31] Christopher G. Atkeson,et al. A comparison of direct and model-based reinforcement learning , 1997, Proceedings of International Conference on Robotics and Automation.
[32] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[33] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[34] Craig Boutilier,et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage , 1999, J. Artif. Intell. Res..
[35] John Langford,et al. Probabilistic Planning in the Graphplan Framework , 1999, ECP.
[36] Michael Kearns,et al. Efficient Reinforcement Learning in Factored MDPs , 1999, IJCAI.
[37] Malcolm J. A. Strens,et al. A Bayesian Framework for Reinforcement Learning , 2000, ICML.
[38] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[39] Kenji Doya,et al. Reinforcement Learning in Continuous Time and Space , 2000, Neural Computation.
[40] Craig Boutilier,et al. Stochastic dynamic programming with factored representations , 2000, Artif. Intell..
[41] Balaraman Ravindran,et al. Symmetries and Model Minimization in Markov Decision Processes , 2001 .
[42] John N. Tsitsiklis,et al. Regression methods for pricing complex American-style options , 2001, IEEE Trans. Neural Networks.
[43] B. Scholl. Objects and attention: the state of the art , 2001, Cognition.
[44] Tim Oates,et al. The Thing that we Tried Didn't Work very Well: Deictic Representation in Reinforcement Learning , 2002, UAI.
[45] L. Kaelbling,et al. Learning with Deictic Representation , 2002 .
[46] Dale Schuurmans,et al. Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs , 2002, ICML.
[47] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[48] Carlos Guestrin,et al. Generalizing plans to new environments in relational MDPs , 2003, IJCAI 2003.
[49] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[50] S. Shankar Sastry,et al. Autonomous Helicopter Flight via Reinforcement Learning , 2003, NIPS.
[51] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[52] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003 .
[53] John N. Tsitsiklis,et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..
[54] Maria Fox,et al. PDDL2.1: An Extension to PDDL for Expressing Temporal Planning Domains , 2003, J. Artif. Intell. Res..
[55] Benjamin Van Roy,et al. The Linear Programming Approach to Approximate Dynamic Programming , 2003, Oper. Res..
[56] Yoav Shoham,et al. Multi-Agent Reinforcement Learning:a critical survey , 2003 .
[57] Michael O. Duff,et al. Design for an Optimal Probe , 2003, ICML.
[58] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[59] Satinder Singh,et al. An upper bound on the loss from approximate optimal-value functions , 1994, Machine Learning.
[60] Roni Khardon,et al. Learning to Take Actions , 1996, Machine Learning.
[61] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[62] Luc De Raedt,et al. Relational Reinforcement Learning , 2001, Machine Learning.
[63] Peter Stone,et al. Machine Learning for Fast Quadrupedal Locomotion , 2004, AAAI.
[64] A. Barto,et al. An algebraic approach to abstraction in reinforcement learning , 2004 .
[65] Thore Graepel,et al. Learning to Fight , 2004 .
[66] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[67] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[68] Håkan L. S. Younes,et al. The First Probabilistic Track of the International Planning Competition , 2005, J. Artif. Intell. Res..
[69] Martijn van Otterlo,et al. A survey of reinforcement learning in relational domains , 2005 .
[70] Gerald Tesauro,et al. Online Resource Allocation Using Decompositional Reinforcement Learning , 2005, AAAI.
[71] Rémi Munos,et al. Error Bounds for Approximate Value Iteration , 2005, AAAI.
[72] Hierarchical Reinforcement Learning in Multi-Agent Environments , 2005 .
[73] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[74] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[75] Peter Stone,et al. State Abstraction Discovery from Irrelevant State Variables , 2005, IJCAI.
[76] Pieter Abbeel,et al. An Application of Reinforcement Learning to Aerobatic Helicopter Flight , 2006, NIPS.
[77] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[78] Michael L. Littman,et al. A hierarchical approach to efficient reinforcement learning in deterministic domains , 2006, AAMAS '06.
[79] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[80] Chih-Han Yu,et al. Quadruped robot obstacle negotiation via reinforcement learning , 2006, Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006..
[81] Kathryn E. Merrick,et al. Motivated reinforcement learning for non-player characters in persistent computer game worlds , 2006, ACE '06.
[82] Christopher M. Bishop,et al. Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .
[83] Thomas J. Walsh,et al. Towards a Unified Theory of State Abstraction for MDPs , 2006, AI&M.
[84] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[85] David Silver,et al. Combining online and offline knowledge in UCT , 2007, ICML '07.
[86] Michael L. Littman,et al. Efficient Structure Learning in Factored-State MDPs , 2007, AAAI.
[87] Balaraman Ravindran,et al. Deictic Option Schemas , 2007, IJCAI.
[88] Alexander L. Strehl,et al. Model-Based Reinforcement Learning in Factored-State MDPs , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[89] Michael L. Littman,et al. Efficient Reinforcement Learning with Relocatable Action Models , 2007, AAAI.
[90] Richard S. Sutton,et al. Reinforcement Learning of Local Shape in the Game of Go , 2007, IJCAI.
[91] L. P. Kaelbling,et al. Learning Symbolic Models of Stochastic Domains , 2007, J. Artif. Intell. Res..
[92] Thomas J. Walsh,et al. Knows what it knows: a framework for self-aware learning , 2008, ICML '08.
[93] Joelle Pineau,et al. Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning , 2008, AAAI.
[94] S. Shalev-Shwartz. Low ℓ1-Norm and Guarantees on Sparsifiability , 2008 .
[95] Alejandro Pazos Sierra,et al. Encyclopedia of Artificial Intelligence , 2008 .
[96] A. Woodward,et al. Learning and the Infant Mind , 2008 .
[97] R. Baillargeon,et al. An Account of Infants' Physical Reasoning , 2008 .
[98] Doina Precup,et al. Bounding Performance Loss in Approximate MDP Homomorphisms , 2008, NIPS.
[99] Martijn van Otterlo,et al. The logic of adaptive behavior : knowledge representation and algorithms for the Markov decision process framework in first-order domains , 2008 .
[100] Luc De Raedt,et al. Logical and Relational Learning: From ILP to MRDM (Cognitive Technologies) , 2008 .
[101] Onur Mutlu,et al. Self-Optimizing Memory Controllers: A Reinforcement Learning Approach , 2008, 2008 International Symposium on Computer Architecture.
[102] Lihong Li,et al. The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning , 2009, ICML '09.
[103] Thomas J. Walsh,et al. Exploring compact reinforcement-learning representations with linear regression , 2009, UAI.
[104] Michael L. Littman,et al. A unifying framework for computational reinforcement learning theory , 2009 .
[105] Michael L. Littman,et al. Hierarchical Reinforcement Learning , 2009, Encyclopedia of Artificial Intelligence.
[106] R. Baillargeon,et al. Young infants’ reasoning about physical events involving inert and self-propelled objects , 2009, Cognitive Psychology.
[107] Thomas J. Walsh,et al. Generalizing Apprenticeship Learning across Hypothesis Classes , 2010, ICML.