A survey of reinforcement learning in relational domains

Reinforcement learning has developed into one of the primary approaches for learning control strategies for autonomous agents. Most work, however, has focused on algorithmic issues, i.e. on various ways of computing value functions and policies, while the representational aspects have usually been limited to attribute-value, or propositional, languages for describing states and actions. A recent research direction, known under the general name of relational reinforcement learning, is concerned with upgrading the representations used in reinforcement learning to the first-order case, enabling agents to speak, reason and learn about objects and the relations between them. This survey presents an introduction to this new field, starting from the classical reinforcement learning framework. We describe the main motivations and challenges, and give a comprehensive overview of the methods proposed in the literature, of their underlying motivations and of the implications of these new methods for learning in large, relational and probabilistic environments.