Fitted Q-Learning for Relational Domains

We consider the problem of Approximate Dynamic Programming in relational domains. Inspired by the success of fitted Q-learning methods in propositional settings, we develop the first relational fitted Q-learning algorithms by learning relational representations of the value function and the Bellman residuals. When fitting the Q-functions, we show how the two steps of the Bellman operator, the application step and the projection step, can be performed using a gradient-boosting technique. Our proposed framework performs reasonably well on standard domains without requiring domain models and while using fewer training trajectories.
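To make the two-step loop concrete, the sketch below shows a standard propositional fitted Q-iteration with a gradient-boosted regressor standing in for the relational regression trees described above. It is only an illustrative analogue under stated assumptions: the toy 1-D domain, the helper make_transitions, and all constants (GAMMA, N_ITERATIONS, ACTIONS, the boosting hyperparameters) are invented for the example and are not part of the paper's method, which fits relational trees to Bellman residuals rather than refitting a propositional regressor.

```python
# Minimal propositional sketch of fitted Q-iteration (assumptions noted above);
# the relational method in the paper replaces the regressor with relational trees.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

GAMMA = 0.9          # discount factor (assumed for the example)
N_ITERATIONS = 20    # number of fitted Q-iteration sweeps (assumed)
ACTIONS = [0, 1]     # toy discrete action set: 0 = left, 1 = right (assumed)

def make_transitions(n=500, rng=np.random.default_rng(0)):
    """Generate toy (s, a, r, s') tuples from a 1-D random-walk task."""
    s = rng.uniform(-1, 1, size=n)
    a = rng.integers(0, len(ACTIONS), size=n)
    s_next = np.clip(s + np.where(a == 1, 0.1, -0.1) + rng.normal(0, 0.01, n), -1, 1)
    r = (s_next > 0.9).astype(float)   # reward for reaching the right edge
    return s, a, r, s_next

s, a, r, s_next = make_transitions()
X = np.column_stack([s, a])            # propositional (state, action) features
q = None                               # current Q-function approximator

for _ in range(N_ITERATIONS):
    # Bellman operator application: targets y = r + gamma * max_a' Q(s', a')
    if q is None:
        y = r
    else:
        q_next = np.column_stack(
            [q.predict(np.column_stack([s_next, np.full_like(s_next, act)]))
             for act in ACTIONS]
        )
        y = r + GAMMA * q_next.max(axis=1)
    # Projection step: fit a gradient-boosted regressor to the Bellman targets
    q = GradientBoostingRegressor(n_estimators=50, max_depth=3).fit(X, y)

print("Q(0.85, right) ~", q.predict([[0.85, 1]])[0])
```

In the paper's relational setting, the projection step would instead add first-order regression trees that generalize over objects and relations, but the alternation between applying the Bellman operator and projecting onto the hypothesis space is the same.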
