Fitted Q-Learning for Relational Domains

We consider the problem of Approximate Dynamic Programming in relational domains. Inspired by the success of fitted Q-learning methods in propositional settings, we develop the first relational fitted Q-learning algorithms by learning relational representations of the value function and the Bellman residuals. When fitting the Q-functions, we show how the two steps of the Bellman operator, the application step and the projection step, can be performed using a gradient-boosting technique. Our proposed framework performs reasonably well on standard domains without requiring domain models and while using fewer training trajectories.
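To make the two-step loop concrete, the sketch below shows a standard propositional fitted Q-iteration with a gradient-boosted regressor standing in for the relational regression trees described above. It is only an illustrative analogue under stated assumptions: the toy 1-D domain, the helper make_transitions, and all constants (GAMMA, N_ITERATIONS, ACTIONS, the boosting hyperparameters) are invented for the example and are not part of the paper's method, which fits relational trees to Bellman residuals rather than refitting a propositional regressor.

```python
# Minimal propositional sketch of fitted Q-iteration (assumptions noted above);
# the relational method in the paper replaces the regressor with relational trees.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

GAMMA = 0.9          # discount factor (assumed for the example)
N_ITERATIONS = 20    # number of fitted Q-iteration sweeps (assumed)
ACTIONS = [0, 1]     # toy discrete action set: 0 = left, 1 = right (assumed)

def make_transitions(n=500, rng=np.random.default_rng(0)):
    """Generate toy (s, a, r, s') tuples from a 1-D random-walk task."""
    s = rng.uniform(-1, 1, size=n)
    a = rng.integers(0, len(ACTIONS), size=n)
    s_next = np.clip(s + np.where(a == 1, 0.1, -0.1) + rng.normal(0, 0.01, n), -1, 1)
    r = (s_next > 0.9).astype(float)   # reward for reaching the right edge
    return s, a, r, s_next

s, a, r, s_next = make_transitions()
X = np.column_stack([s, a])            # propositional (state, action) features
q = None                               # current Q-function approximator

for _ in range(N_ITERATIONS):
    # Bellman operator application: targets y = r + gamma * max_a' Q(s', a')
    if q is None:
        y = r
    else:
        q_next = np.column_stack(
            [q.predict(np.column_stack([s_next, np.full_like(s_next, act)]))
             for act in ACTIONS]
        )
        y = r + GAMMA * q_next.max(axis=1)
    # Projection step: fit a gradient-boosted regressor to the Bellman targets
    q = GradientBoostingRegressor(n_estimators=50, max_depth=3).fit(X, y)

print("Q(0.85, right) ~", q.predict([[0.85, 1]])[0])
```

In the paper's relational setting, the projection step would instead add first-order regression trees that generalize over objects and relations, but the alternation between applying the Bellman operator and projecting onto the hypothesis space is the same.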
