Model-Based Relational RL When Object Existence is Partially Observable

We consider learning and planning in relational MDPs when object existence is uncertain and objects may appear or disappear depending on previous actions or on properties of other objects. An optimal policy must actively discover objects to achieve a goal; in general, planning in such domains amounts to solving a POMDP in which the belief is over the existence and properties of potential, not-yet-discovered objects. We propose a computationally efficient extension of model-based relational RL methods that approximates these beliefs using discrete uncertainty predicates. In this formulation, the belief update is learned in the form of probabilistic rules, and planning in the approximated belief space can be achieved using an extension of existing planners. We prove that the learned belief update rules encode an approximation of the exact belief updates of a POMDP formulation, and we demonstrate experimentally that the proposed approach successfully learns a set of relational rules appropriate for solving such problems.
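To make the idea of approximating an existence belief with discrete predicates concrete, the following toy sketch may help. It is not the formulation used in the paper: the detection model (a single "look" action with detection probability `p_detect` and no false positives), the thresholds, and the predicate names are all assumptions chosen for illustration. It computes the exact Bayes update of the probability that a hidden object exists and then maps that probability onto one of three hypothetical symbolic levels.

```python
# Toy illustration only: the sensor model, thresholds, and predicate names
# below are assumptions, not taken from the paper.

def update_existence_belief(prior: float, p_detect: float, observed: bool) -> float:
    """Exact Bayes update of P(object exists) after one 'look' action.

    Assumes no false positives: seeing the object makes existence certain.
    """
    if observed:
        return 1.0
    # P(exists and missed) versus P(does not exist)
    missed = prior * (1.0 - p_detect)
    absent = 1.0 - prior
    total = missed + absent
    if total == 0.0:
        # Prior was certain and the detector never misses, yet nothing was seen.
        raise ValueError("observation has zero probability under the model")
    return missed / total


def to_uncertainty_predicate(belief: float, low: float = 0.2, high: float = 0.8) -> str:
    """Map a continuous belief onto a hypothetical discrete predicate."""
    if belief < low:
        return "exists_unlikely"
    if belief > high:
        return "exists_likely"
    return "exists_possible"


if __name__ == "__main__":
    belief = 0.5  # assumed prior that a hidden object exists
    print(to_uncertainty_predicate(belief), belief)
    for step in range(2):
        belief = update_existence_belief(belief, p_detect=0.8, observed=False)
        print(to_uncertainty_predicate(belief), round(belief, 4))
```

In this sketch a planner would reason only over the symbolic level, so a rule that moves the state from `exists_possible` to `exists_unlikely` after an unsuccessful look plays the role of an approximate belief update.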
