Entropy-based strategies for physical exploration of the environment's degrees of freedom

Physical exploration refers to the challenge of autonomously discovering and learning how to manipulate the environment's degrees of freedom (DOF) by identifying promising points of interaction and pushing or pulling object parts to reveal DOF and their properties. Most existing work has focused on sub-problems such as estimating DOF parameters from given data. Here, we address the integrated problem, focusing on the higher-level strategy that iteratively decides on the next exploration point before motion generation methods execute the explorative action and data analysis methods interpret the feedback. We propose to choose exploration points based on the expected information gain, that is, the expected change in entropy of the robot's current belief (its uncertain knowledge) about the DOF. To this end, we first define how we represent such a belief, which requires dealing with the fact that the robot initially does not know which random variables (which DOF and, depending on their type, which DOF properties) actually exist. We then propose methods to estimate the expected information gain of an exploratory action. We analyze these strategies in simple environments and evaluate them in combination with full motion planning and data analysis in a physical simulation environment.
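As a minimal sketch of the selection rule described above (not the paper's implementation), the following Python code scores candidate interaction points by the expected reduction in Shannon entropy of a discrete belief over DOF hypotheses and greedily picks the best one. All names here (`belief`, `likelihood`, `candidate_points`, `outcomes`) are hypothetical placeholders assumed for illustration.

```python
import math

def entropy(belief):
    """Shannon entropy of a discrete belief (dict: hypothesis -> probability)."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)

def posterior(belief, likelihood, point, outcome):
    """Bayes update of the belief after observing `outcome` when probing `point`."""
    post = {h: p * likelihood(outcome, point, h) for h, p in belief.items()}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()} if z > 0 else belief

def expected_information_gain(belief, likelihood, point, outcomes):
    """Expected entropy reduction from probing `point`, marginalized over outcomes."""
    prior_entropy = entropy(belief)
    gain = 0.0
    for o in outcomes:
        # Predictive probability of outcome o at this point under the current belief.
        p_o = sum(p * likelihood(o, point, h) for h, p in belief.items())
        if p_o > 0:
            gain += p_o * (prior_entropy - entropy(posterior(belief, likelihood, point, o)))
    return gain

def next_exploration_point(belief, likelihood, candidate_points, outcomes):
    """Greedy choice: probe the point with maximal expected information gain."""
    return max(candidate_points,
               key=lambda pt: expected_information_gain(belief, likelihood, pt, outcomes))
```

A toy usage under assumed numbers: two DOF hypotheses (revolute vs. prismatic), binary probe outcomes, and a made-up observation model in which probing near the handle is more reliable than probing near the hinge, so the handle yields higher expected gain.

```python
# Hypothetical observation model; a real one would come from the DOF estimator.
def likelihood(outcome, point, hypothesis):
    reliability = 0.9 if point == "handle" else 0.55
    agrees = (outcome == "moves") == (hypothesis == "revolute")
    return reliability if agrees else 1.0 - reliability

belief = {"revolute": 0.5, "prismatic": 0.5}
print(next_exploration_point(belief, likelihood, ["handle", "hinge"], ["moves", "stuck"]))
# -> "handle"
```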
