Planning and Execution using Inaccurate Models with Provable Guarantees

Models used in modern planning problems to simulate the outcomes of real-world action executions are becoming increasingly complex, ranging from simulators that perform physics-based reasoning to precomputed analytical motion primitives. However, robots operating in the real world often face situations that these models fail to capture before execution. Such imperfect modeling can lead to highly suboptimal, or even incomplete, behavior during execution. In this paper, we propose CMAX, an approach for interleaving planning and execution. CMAX adapts its planning strategy online, during real-world execution, to account for any discrepancy between the modeled and true dynamics, without requiring updates to the dynamics of the model. This is achieved by biasing the planner away from transitions whose dynamics are discovered to be inaccurately modeled, leading to robot behavior that tries to complete the task despite having an inaccurate model. We provide provable guarantees on the completeness and efficiency of the proposed planning and execution framework under specific assumptions on the model, for both small and large state spaces. We show empirically that CMAX is efficient in simulated robotic tasks, including 4D planar pushing, and in real robot experiments using a PR2: a 3D pick-and-place task where the mass of the object is incorrectly modeled, and a 7D arm planning task where one of the joints is not operational, leading to a discrepancy in dynamics. A video of our physical robot experiments can be found at this https URL.
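The core idea of biasing a search-based planner away from transitions whose dynamics were observed to be wrong can be illustrated with a short sketch. The Python below is a minimal, hypothetical illustration and not the paper's exact algorithm: it runs A* on the given (possibly inaccurate) model, inflates the cost of any (state, action) pair whose real-world outcome disagreed with the model's prediction, and interleaves replanning with execution without ever updating the model itself. All class, function, and parameter names are assumptions made for illustration, and states are assumed to be hashable (e.g., tuples in a discretized state space).

```python
import heapq
import itertools

class BiasedPlanner:
    """Minimal illustration of planning with an inaccurate model while biasing
    the search away from transitions observed to be modeled incorrectly.
    Names and the penalty scheme are illustrative, not the paper's exact method."""

    def __init__(self, model_step, actions, heuristic, penalty=1e6):
        self.model_step = model_step   # model_step(s, a) -> predicted next state
        self.actions = actions         # finite action set
        self.heuristic = heuristic     # heuristic(s) -> estimated cost-to-goal
        self.penalty = penalty         # large cost biasing search away from bad edges
        self.incorrect = set()         # (state, action) pairs with wrong dynamics

    def cost(self, s, a):
        # Unit cost normally; inflated cost for transitions known to be inaccurate.
        return self.penalty if (s, a) in self.incorrect else 1.0

    def plan(self, start, goal):
        # Standard A* on the model, using the biased edge costs above.
        counter = itertools.count()    # tie-breaker so states are never compared
        frontier = [(self.heuristic(start), next(counter), 0.0, start, [])]
        best_g = {}
        while frontier:
            _, _, g, s, path = heapq.heappop(frontier)
            if s == goal:
                return path
            if s in best_g and best_g[s] <= g:
                continue
            best_g[s] = g
            for a in self.actions:
                s_next = self.model_step(s, a)
                g_next = g + self.cost(s, a)
                heapq.heappush(frontier, (g_next + self.heuristic(s_next),
                                          next(counter), g_next, s_next, path + [a]))
        return None

    def execute(self, start, goal, real_step):
        # Interleave planning and execution; record discrepancies instead of
        # updating the model's dynamics.
        s = start
        while s != goal:
            path = self.plan(s, goal)
            if not path:
                return False
            a = path[0]
            s_observed = real_step(s, a)              # execute in the real world
            if s_observed != self.model_step(s, a):   # discrepancy discovered
                self.incorrect.add((s, a))            # bias future plans away
            s = s_observed
        return True
```

In this sketch the penalty plays the role of the bias: transitions the robot has observed to be wrongly modeled become effectively untraversable for the planner, so subsequent plans route around them even though the model's dynamics are never corrected.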
