LoOP: Iterative learning for optimistic planning on robots

Abstract: Efficient robotic behaviors require robustness and adaptation to dynamic changes in the environment, whose characteristics can vary rapidly during robot operation. Planning and learning techniques have shown the most promising results for generating effective robot action policies. However, taken individually, each presents limitations: planning techniques lack generalization among similar states and require experts to define behavioral routines at different levels of abstraction, while learning methods usually require a considerable number of training samples and algorithm iterations. To overcome these issues and to efficiently generate robot behaviors, we introduce LoOP, an iterative learning algorithm for optimistic planning that combines state-of-the-art planning and learning techniques to generate action policies. The main contribution of LoOP is the combination of Monte-Carlo search planning and Q-learning, which enables focused exploration during policy refinement in different robotic applications. We demonstrate the robustness and flexibility of LoOP in various domains and on multiple robotic platforms through an extensive experimental evaluation.
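
The abstract names the combination of Monte-Carlo search planning with Q-learning as LoOP's core contribution. As a rough illustration of how such a combination can work in general, the sketch below interleaves UCT-style node selection with one-step Q-learning updates, seeding the tree's exploration bonus with the learned action values. This is a hypothetical reconstruction, not the LoOP algorithm from the paper: the `step` and `actions` callbacks and all hyper-parameters (GAMMA, ALPHA, C_UCT, HORIZON) are assumptions made for illustration.

```python
import math
from collections import defaultdict

# Illustrative sketch of combining Monte-Carlo tree search with Q-learning:
# node values are seeded from a learned Q-table, and transitions observed
# during search feed one-step Q-learning updates back into that table.
# NOT the authors' exact algorithm; `step`, `actions`, and all
# hyper-parameters below are hypothetical placeholders.

GAMMA = 0.95    # discount factor (assumed)
ALPHA = 0.1     # Q-learning step size (assumed)
C_UCT = 1.4     # UCT exploration constant (assumed)
HORIZON = 20    # maximum simulation depth (assumed)

Q = defaultdict(float)   # learned action values Q[(s, a)]
N = defaultdict(int)     # state-action visit counts N[(s, a)]
NS = defaultdict(int)    # state visit counts NS[s]

def uct_value(s, a):
    """UCB1-style score seeded with the learned Q-value as an optimistic prior."""
    bonus = C_UCT * math.sqrt(math.log(NS[s] + 1) / (N[(s, a)] + 1))
    return Q[(s, a)] + bonus

def simulate(s, step, actions, depth=0):
    """One MCTS simulation from s; returns the sampled discounted return.

    `step(s, a) -> (s2, r)` and `actions(s) -> list` are assumed callbacks
    wrapping the (simulated) environment.
    """
    if depth >= HORIZON:
        return 0.0
    a = max(actions(s), key=lambda b: uct_value(s, b))   # selection
    s2, r = step(s, a)                                   # expansion / rollout step
    g = r + GAMMA * simulate(s2, step, actions, depth + 1)
    # One-step Q-learning update interleaved with the tree backup.
    td_target = r + GAMMA * max(Q[(s2, b)] for b in actions(s2))
    Q[(s, a)] += ALPHA * (td_target - Q[(s, a)])
    N[(s, a)] += 1
    NS[s] += 1
    return g

def plan(s0, step, actions, budget=200):
    """Run `budget` simulations from s0, then act greedily w.r.t. Q."""
    for _ in range(budget):
        simulate(s0, step, actions)
    return max(actions(s0), key=lambda b: Q[(s0, b)])
```

In a scheme like this the learned Q-values persist across planning episodes, so each new search starts from the values refined by previous ones, which is one plausible reading of how "focused exploration during policy refinement" can be realized.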
