APPL: Adaptive Planner Parameter Learning

While current autonomous navigation systems allow robots to drive themselves from one point to another in specific environments, they typically require extensive manual parameter re-tuning by human robotics experts in order to function in new environments. Furthermore, even within a single complex environment, one set of fine-tuned parameters may not work well in all of its regions. These problems prohibit reliable mobile robot deployment by non-expert users. As a remedy, we propose Adaptive Planner Parameter Learning (APPL), a machine learning framework that leverages non-expert human interaction via several modalities (teleoperated demonstrations, corrective interventions, and evaluative feedback), as well as unsupervised reinforcement learning, to learn a parameter policy that dynamically adjusts the parameters of a classical navigation system in response to changes in the environment. APPL inherits the safety and explainability of classical navigation systems while also enjoying the benefits of machine learning, i.e., the ability to adapt and improve from experience. We present a suite of individual APPL methods, along with a unifying cycle-of-learning scheme that combines them all into a framework that improves navigation performance through continual, iterative human interaction and simulation training.
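
To make the core mechanism concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of what the abstract describes: a learned policy that maps the robot's current sensory context, here a LiDAR scan, to a vector of planner parameters at every replanning cycle. The parameter names and ranges follow common ROS move_base/DWA conventions and are assumptions; the ParameterPolicy class and its linear form are placeholders for whatever model the APPL variants actually learn.

```python
# Hypothetical sketch of an APPL-style parameter policy (not the authors'
# code). A learned mapping from the robot's sensory context to planner
# parameters is queried at each replanning cycle, and the resulting values
# are pushed to a classical planner such as DWA.
import numpy as np

# Parameter names and ranges follow common ROS move_base / DWA conventions;
# the exact parameter set used by APPL may differ.
PARAM_BOUNDS = {
    "max_vel_x":        (0.2, 2.0),   # m/s
    "max_vel_theta":    (0.3, 3.0),   # rad/s
    "inflation_radius": (0.1, 0.6),   # m
}

class ParameterPolicy:
    """Stand-in for a learned policy (trained from demonstrations,
    interventions, evaluative feedback, or RL in the APPL variants)."""

    def __init__(self, scan_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # A single linear layer as a placeholder for the learned network.
        self.weights = rng.normal(0.0, 0.1, size=(len(PARAM_BOUNDS), scan_dim))

    def __call__(self, scan: np.ndarray) -> dict:
        # Squash each output to [0, 1], then rescale to the parameter's
        # valid range so the planner always receives feasible values.
        unit = 1.0 / (1.0 + np.exp(-self.weights @ scan))
        return {
            name: lo + u * (hi - lo)
            for (name, (lo, hi)), u in zip(PARAM_BOUNDS.items(), unit)
        }

if __name__ == "__main__":
    policy = ParameterPolicy(scan_dim=360)  # 1-degree LiDAR resolution
    scan = np.full(360, 2.5)                # fake scan: mostly open space
    scan[150:210] = 0.4                     # obstacle directly ahead
    params = policy(scan)                   # queried every replanning cycle
    print(params)  # values that would be sent to the classical planner
```

In a deployed system, the returned dictionary would be applied to the running planner before each planning cycle (e.g., via dynamic_reconfigure in ROS); the APPL variants differ only in how the policy's parameters are learned.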
