Learning A Risk-Aware Trajectory Planner From Demonstrations Using Logic Monitor

Risk awareness is an important factor when deploying policies on robots in the real world, but defining the right set of risk metrics can be difficult. In this work, we use a logic monitor that tracks the behaviors of environmental agents and provides a risk metric that the controlled agent can incorporate during planning. We introduce LogicRiskNet, a learnable structure constructed from temporal logic formulas that describe the rules governing a safe agent’s behavior. The network’s parameters are learned from demonstration data. Because it is built from temporal logic, the network has an interpretable architecture that can explain which risk metrics matter to the human expert. We integrate LogicRiskNet into an inverse optimal control (IOC) framework and show that it learns, solely from demonstration data, to generate trajectory plans that accurately mimic the expert’s risk-handling behavior. We evaluate our method on a real-world driving dataset.
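To make the idea of a logic monitor that outputs a risk metric concrete, here is a minimal sketch under stated assumptions. It is not LogicRiskNet and is not taken from the paper. It computes the standard quantitative robustness of two simple temporal logic formulas over a sampled signal: "always (distance > threshold)" and "eventually (distance > threshold)". The function names, the example distances, and the mapping from robustness to risk are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def always_robustness(signal: Sequence[float], threshold: float) -> float:
    """Robustness of 'always (signal > threshold)': the worst margin over the trace."""
    return min(x - threshold for x in signal)


def eventually_robustness(signal: Sequence[float], threshold: float) -> float:
    """Robustness of 'eventually (signal > threshold)': the best margin over the trace."""
    return max(x - threshold for x in signal)


def risk_from_robustness(rho: float) -> float:
    """Hypothetical risk score (an assumption, not the paper's metric):
    zero when the rule is satisfied, growing with the size of the violation."""
    return max(0.0, -rho)


# Illustrative distances (in meters) between the controlled agent and another agent.
distances = [5.0, 3.0, 4.0]

rho_loose = always_robustness(distances, threshold=2.0)   # rule satisfied with margin
rho_strict = always_robustness(distances, threshold=3.5)  # rule violated at one step
risk_loose = risk_from_robustness(rho_loose)
risk_strict = risk_from_robustness(rho_strict)
```

A positive robustness value means the rule holds with that much margin, and a negative value measures how badly it is violated. In a learned setting, quantities such as the threshold would be parameters fitted from demonstrations rather than fixed by hand.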
