Interpretable apprenticeship learning with temporal logic specifications

Recent work has used formulas in linear temporal logic (LTL) as specifications for agents planning in Markov Decision Processes (MDPs). We consider the inverse problem: inferring an LTL specification from demonstrated behavior trajectories in MDPs. We formulate this as a multiobjective optimization problem and describe state-based (“what actually happened”) and action-based (“what the agent expected to happen”) objective functions, both built on a notion of “violation cost”. We demonstrate the efficacy of the approach by using genetic programming to solve this problem in two simple domains.
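
To make the idea of scoring a candidate specification against demonstrations concrete, the following is a minimal sketch and not the paper's actual definitions. It assumes a finite-trace reading of LTL, a hypothetical tuple encoding of formulas, and a deliberately simple state-based cost (the fraction of demonstrated trajectories that violate the formula). The names `holds`, `violation_rate`, and `size` are illustrative only. A search procedure such as genetic programming could use scores of this kind as objectives to trade off against each other.

```python
from typing import List, Set, Tuple

# Assumption: a trajectory is a finite list of states, and each state is the
# set of atomic propositions true in it. LTL is normally defined over infinite
# traces; the finite-trace semantics here is a simplification.
Trace = List[Set[str]]

# Assumption: formulas are nested tuples, e.g. ("G", ("not", ("ap", "hazard"))).
Formula = Tuple


def holds(f: Formula, trace: Trace, i: int = 0) -> bool:
    """Check whether formula f holds at position i of a finite trace."""
    op = f[0]
    if op == "ap":
        return f[1] in trace[i]
    if op == "not":
        return not holds(f[1], trace, i)
    if op == "and":
        return holds(f[1], trace, i) and holds(f[2], trace, i)
    if op == "or":
        return holds(f[1], trace, i) or holds(f[2], trace, i)
    if op == "X":  # next: false at the last position under this semantics
        return i + 1 < len(trace) and holds(f[1], trace, i + 1)
    if op == "G":  # globally, over the remainder of the trace
        return all(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "F":  # eventually, within the remainder of the trace
        return any(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "U":  # until: f[1] must hold up to the first point where f[2] holds
        for j in range(i, len(trace)):
            if holds(f[2], trace, j):
                return True
            if not holds(f[1], trace, j):
                return False
        return False
    raise ValueError(f"unknown operator: {op!r}")


def violation_rate(f: Formula, demos: List[Trace]) -> float:
    """Illustrative state-based cost: fraction of demonstrations violating f."""
    return sum(not holds(f, t) for t in demos) / len(demos)


def size(f: Formula) -> int:
    """Number of nodes in the formula tree, a simple complexity measure."""
    if f[0] == "ap":
        return 1
    return 1 + sum(size(sub) for sub in f[1:])


# Two hypothetical demonstrations; one passes through a "hazard" state.
demos = [
    [{"a"}, {"a"}, {"goal"}],
    [{"a"}, {"hazard"}, {"goal"}],
]
reach_goal = ("F", ("ap", "goal"))
avoid_hazard = ("G", ("not", ("ap", "hazard")))

for name, f in [("reach_goal", reach_goal), ("avoid_hazard", avoid_hazard)]:
    print(name, violation_rate(f, demos), size(f))
```

In this toy data both demonstrations reach the goal, while only one avoids the hazard, so the two candidate formulas receive different violation scores. An action-based objective would instead need the MDP's transition model to reason about what the agent expected to happen, which this sketch does not attempt.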
