Transfer of Temporal Logic Formulas in Reinforcement Learning

Transferring high-level knowledge from a source task to a target task is an effective way to expedite reinforcement learning (RL). Propositional logic and first-order logic, for example, have been used to represent such knowledge. We study the transfer of knowledge between tasks in which the timing of events matters; we call such tasks temporal tasks. We formalize similarity between temporal tasks through a notion of logical transferability and develop a transfer learning approach between different yet similar temporal tasks. We first propose an inference technique that extracts metric interval temporal logic (MITL) formulas in sequential disjunctive normal form from labeled trajectories collected during RL on the two tasks. If this inference identifies logical transferability, we construct a timed automaton for each sequential conjunctive subformula of the inferred MITL formulas from both tasks. For the source task, we perform RL on the extended state, which includes the locations and clock valuations of the timed automata. We then establish mappings between the corresponding components (clocks, locations, etc.) of the timed automata from the two tasks and transfer the extended Q-functions based on these mappings. Finally, we perform RL on the extended state for the target task, starting from the transferred extended Q-functions. Our implementation results show that, depending on how similar the source and target tasks are, performing RL in the extended state space can improve the sampling efficiency for the target task by up to one order of magnitude, and using the transferred extended Q-functions can improve it by up to another order of magnitude.
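The transfer step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a tabular Q-function keyed by a hypothetical extended state `(environment state, automaton location, clock value)` with discrete clock values, and it assumes the location and clock mappings are already given. The names `transfer_q`, `q_update`, `location_map`, and `clock_map` are illustrative only.

```python
def transfer_q(source_q, location_map, clock_map):
    """Initialize a target Q-table from a source Q-table.

    source_q maps (state, location, clock) -> {action: value}.
    location_map maps source locations to target locations (assumed given).
    clock_map maps a source clock value to a target clock value (assumed given).
    """
    target_q = {}
    for (state, location, clock), action_values in source_q.items():
        if location not in location_map:
            continue  # no counterpart in the target automaton: nothing to transfer
        key = (state, location_map[location], clock_map(clock))
        target_q[key] = dict(action_values)  # copy so the source table is untouched
    return target_q


def q_update(q, ext_state, action, reward, next_ext_state, actions,
             alpha=0.1, gamma=0.9):
    """One standard Q-learning update on the extended state."""
    next_values = q.get(next_ext_state, {})
    best_next = max(next_values.get(a, 0.0) for a in actions)
    old = q.setdefault(ext_state, {}).get(action, 0.0)
    q[ext_state][action] = old + alpha * (reward + gamma * best_next - old)
```

Learning on the target task would then call `q_update` on the table returned by `transfer_q` rather than on an empty table, so extended states with a counterpart in the source task start from the source values and all others start from zero.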
