Paper information - L*-Based Learning of Markov Decision Processes (Extended Version)


Automata learning techniques automatically generate system models from test observations. These techniques usually fall into two categories: passive and active. Passive learning uses a predetermined data set, e.g., system logs. In contrast, active learning actively queries the system under learning, which is considered more efficient. An influential active learning technique is Angluin's L* algorithm for regular languages, which inspired several generalisations from DFAs to other automata-based modelling formalisms. In this work, we study L*-based learning of deterministic Markov decision processes, first assuming an ideal setting with perfect information. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling system traces via testing. Experiments with the implementation of our sampling-based algorithm suggest that it achieves better accuracy than state-of-the-art passive learning techniques with the same amount of test data. Unlike existing learning algorithms with predefined states, our algorithm learns the complete model structure including the states.

Kim G. Larsen | Bernhard K. Aichernig | Giovanni Bacci | Martin Tappler | Maria Eichlseder
