PRINS: Scalable Model Inference for Component-based System Logs

Behavioral software models play a key role in many software engineering tasks; unfortunately, these models are either unavailable during software development or, if available, quickly become outdated as implementations evolve. Model inference techniques have been proposed as a viable solution for extracting finite state models from execution logs. However, existing techniques do not scale well to the very large logs commonly found in practice. In this paper, we address the scalability problem of inferring the model of a component-based system from large system logs, without requiring any extra information. Our model inference technique, called PRINS, follows a divide-and-conquer approach: it first infers a model of each system component from the corresponding logs, and then merges the individual component models, taking into account the flow of events across components as reflected in the logs. We evaluated PRINS in terms of scalability and accuracy, using nine datasets composed of logs extracted from publicly available benchmarks and from a personal computer running desktop business applications. The results show that PRINS can process large logs much faster than a publicly available and well-known state-of-the-art tool, without significantly compromising the accuracy of the inferred models.
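
The divide-and-conquer idea can be illustrated with a minimal Python sketch. It is not the PRINS algorithm itself: the per-component inference step is replaced by a naive stand-in in which a state is simply the last event seen, and the merge step is a simple replay of the system log. All function names and the `(component, event)` log format are assumptions made for illustration.

```python
from collections import defaultdict

def split_by_component(log):
    """Divide: group events per component, preserving their order."""
    per_component = defaultdict(list)
    for component, event in log:
        per_component[component].append(event)
    return dict(per_component)

def infer_component_model(executions):
    """Naive stand-in for a real inference engine (assumption).

    A state is the last event seen; a transition (src, event, dst)
    is recorded for every observed succession of events.
    """
    transitions = set()
    for events in executions:
        state = "INIT"
        for event in events:
            transitions.add((state, event, event))
            state = event
    return transitions

def stitch(log, models):
    """Conquer: replay the system log over the component models,
    linking component-level states in the order events occur."""
    merged = set()
    local = {component: "INIT" for component in models}
    current = "INIT"  # system-level state
    for component, event in log:
        # Follow the matching transition in this component's model.
        nxt = next(dst for (src, ev, dst) in models[component]
                   if src == local[component] and ev == event)
        local[component] = nxt
        target = (component, nxt)
        merged.add((current, event, target))
        current = target
    return merged

# Hypothetical log: (component, event) pairs in order of occurrence.
log = [("A", "open"), ("B", "recv"), ("A", "close")]
models = {c: infer_component_model([events])
          for c, events in split_by_component(log).items()}
system_model = stitch(log, models)
```

In this sketch the component models are built independently of one another, so that step could run in parallel; only the stitching step needs the interleaved system log.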
