Improving dynamic software analysis by applying grammar inference principles

Grammar inference is a family of machine learning techniques that aim to infer grammars from a sample of sentences in some (unknown) language. Dynamic analysis is a family of software engineering techniques that attempt to infer the rules governing the behaviour of software systems from a sample of executions. Despite their disparate domains, both fields have broadly similar aims: each tries to infer the rules that govern the behaviour of an unknown system from a sample of observations. Deriving general rules about program behaviour from dynamic analysis is difficult because it is virtually impossible to identify and supply a complete sample of the necessary program executions. The problems that arise from incomplete input samples have been investigated extensively in the grammar inference community, and the resulting advances have produced increasingly sophisticated techniques that infer grammars more accurately from (potentially) sparse information about the underlying system. This paper investigates the similarities between the two fields and shows, through a series of small experiments on random state machines, how many of these advances can be applied with similar effect to dynamic analysis problems. Copyright © 2008 John Wiley & Sons, Ltd.
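To make the incomplete-sample problem concrete, the following is a minimal sketch, not the paper's experimental setup. It builds a prefix tree acceptor, the usual starting point for state-merging grammar inference, from a handful of execution traces. The function names (`build_pta`, `accepts`) and the file-handling event names are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Transitions = Dict[Tuple[int, str], int]


def build_pta(traces: Iterable[List[str]]) -> Tuple[Transitions, Set[int]]:
    """Build a prefix tree acceptor: one state per distinct prefix in the sample."""
    transitions: Transitions = {}
    accepting: Set[int] = set()
    next_state = 1  # state 0 is the root (the empty prefix)
    for trace in traces:
        state = 0
        for event in trace:
            key = (state, event)
            if key not in transitions:
                # Unseen prefix: allocate a fresh state for it.
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)  # the end of an observed trace is accepting
    return transitions, accepting


def accepts(transitions: Transitions, accepting: Set[int], trace: List[str]) -> bool:
    """Return True only if the trace follows known transitions to an accepting state."""
    state = 0
    for event in trace:
        key = (state, event)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting


# Hypothetical sample of observed executions (assumed event names).
sample = [["open", "read", "close"], ["open", "write", "close"]]
transitions, accepting = build_pta(sample)

seen = accepts(transitions, accepting, ["open", "read", "close"])
unseen = accepts(transitions, accepting, ["open", "read", "read", "close"])
```

Here `seen` is true, while `unseen` is false even though reading twice is plausibly valid behaviour: the unmerged model accepts exactly the observed traces and nothing more. Generalising beyond a sparse sample, typically by merging states judged to be equivalent, is the step that grammar inference techniques address.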
