Recovering Software Specifications with Inductive Logic Programming

We consider using machine learning techniques to help understand a large software system. In particular, we describe how learning techniques can be used to reconstruct abstract Datalog specifications of a certain type of database software from examples of its operation. In a case study involving a large (more than one million lines of C) real-world software system, we demonstrate that off-the-shelf inductive logic programming methods can be successfully used for specification recovery; specifically, Grendel2 can extract specifications for about one-third of the modules in a test suite with high rates of precision and recall. We then describe two extensions to Grendel2 which improve performance on this task: one which allows it to output a set of candidate hypotheses, and another which allows it to output specifications containing determinations. In combination, these extensions enable specifications to be extracted for nearly two-thirds of the benchmark modules with perfect recall, and precision of better than 60%.
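To make the evaluation criteria concrete, the following is a minimal sketch of how the precision and recall of one candidate specification might be scored against observed output. It is not the paper's evaluation procedure or Grendel2's implementation. The relations `employee` and `located`, the candidate rule, and the observed tuples are all hypothetical and chosen only for illustration.

```python
def precision_recall(predicted, actual):
    """Score a set of predicted tuples against the tuples actually observed."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Conventions for empty sets are an assumption of this sketch.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Hypothetical database relations (not taken from the paper).
employee = {("ann", "sales"), ("bob", "eng"), ("cy", "eng")}
located = {("sales", "nyc"), ("eng", "sf")}


def candidate_view(employee, located):
    """Evaluate a hypothetical candidate Datalog rule:
    view(Name, City) :- employee(Name, Dept), located(Dept, City).
    """
    return {
        (name, city)
        for (name, dept) in employee
        for (dept2, city) in located
        if dept == dept2
    }


# Hypothetical tuples that the real module was observed to output.
observed = {("ann", "nyc"), ("bob", "sf")}

predicted = candidate_view(employee, located)
precision, recall = precision_recall(predicted, observed)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the candidate rule covers every observed tuple (perfect recall) but also predicts one tuple the module never produced, so its precision is two-thirds. An over-general specification of this kind has high recall and lower precision.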
