Mining Iterative Generators and Representative Rules for Software Specification Discovery

Billions of dollars are spent annually on software-related cost. It is estimated that up to 45 percent of software cost is due to the difficulty in understanding existing systems when performing maintenance tasks (i.e., adding features, removing bugs, etc.). One of the root causes is that software products often come with poor, incomplete, or even without any documented specifications. In an effort to improve program understanding, Lo et al. have proposed iterative pattern mining which outputs patterns that are repeated frequently within a program trace, or across multiple traces, or both. Frequent iterative patterns reflect frequent program behaviors that likely correspond to software specifications. To reduce the number of patterns and improve the efficiency of the algorithm, Lo et al. have also introduced mining closed iterative patterns, i.e., maximal patterns without any superpattern having the same support. In this paper, to technically deepen research on iterative pattern mining, we introduce mining iterative generators, i.e., minimal patterns without any subpattern having the same support. Iterative generators can be paired with closed patterns to produce a set of rules expressing forward, backward, and in-between temporal constraints among events in one general representation. We refer to these rules as representative rules. A comprehensive performance study shows the efficiency of our approach. A case study on traces of an industrial system shows how iterative generators and closed iterative patterns can be merged to form useful rules shedding light on software design.

[1]  Jinyan Li,et al.  Mining statistically important equivalence classes and delta-discriminative emerging patterns , 2007, KDD '07.

[2]  Jiawei Han,et al.  BIDE: efficient mining of frequent closed sequences , 2004, Proceedings. 20th International Conference on Data Engineering.

[3]  Siau-Cheng Khoo,et al.  Mining modal scenario-based specifications from execution traces of reactive systems , 2007, ASE '07.

[4]  Thomas A. Standish An Essay on Software Reuse , 1984, IEEE Transactions on Software Engineering.

[5]  David Wai-Lok Cheung,et al.  Mining periodic patterns with gap requirement from sequences , 2005, SIGMOD '05.

[6]  Chao Liu,et al.  Efficient Mining of Recurrent Rules from a Sequence Database , 2008, DASFAA.

[7]  Chao Liu,et al.  Mining past-time temporal rules from execution traces , 2008, WODA '08.

[8]  Jian Pei,et al.  Minimum Description Length Principle: Generators Are Preferable to Closed Patterns , 2006, AAAI.

[9]  David Harel,et al.  Come, let's play - scenario-based programming using LSCs and the play-engine , 2003 .

[10]  Manuvir Das,et al.  Perracotta: mining temporal API rules from imperfect traces , 2006, ICSE.

[11]  Lu Zhang,et al.  Early Filtering of Polluting Method Calls for Mining Temporal Specifications , 2008, 2008 15th Asia-Pacific Software Engineering Conference.

[12]  David Harel,et al.  Come, Let’s Play , 2003, Springer Berlin Heidelberg.

[13]  Marco Sinnema,et al.  Experiences in Software Product Families: Problems and Issues During Product Derivation , 2004, SPLC.

[14]  Meir M. Lehman,et al.  Program evolution: processes of software change , 1985 .

[15]  David Harel,et al.  LSCs: Breathing Life into Message Sequence Charts , 1999, Formal Methods Syst. Des..

[16]  Gemma C. Garriga Discovering Unbounded Episodes in Sequential Data , 2003, PKDD.

[17]  Edmund M. Clarke,et al.  Model Checking , 1999, Handbook of Automated Reasoning.

[18]  Leon J. Osterweil,et al.  Cecil: A Sequencing Constraint Language for Automatic Static Analysis Generation , 1990, IEEE Trans. Software Eng..

[19]  Myra Spiliopoulou,et al.  Managing Interesting Rules in Sequence Mining , 1999, PKDD.

[20]  Jianyong Wang,et al.  Efficient mining of frequent sequence generators , 2008, WWW.

[21]  Heikki Mannila,et al.  Levelwise Search and Borders of Theories in Knowledge Discovery , 1997, Data Mining and Knowledge Discovery.

[22]  Shengchao Qin,et al.  Verifying safety policies with size properties and alias controls , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[23]  Heikki Mannila,et al.  Discovery of Frequent Episodes in Event Sequences , 1997, Data Mining and Knowledge Discovery.

[24]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[25]  Jia-Dong Ren,et al.  Mining Weighted Closed Sequential Patterns in Large Databases , 2008, 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery.

[26]  Xifeng Yan,et al.  CloSpan: Mining Closed Sequential Patterns in Large Datasets , 2003, SDM.

[27]  Robert V. Binder,et al.  Testing Object-Oriented Systems: Models, Patterns, and Tools , 1999 .

[28]  Jiawei Han,et al.  Efficient Mining of Closed Repetitive Gapped Subsequences from a Sequence Database , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[29]  Sjouke Mauw,et al.  Message Sequence Chart (MSC) , 1996 .

[30]  Chao Liu,et al.  Efficient mining of iterative patterns for software specification discovery , 2007, KDD '07.

[31]  Qiming Chen,et al.  PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[32]  Abdelwahab Hamou-Lhadj,et al.  Summarizing the Content of Large Traces to Facilitate the Understanding of the Behaviour of a Software System , 2006, 14th IEEE International Conference on Program Comprehension (ICPC'06).

[33]  L. Erlikh,et al.  Leveraging legacy system dollars for e-business , 2000 .

[34]  Carla E. Brodley,et al.  KDD-Cup 2000 organizers' report: peeling the onion , 2000, SKDD.

[35]  David Lo,et al.  Mining Scenario-Based Triggers and Effects , 2008, 2008 23rd IEEE/ACM International Conference on Automated Software Engineering.

[36]  Amir Pnueli,et al.  Temporal Logic for Scenario-Based Specifications , 2005, TACAS.

[37]  Thomas J. Ostrand,et al.  Experiments on the effectiveness of dataflow- and control-flow-based test adequacy criteria , 1994, Proceedings of 16th International Conference on Software Engineering.

[38]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.