Discovering Neglected Conditions in Software by Mining Dependence Graphs

Neglected conditions are an important but difficult-to-find class of software defects. This paper presents a novel approach to revealing neglected conditions that integrates static program analysis and advanced data mining techniques to discover implicit conditional rules in a code base and to discover rule violations that indicate neglected conditions. The approach requires the user to indicate minimal constraints on the context of the rules to be sought, rather than specific rule templates. To permit this generality, rules are modeled as graph minors of enhanced procedure dependence graphs (EPDGs), in which control and data dependence edges are augmented by edges representing shared data dependences. A heuristic maximal frequent subgraph mining algorithm is used to extract candidate rules from EPDGs, and a heuristic graph matching algorithm is used to identify rule violations. We also report the results of an empirical study in which the approach was applied to four open source projects (openssl, make, procmail, amaya). These results indicate that the approach is effective and reasonably efficient.

[1]  Paul D. Seymour,et al.  Graph minors. I. Excluding a forest , 1983, J. Comb. Theory, Ser. B.

[2]  Richard J. Lipton,et al.  Theoretical and empirical studies on using program mutation to test the functional correctness of programs , 1980, POPL '80.

[3]  Hong Zhu,et al.  Software unit test coverage and adequacy , 1997, ACM Comput. Surv..

[4]  Jian Pei,et al.  Mining API patterns as partial orders from source code: from usage scenarios to specifications , 2007, ESEC-FSE '07.

[5]  Eran Yahav,et al.  Static Specification Mining Using Automata-Based Abstractions , 2007, IEEE Transactions on Software Engineering.

[6]  Joe D. Warren,et al.  The program dependence graph and its use in optimization , 1987, TOPL.

[7]  Philip S. Yu,et al.  GPLAG: detection of software plagiarism by program dependence graph analysis , 2006, KDD '06.

[8]  George Karypis,et al.  GREW - a scalable frequent subgraph discovery algorithm , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[9]  Yuanyuan Zhou,et al.  CP-Miner: finding copy-paste and related bugs in large-scale software code , 2006, IEEE Transactions on Software Engineering.

[10]  Chadd C. Williams,et al.  Automatic mining of source code repositories to improve bug finding techniques , 2005, IEEE Transactions on Software Engineering.

[11]  Shijie Zhang,et al.  Monkey: Approximate Graph Mining Based on Spanning Trees , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[12]  Orna Kupferman,et al.  Coverage metrics for formal verification , 2003, International Journal on Software Tools for Technology Transfer.

[13]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[14]  T. A. Thayer,et al.  Software Reliability Study. , 1974 .

[15]  Giuseppe Di Fatta,et al.  Discriminative pattern mining in software fault detection , 2006, SOQUA '06.

[16]  Jiong Yang,et al.  Finding what's not there: a new approach to revealing neglected conditions in software , 2007, ISSTA '07.

[17]  Kamalakar Karlapalem,et al.  MARGIN: Maximal Frequent Subgraph Mining , 2006, ICDM.

[18]  Andreas Zeller,et al.  Detecting object usage anomalies , 2007, ESEC-FSE '07.

[19]  Glenford J. Myers,et al.  Art of Software Testing , 1979 .

[20]  Lawrence B. Holder,et al.  Substucture Discovery in the SUBDUE System , 1994, KDD Workshop.

[21]  Suresh Jagannathan,et al.  Path-Sensitive Inference of Function Precedence Protocols , 2007, 29th International Conference on Software Engineering (ICSE'07).

[22]  Steven P. Reiss,et al.  Fault localization with nearest neighbor queries , 2003, 18th IEEE International Conference on Automated Software Engineering, 2003. Proceedings..

[23]  George Karypis,et al.  Finding Frequent Patterns in a Large Sparse Graph* , 2005, Data Mining and Knowledge Discovery.

[24]  Dawson R. Engler,et al.  Bugs as deviant behavior: a general approach to inferring errors in systems code , 2001, SOSP.

[25]  Ram Chillarege,et al.  Orthogonal defect classification , 1996 .

[26]  Ian Sommerville,et al.  Viewpoints for requirements elicitation: a practical approach , 1998, Proceedings of IEEE International Symposium on Requirements Engineering: RE '98.

[27]  Suresh Jagannathan,et al.  Static specification inference using predicate mining , 2007, PLDI '07.

[28]  William E. Howden,et al.  Reliability of the Path Analysis Testing Strategy , 1976, IEEE Transactions on Software Engineering.

[29]  Marc Roper,et al.  Practical Code Inspection Techniques for Object-Oriented Systems: An Experimental Comparison , 2003, IEEE Softw..

[30]  Glenford J. Myers,et al.  A controlled experiment in program testing and code walkthroughs/inspections , 1978, CACM.

[31]  Lori A. Clarke,et al.  Partition Analysis: A Method Combining Testing and Verification , 1985, IEEE Transactions on Software Engineering.

[32]  Jiong Yang,et al.  SPIN: mining maximal frequent subgraphs from graph databases , 2004, KDD.

[33]  Dawson R. Engler,et al.  How to write system-specific, static checkers in metal , 2002, PASTE '02.

[34]  Wei Wang,et al.  Efficient mining of frequent subgraphs in the presence of isomorphism , 2003, Third IEEE International Conference on Data Mining.

[35]  Jeannette M. Wing,et al.  Model checking software systems: a case study , 1995, SIGSOFT FSE.

[36]  Jens Krinke,et al.  Identifying similar code with program dependence graphs , 2001, Proceedings Eighth Working Conference on Reverse Engineering.

[37]  Zhenmin Li,et al.  PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code , 2005, ESEC/FSE-13.

[38]  Benjamin Livshits,et al.  DynaMine: finding common error patterns by mining software revision histories , 2005, ESEC/FSE-13.

[39]  Michael E. Fagan Design and Code Inspections to Reduce Errors in Program Development , 1976, IBM Syst. J..

[40]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[41]  Johannes Gehrke,et al.  MAFIA: a maximal frequent itemset algorithm for transactional databases , 2001, Proceedings 17th International Conference on Data Engineering.

[42]  Stuart McClure,et al.  Hacking Exposed; Network Security Secrets and Solutions , 1999 .

[43]  David W. Binkley,et al.  Interprocedural slicing using dependence graphs , 1990, TOPL.

[44]  David Leon,et al.  Dex: a semantic-graph differencing tool for studying changes in large code bases , 2004, 20th IEEE International Conference on Software Maintenance, 2004. Proceedings..

[45]  Tao Xie,et al.  Parseweb: a programmer assistant for reusing open source code on the web , 2007, ASE.