NEGWeb: Static defect detection via searching billions of lines of open source code

To find defects in programs, existing approaches mine programming rules as common patterns from program source code and classify violations of these mined rules as defects. However, these approaches often fail to surface many programming rules as common patterns because they mine patterns from only one or a few project code bases. To better support static bug finding based on mining code, we develop a novel framework, called NEGWeb, that substantially expands the mining scope to billions of lines of open source code by using a code search engine. NEGWeb detects violations related to neglected conditions around individual API calls. We evaluated NEGWeb's ability to detect violations in local code bases or open source code bases. In our evaluation, we show that NEGWeb finds three real defects in Java code reported in the literature and also finds three previously unknown defects in a large-scale open source project called Columba (91,508 lines of Java code) that reuses 2225 APIs. We also report a high percentage of real rules among the top 25 reported patterns mined for five popular open source applications.
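
The general idea of mining condition patterns around API calls and flagging call sites that neglect them can be illustrated with a minimal sketch. This is not NEGWeb's actual algorithm or implementation: the function names `mine_condition_rules` and `find_violations`, the `min_support` threshold, the string encoding of conditions, and the sample data are all assumptions made for illustration, and the usage samples are taken as already extracted from code search results.

```python
from collections import Counter

def mine_condition_rules(usages, min_support=0.75):
    """Mine frequent guard conditions per API call.

    usages: iterable of (api, conditions) pairs, one per observed call site,
    where conditions is the set of checks found around that call.
    A condition becomes a rule for an API if it guards at least
    min_support of that API's observed call sites (hypothetical threshold).
    """
    totals = Counter()
    guarded = Counter()
    for api, conditions in usages:
        totals[api] += 1
        for cond in set(conditions):
            guarded[(api, cond)] += 1
    rules = {}
    for (api, cond), count in guarded.items():
        if count / totals[api] >= min_support:
            rules.setdefault(api, set()).add(cond)
    return rules

def find_violations(rules, call_sites):
    """Report call sites that neglect a mined condition.

    call_sites: iterable of (location, api, conditions) triples.
    Returns a list of (location, api, sorted missing conditions).
    """
    violations = []
    for location, api, conditions in call_sites:
        missing = rules.get(api, set()) - set(conditions)
        if missing:
            violations.append((location, api, sorted(missing)))
    return violations

# Hypothetical usage samples, standing in for code gathered by a search engine.
corpus = [
    ("Iterator.next", {"hasNext() == true"}),
    ("Iterator.next", {"hasNext() == true"}),
    ("Iterator.next", {"hasNext() == true", "size > 0"}),
    ("Iterator.next", {"hasNext() == true"}),
    ("Iterator.next", set()),  # an unguarded usage in the corpus
]

# Hypothetical call sites in the code base under analysis.
local_call_sites = [
    ("Foo.java:42", "Iterator.next", set()),
    ("Bar.java:17", "Iterator.next", {"hasNext() == true"}),
]

rules = mine_condition_rules(corpus)
violations = find_violations(rules, local_call_sites)
print(rules)
print(violations)
```

In this toy setup, a larger corpus makes it more likely that a genuine rule clears the support threshold, which reflects the motivation for widening the mining scope beyond one or a few projects.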
