Have things changed now?: an empirical study of bug characteristics in modern open source software

Software errors are a major cause for system failures. To effectively design tools and support for detecting and recovering from software failures requires a deep understanding of bug characteristics. Recently, software and its development process have significantly changed in many ways, including more help from bug detection tools, shift towards multi-threading architecture, the open-source development paradigm and increasing concerns about security and user-friendly interface. Therefore, results from previous studies may not be applicable to present software. Furthermore, many new aspects such as security, concurrency and open-source-related characteristics have not well studied. Additionally, previous studies were based on a small number of bugs, which may lead to non-representative results.To investigate the impacts of the new factors on software errors, we analyze bug characteristics by first sampling hundreds of real world bugs in two large, representative open-source projects. To validate the representativeness of our results, we use natural language text classification techniques and automatically analyze around 29, 000 bugs from the Bugzilla databases of the software.Our study has discovered several new interesting characteristics: (1) memory-related bugs have decreased because quite a few effective detection tools became available recently; (2) surprisingly, some simple memory-related bugs such as NULL pointer dereferences that should have been detected by existing tools in development are still a major component, which indicates that the tools have not been used with their full capacity; (3) semantic bugs are the dominant root causes, as they are application specific and difficult to fix, which suggests that more efforts should be put into detecting and fixing them; (4) security bugs are increasing, and the majority of them cause severe impacts.

[1]  Dan Roth,et al.  Part of Speech Tagging Using a Network of Linear Separators , 1998, ACL.

[2]  Bin Wang,et al.  Automated support for classifying software failure reports , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[3]  Ram Chillarege,et al.  Defect type and its impact on the growth curve (software development) , 1991, [1991 Proceedings] 13th International Conference on Software Engineering.

[4]  Zhenmin Li,et al.  PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code , 2005, ESEC/FSE-13.

[5]  Mark Sullivan,et al.  A comparison of software defects in database management systems and operating systems , 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.

[6]  William G. Griswold,et al.  Dynamically discovering likely program invariants to support program evolution , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[7]  Yuanyuan Zhou,et al.  BugBench: Benchmarks for Evaluating Bug Detection Tools , 2005 .

[8]  Junfeng Yang,et al.  An empirical study of operating systems errors , 2001, SOSP.

[9]  Robert O. Hastings,et al.  Fast detection of memory leaks and access errors , 1991 .

[10]  Elaine J. Weyuker,et al.  Collecting and categorizing software error data in an industrial environment , 2018, J. Syst. Softw..

[11]  Sigrid Eldh Software Testing Techniques , 2007 .

[12]  Boris Beizer,et al.  Software testing techniques (2. ed.) , 1990 .

[13]  Crispan Cowan,et al.  StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks , 1998, USENIX Security Symposium.

[14]  Elaine J. Weyuker,et al.  Predicting the location and number of faults in large software systems , 2005, IEEE Transactions on Software Engineering.

[15]  Robert L. Glass,et al.  Persistent Software Errors , 1981, IEEE Transactions on Software Engineering.

[16]  Norman E. Fenton,et al.  Quantitative Analysis of Faults and Failures in a Complex Software System , 2000, IEEE Trans. Software Eng..

[17]  Peter M. Chen,et al.  Whither generic recovery from application faults? A fault study using open-source software , 2000, Proceeding International Conference on Dependable Systems and Networks. DSN 2000.

[18]  Jim Gray Why do Computer Stop and What Can be About it? , 1985, Büroautomation.

[19]  CRISPIN COWAN,et al.  Software Security for Open-Source Systems , 2003, IEEE Secur. Priv..

[20]  Atif M. Memon GUI Testing: Pitfalls and Process , 2002, Computer.

[21]  Christian Payne,et al.  On the security of open source software , 2002, Inf. Syst. J..

[22]  Stephen E. Robertson,et al.  A probabilistic model of information retrieval: development and comparative experiments - Part 1 , 2000, Inf. Process. Manag..

[23]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[24]  Mark Sullivan,et al.  Software defects and their impact on system availability-a study of field failures in operating systems , 1991, [1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium.

[25]  Stephen E. Robertson,et al.  A probabilistic model of information retrieval: development and comparative experiments - Part 2 , 2000, Inf. Process. Manag..

[26]  Gregg Rothermel,et al.  Test case prioritization: an empirical study , 1999, Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360).

[27]  YangJunfeng,et al.  An empirical study of operating systems errors , 2001 .

[28]  Albert Endres,et al.  An analysis of errors and their causes in system programs , 1975, IEEE Transactions on Software Engineering.

[29]  Elaine J. Weyuker,et al.  The distribution of faults in a large industrial software system , 2002, ISSTA '02.

[30]  Calton Pu,et al.  Buffer overflows: attacks and defenses for the vulnerability of the decade , 2000, Foundations of Intrusion Tolerant Systems, 2003 [Organically Assured and Survivable Information Systems].

[31]  Ravishankar K. Iyer,et al.  Characterization of linux kernel behavior under errors , 2003, 2003 International Conference on Dependable Systems and Networks, 2003. Proceedings..

[32]  Thorsten Joachims,et al.  Learning to classify text using support vector machines - methods, theory and algorithms , 2002, The Kluwer international series in engineering and computer science.

[33]  Yuanyuan Zhou,et al.  CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code , 2004, OSDI.

[34]  Nicholas Nethercote,et al.  Valgrind: A Program Supervision Framework , 2003, RV@CAV.