Bug characteristics in open source software

Designing effective tools for detecting and recovering from software failures requires a deep understanding of software bug characteristics. We study these characteristics by sampling 2,060 real-world bugs from three large, representative open-source projects: the Linux kernel, Mozilla, and Apache. We manually examine these bugs along three dimensions: root causes, impacts, and components. We further study the correlations between categories in different dimensions and the trends of different bug types. Our findings include: (1) semantic bugs are the dominant root cause, and as software evolves, semantic bugs increase while memory-related bugs decrease, calling for more research effort to address semantic bugs; (2) the Linux kernel, an operating system (OS), has more concurrency bugs than its non-OS counterparts, suggesting that more effort should go into detecting concurrency bugs in operating system code; and (3) reported security bugs are increasing, and the majority of them are caused by semantic bugs, suggesting that developers need more support to diagnose and fix security bugs, especially semantic security bugs. In addition, to reduce the manual effort of building bug benchmarks for evaluating bug detection and diagnosis tools, we use machine learning techniques to classify 109,014 bugs automatically.
