Are Popular Classes More Defect Prone?

Traces of the evolution of software systems are left in a number of different repositories, such as configuration management systems, bug tracking systems, and mailing lists. Developers use e-mails to discuss issues ranging from low-level concerns (bug fixes, refactorings) to high-level resolutions (future planning, design decisions). E-mail archives therefore constitute a valuable asset for understanding the evolutionary dynamics of a system. We introduce metrics that measure the “popularity” of source code artifacts, i.e., the amount of discussion they generate in e-mail archives, and investigate whether the information contained in e-mail archives is correlated with the defects found in the system. Our hypothesis is that developers discuss problematic entities more than unproblematic ones. We also study whether our popularity metrics can improve the precision of existing defect prediction techniques.
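The abstract does not define the popularity metrics precisely. Below is a minimal sketch that makes two assumptions. First, a class's popularity is taken to be the number of e-mails whose text mentions the class name as a whole word. Second, its relationship to defects is measured with Spearman's rank correlation. The function names `popularity`, `ranks` and `spearman`, the sample e-mails, and the defect counts are all hypothetical illustrations, not the authors' definitions or data.

```python
import re


def popularity(emails, class_names):
    """Hypothetical metric: number of e-mails that mention each class name as a whole word."""
    counts = {name: 0 for name in class_names}
    for body in emails:
        for name in class_names:
            # \b word boundaries stop "Parser" from matching inside "ParserFactory".
            if re.search(r"\b" + re.escape(name) + r"\b", body):
                counts[name] += 1
    return counts


def ranks(values):
    """1-based ranks; tied values all receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when either variable is constant
    return cov / (vx * vy) ** 0.5


# Hypothetical mailing-list bodies, class names, and defect counts.
emails = [
    "NPE in Parser when input is empty",
    "Parser and Lexer need refactoring",
    "Release planning for 2.0",
    "Lexer token bug again; Parser fine",
]
classes = ["Parser", "Lexer", "Printer"]
defects = {"Parser": 5, "Lexer": 2, "Printer": 0}

pop = popularity(emails, classes)
rho = spearman([pop[c] for c in classes], [defects[c] for c in classes])
print(pop, rho)  # prints {'Parser': 3, 'Lexer': 2, 'Printer': 0} 1.0
```

Whole-word name matching is a deliberately crude linking heuristic. It can produce false positives for classes named after common words (for example, `Node` or `List`). A rank correlation is used here because it does not assume a linear relationship between discussion volume and defect counts.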
