Understanding the Contribution of Non-source Documents in Improving Missing Link Recovery: An Empirical Study

Background: Links between issue reports and their fixing commits play an important role in software maintenance. Such link data are often missing in practice, and many approaches have been proposed to recover them automatically. Most existing approaches focus on comparing the log messages and source code files in commits with issue reports. Besides these two kinds of commit data, non-source documents (NSDs) such as change logs usually record fixing activities and sometimes contain text similar to that in issue reports. However, the role of NSDs in designing link recovery approaches has rarely been discussed. Aims: This paper aims to understand whether and how NSDs affect the performance of link recovery approaches. Method: An empirical study is conducted to evaluate the role of NSDs in link recovery approaches on 18 open source projects with 6370 issues and 22761 commits. Results: With the inclusion of NSDs, link recovery approaches achieve an average increase in F-Measure ranging from 2.76% to 25.63%. Further examination shows that NSDs contribute to the performance improvement in 15 projects, with exceptions in 3 projects. The improvement in the 15 projects comes mainly from the filtering of noisy links: on average, 23.59% to 76.30% of false links can be excluded by exploiting NSDs in the link recovery approach. We also analyze the 3 projects in which NSDs do not improve performance; our findings show that sophisticated data selection in NSDs is necessary. Conclusions: Our preliminary findings demonstrate that involving NSDs can improve the performance of link recovery approaches in most cases.
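To make the reported metric concrete, the following minimal Python sketch shows how removing false links can raise F-Measure by improving precision while recall stays unchanged. It is an illustration only, not the study's method: the issue and commit identifiers, the `changelog_issue` mapping, and the filtering rule are all hypothetical assumptions, since the abstract does not specify how NSDs are used to exclude links.

```python
def f_measure(predicted, truth):
    """Return (precision, recall, F-measure) for a set of predicted links."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical (issue, commit) pairs, invented for illustration only.
truth = {("I1", "c1"), ("I2", "c2"), ("I3", "c3")}
candidates = {("I1", "c1"), ("I2", "c2"), ("I2", "c9"), ("I3", "c7")}

# Assumed NSD signal: the issue named in the change log entry of a commit.
# Commits without an entry are left untouched by the filter.
changelog_issue = {"c9": "I5", "c7": "I4"}

# Assumed filter rule: drop a candidate link when the commit's change log
# entry names a different issue than the one in the candidate link.
filtered = {
    (issue, commit)
    for (issue, commit) in candidates
    if changelog_issue.get(commit, issue) == issue
}

before = f_measure(candidates, truth)
after = f_measure(filtered, truth)
print("before filtering: P=%.2f R=%.2f F=%.2f" % before)
print("after filtering:  P=%.2f R=%.2f F=%.2f" % after)
```

In this toy data the filter removes the two false candidates, so precision rises from 0.50 to 1.00 and recall stays at 0.67. A filter that also removed true links would lower recall instead, which is one way a poorly selected NSD signal could hurt rather than help.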
