Impact analysis by mining software and change request repositories

Impact analysis is the identification of the work products affected by a proposed change request, whether a bug fix or a new feature request. In many open-source projects, such as KDE, GNOME, Mozilla, and OpenOffice, change requests and related data are stored in a bug tracking system such as Bugzilla. These data, together with the data stored in a versioning system such as CVS, are a valuable source of information on which useful analyses can be performed. In this paper we propose a method to derive the set of source files impacted by a proposed change request. The method exploits information retrieval algorithms to link the change request description and the set of historical source file revisions impacted by similar past change requests. The method is evaluated by applying it to four open-source projects.
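The linking step can be illustrated with a minimal sketch. The abstract does not specify which information retrieval model is used, so TF-IDF weighting with cosine similarity is substituted here as a stand-in; the function names, the `history` structure (pairs of a past change request description and the files its fix touched), and the summed-similarity scoring are all assumptions made for illustration, not the method evaluated in the paper.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_impacted_files(new_request, history, top_k=3):
    """Rank files by similarity of new_request to past change requests.

    history is a hypothetical list of (description, files) pairs, where
    files are the source files revised to resolve that past request.
    """
    docs = [Counter(tokenize(desc)) for desc, _ in history]
    n = len(docs)

    # Document frequency of each term across past request descriptions.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())
    # Smoothed inverse document frequency (an illustrative choice).
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}

    def vectorize(counts):
        # Terms never seen in the history carry no weight.
        return {t: tf * idf[t] for t, tf in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = vectorize(Counter(tokenize(new_request)))

    # Each file accumulates the similarity of every past request that
    # touched it (assumed scoring rule).
    scores = defaultdict(float)
    for (_, files), counts in zip(history, docs):
        sim = cosine(query, vectorize(counts))
        for path in files:
            scores[path] += sim

    candidates = [p for p, s in scores.items() if s > 0]
    candidates.sort(key=lambda p: (-scores[p], p))
    return candidates[:top_k]


if __name__ == "__main__":
    # Invented example data, not taken from any real project.
    history = [
        ("crash when printing a document with embedded images",
         ["print/printer.c", "render/image.c"]),
        ("toolbar icons missing after theme change",
         ["ui/toolbar.c", "ui/theme.c"]),
        ("printing dialog freezes on large document",
         ["print/dialog.c", "print/printer.c"]),
    ]
    print(rank_impacted_files("application crash while printing document",
                              history))
```

In this sketch, files touched by several similar past requests rise to the top of the ranking, and files linked only to unrelated requests are omitted. A fuller pipeline would likely add stemming and stop-word removal before weighting.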
