Fine grained indexing of software repositories to support impact analysis

Versioned and bug-tracked software systems provide a huge amount of historical data regarding source code changes and issues management. In this paper we deal with impact analysis of a change request and show that data stored in software repositories are a good descriptor on how past change requests have been resolved. A fine grained analysis method of software repositories is used to index code at different levels of granularity, such as lines of code and source files, with free text contained in software repositories. The method exploits information retrieval algorithms to link the change request description and code entities impacted by similar past change requests. We evaluate such approach on a set of three open-source projects.

[1]  Stephen E. Robertson,et al.  A probabilistic model of information retrieval: development and comparative experiments - Part 2 , 2000, Inf. Process. Manag..

[2]  Will Venters,et al.  Software engineering: theory and practice , 2006 .

[3]  Harald C. Gall,et al.  Populating a Release History Database from version control and bug tracking systems , 2003, International Conference on Software Maintenance, 2003. ICSM 2003. Proceedings..

[4]  Fabio Crestani,et al.  “Is this document relevant?…probably”: a survey of probabilistic models in information retrieval , 1998, CSUR.

[5]  Moshe Bar,et al.  Open Source Development with CVS , 1999 .

[6]  Katsuhiko Gondow,et al.  Toward mining "concept keywords" from identifiers in large software projects , 2005, MSR.

[7]  Gail C. Murphy,et al.  Predicting source code changes by mining change history , 2004, IEEE Transactions on Software Engineering.

[8]  M. Stone,et al.  Cross‐Validatory Choice and Assessment of Statistical Predictions , 1976 .

[9]  Mikael Lindvall,et al.  How well do experienced software developers predict software change? , 1998, J. Syst. Softw..

[10]  Thomas Zimmermann,et al.  Preprocessing CVS Data for Fine-Grained Analysis , 2004, MSR.

[11]  Eugene W. Myers,et al.  A file comparison program , 1985, Softw. Pract. Exp..

[12]  Mariam Kamkar An overview and comparative classification of program slicing techniques , 1995, J. Syst. Softw..

[13]  Richard C. Holt,et al.  Predicting change propagation in software systems , 2004, 20th IEEE International Conference on Software Maintenance, 2004. Proceedings..

[14]  Gerardo Canfora,et al.  Impact analysis by mining software and change request repositories , 2005, 11th IEEE International Software Metrics Symposium (METRICS'05).

[15]  Giuliano Antoniol,et al.  Recovering Traceability Links between Code and Documentation , 2002, IEEE Trans. Software Eng..

[16]  Shawn A. Bohner,et al.  Impact analysis-Towards a framework for comparison , 1993, 1993 Conference on Software Maintenance.

[17]  Andreas Zeller,et al.  Mining Version Histories to Guide Software Changes , 2004 .

[18]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[19]  Qing Zhang,et al.  CVSSearch: searching through source code using CVS comments , 2001, Proceedings IEEE International Conference on Software Maintenance. ICSM 2001.

[20]  Annie T. T. Ying,et al.  Predicting source code changes by mining revision history , 2003 .

[21]  James L. Wright,et al.  Source code that talks: an exploration of Eclipse task comments and their implication to repository mining , 2005, ACM SIGSOFT Softw. Eng. Notes.

[22]  Rodolfo Alfredo Bertone,et al.  Software engineering: Theory and practice, 2nd Edition. Shari Lawrence Pfleeger. Prentice Hall, 2001 , 2005 .

[23]  M. Stone Cross‐Validatory Choice and Assessment of Statistical Predictions , 1976 .