Mining version histories to guide software changes

We apply data mining to version histories in order to guide programmers along related changes: "Programmers who changed these functions also changed...". Given a set of existing changes, such rules (a) suggest and predict likely further changes, (b) reveal coupling between items that is undetectable by program analysis, and (c) prevent errors due to incomplete changes. After an initial change, our ROSE prototype can correctly predict 26% of the further files to be changed, and 15% of the precise functions or variables. The topmost three suggestions contain a correct location with a likelihood of 64%.
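The idea of rules of the form "programmers who changed X also changed Y" can be illustrated with a minimal sketch. This is not the ROSE implementation: it assumes each transaction is simply the set of items changed together in one commit, it mines only single-item rules, and the names `mine_cochange_rules`, `suggest`, the thresholds, and the sample history are all hypothetical. Support is the number of transactions containing both items, and confidence is that count divided by the number of transactions containing the antecedent.

```python
from collections import Counter
from itertools import combinations


def mine_cochange_rules(transactions, min_support=2, min_confidence=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    Illustrative only: thresholds and single-item rules are assumptions.
    """
    item_count = Counter()   # transactions containing each item
    pair_count = Counter()   # transactions containing both items of a pair
    for changed in transactions:
        items = sorted(set(changed))
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = []
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue
        # Each frequent pair yields up to two directed rules.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


def suggest(rules, changed_item, top_n=3):
    """Rank consequents for one changed item by confidence, then support."""
    matches = [r for r in rules if r[0] == changed_item]
    matches.sort(key=lambda r: (-r[3], -r[2], r[1]))
    return [rhs for _, rhs, _, _ in matches[:top_n]]


# Hypothetical version history: each inner list is one commit.
history = [
    ["parser.c:parse()", "lexer.c:next_token()"],
    ["parser.c:parse()", "lexer.c:next_token()", "README"],
    ["parser.c:parse()", "ast.c:make_node()"],
    ["lexer.c:next_token()", "README"],
]
rules = mine_cochange_rules(history)
print(suggest(rules, "parser.c:parse()"))
```

In this toy history, `README` and `lexer.c:next_token()` are linked only because they were changed together. No program analysis would connect a function to a documentation file, which is the kind of coupling item (b) refers to.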
