Supporting change request assignment in open source development

Software repositories such as CVS and Bugzilla hold a large amount of data on source code history and change request history, respectively. In this paper we study how change requests have been assigned to developers in an open source project, and we propose a method that suggests the best candidate developers to resolve a new change request. The method rests on the hypothesis that, given a new change request, the developers who have resolved similar change requests in the past are the best candidates to resolve it. Project managers can use the suggestions to choose a developer for a particular change request and/or to build a competence database of the developers working on a software project. We use the textual descriptions of change requests stored in software repositories to index developers as documents in an information retrieval system. An information retrieval method is then applied to retrieve candidate developers, using the textual description of a new change request as the query. A case study and an evaluation of the analysis and of the proposed method have been conducted on two large open source projects, Mozilla and KDE.
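The idea of indexing developers as documents and querying with a new change request can be illustrated with a minimal sketch. It assumes a smoothed TF-IDF weighting with cosine similarity as a stand-in retrieval model, which is not necessarily the model used in the paper. The function names, the developer names, and the sample change requests are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_developer_documents(resolved_requests):
    """Concatenate the descriptions of the change requests each developer resolved.

    resolved_requests is a list of (developer, description) pairs; the result
    maps each developer to one bag of tokens, i.e. one "document" per developer.
    """
    docs = defaultdict(list)
    for developer, description in resolved_requests:
        docs[developer].extend(tokenize(description))
    return docs


def rank_developers(resolved_requests, new_request, top_k=3):
    """Rank developers by similarity between their document and the new request.

    Assumption: smoothed TF-IDF weights and cosine similarity, chosen only
    for illustration.
    """
    docs = build_developer_documents(resolved_requests)
    n_docs = len(docs)

    # Document frequency: in how many developer documents each term occurs.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    idf = {t: math.log((n_docs + 1) / (c + 1)) + 1.0 for t, c in df.items()}

    def vectorize(tokens):
        tf = Counter(tokens)
        # Terms never seen in any developer document are ignored.
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query = vectorize(tokenize(new_request))
    scores = [(dev, cosine(query, vectorize(tokens))) for dev, tokens in docs.items()]
    # Highest score first; ties broken by developer name for determinism.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_k]


# Hypothetical history of resolved change requests.
history = [
    ("alice", "crash when rendering table in html layout"),
    ("alice", "layout engine crash on nested table"),
    ("bob", "mail client fails to send imap message"),
    ("bob", "imap folder sync error in mail client"),
    ("carol", "bookmark menu does not open"),
]

if __name__ == "__main__":
    for developer, score in rank_developers(history, "browser crash on table layout"):
        print(developer, round(score, 3))
```

With this toy history, the developer whose past change requests share the most vocabulary with the new request is ranked first. A real system would also need stemming, stop-word removal, and a retrieval model validated on the project data.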
