HISTORICAL ANALYSIS OF MESSAGE CONTENTS TO RECOMMEND ISSUES TO OPEN SOURCE SOFTWARE CONTRIBUTORS

Developers of distributed open source projects use issue trackers to coordinate their work. These tools store valuable information, maintaining a log of relevant decisions and bug solutions. Finding appropriate issues to contribute to can be hard, as the high volume of data increases contributors’ overhead. This paper shows that the content of issue tracker discussions in an open source project can be used to build a classifier that predicts whether a contributor will participate in an issue. To design this prediction model, we used two machine learning algorithms, Naive Bayes and J48, and evaluated them on data from the Apache Hadoop Commons project. Applying the algorithms to the ten most active contributors of this project, we achieved an average recall of 66.82% with Naive Bayes and 53.02% with J48. With J48, we achieved a precision of 64.31% and an accuracy of 90.27%. We also conducted an exploratory study with five contributors who took part in fewer issues and achieved a precision of 77.41%, a recall of 48%, and an accuracy of 98.84% with J48. The results indicate that the content of comments on issues in open source projects is a relevant factor for recommending issues to contributors.
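The reported figures combine high accuracy with much lower recall, which is typical when a contributor takes part in only a small share of all issues. The following minimal sketch shows how the three metrics are computed from a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical, chosen only for illustration; they are not data from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) for one contributor.

    tp: issues predicted as "participates" where the contributor did participate
    fp: issues predicted as "participates" where the contributor did not
    fn: issues predicted as "does not participate" where the contributor did
    tn: issues predicted as "does not participate" where the contributor did not
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical counts (an assumption, not taken from the paper):
# a contributor who took part in 25 of 1000 issues.
precision, recall, accuracy = binary_metrics(tp=12, fp=4, fn=13, tn=971)
print(f"precision={precision:.2%} recall={recall:.2%} accuracy={accuracy:.2%}")
```

Because true negatives dominate when participation is rare, accuracy stays high even if about half of the issues the contributor actually joined are missed, so precision and recall are the more informative measures for a recommender of this kind.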