Exploring the use of labels to categorize issues in Open-Source Software projects

Reporting bugs, requesting new features, and giving feedback in general are common ways to contribute to an Open-Source Software (OSS) project. This feedback is typically reported in the form of new issues for the project, managed by so-called issue trackers. One of the features provided by most issue trackers is the possibility to define a set of labels/tags to classify issues and, at least in theory, facilitate their management. Nevertheless, there is little empirical evidence that taking the time to categorize new issues has a beneficial impact on the project's evolution. In this paper we analyze a population of more than three million GitHub projects and give some insights into how labels are used in them. Our preliminary results reveal that, even though the label mechanism is scarcely used, using labels favors the resolution of issues. Our analysis also suggests that not all projects use labels in the same way (e.g., some projects use labels only to prioritize issues, while others use them to signal an issue's progress through the development workflow). Further research is needed to precisely characterize these label “families” and to learn more about the ideal application scenarios for each of them.