Software Feature Request Detection in Issue Tracking Systems

Communication about requirements is often handled in issue tracking systems, especially in distributed settings. Because issue tracking systems also contain bug reports and programming tasks, users' software feature requests are often difficult to identify. This paper investigates natural language processing and machine learning features for detecting software feature requests in the natural language data of issue tracking systems. It compares traditional linguistic machine learning features, such as "bag of words", with more advanced features, such as subject-action-object, and evaluates combinations of features derived from the natural language with features taken from the issue tracking system meta-data. Our investigation shows that some of these combinations outperform traditional approaches. We show that issues or data fields (e.g. descriptions or comments) that contain software feature requests can be identified reasonably well, whereas the exact sentence can hardly be identified. Finally, we show that the choice of machine learning algorithm should depend on the goal, e.g. maximizing the detection rate or balancing detection rate and precision. In addition, the paper contributes a double-coded gold standard and an open-source implementation to support further research on this topic.
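
The idea of combining "bag of words" features with issue tracking system meta-data can be illustrated with a minimal sketch. It is not the paper's implementation: the vocabulary, the issue fields (`description`, `type`, `num_comments`), and the helper names are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

def metadata_features(issue):
    """Encode (hypothetical) issue tracker meta-data as numbers."""
    return [
        1 if issue["type"] == "enhancement" else 0,  # issue type flag
        issue["num_comments"],                       # discussion length
    ]

def combined_features(issue, vocabulary):
    """Concatenate natural language features and meta-data features."""
    return bag_of_words(issue["description"], vocabulary) + metadata_features(issue)

# Assumed toy vocabulary and issue; real systems would learn the vocabulary from data.
VOCABULARY = ["add", "crash", "should", "support"]
issue = {
    "description": "It should support export to PDF. Please add this.",
    "type": "enhancement",
    "num_comments": 3,
}
vector = combined_features(issue, VOCABULARY)
print(vector)
```

A vector of this kind could then be passed to any classifier. Which classifier is appropriate depends on whether the goal is a high detection rate or a balance between detection rate and precision.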
