Traceability in the Wild: Automatically Augmenting Incomplete Trace Links

Software and systems traceability is widely accepted as essential for supporting many software development tasks. Today's version control systems provide built-in features that allow developers to tag each commit with one or more issue IDs, thereby providing the building blocks from which project-wide traceability can be established between feature requests, bug fixes, commits, source code, and specific developers. However, our analysis of six open-source projects showed that, on average, only 60% of the commits were linked to specific issues. Without these fundamental links, the entire set of project-wide links will be incomplete and therefore untrustworthy. In this paper, we address the fundamental problem of missing links between commits and issues. Our approach leverages a combination of process-related and text-related features that characterize issues and code changes to train a classifier to identify missing issue tags in commit messages, thereby generating the missing links. We conducted a series of experiments to evaluate our approach on six open-source projects and showed that it was able to effectively recommend links for tagging issues at an average of 96% recall and 33% precision. In a related task of augmenting a set of existing trace links, the classifier returned precision greater than 89% in all projects and recall of 50%.
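As a rough illustration of the kind of input such a classifier might consume, the sketch below computes one text-related feature (token overlap between a commit message and an issue summary) and one process-related feature (whether the commit falls inside the issue's open window) for a candidate commit-issue pair, and then scores a set of recommended links by precision and recall. The feature choices, the field names (`message`, `summary`, `created`, `resolved`), and the toy data are assumptions made for illustration; the abstract does not specify the actual feature set or classifier.

```python
import re
from datetime import datetime

def tokens(text):
    """Lowercase alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pair_features(commit, issue):
    """Features for one candidate commit-issue pair (hypothetical schema)."""
    # Text-related feature: word overlap between commit message and issue summary.
    text_sim = jaccard(tokens(commit["message"]), tokens(issue["summary"]))
    # Process-related feature: was the commit made while the issue was open?
    in_window = issue["created"] <= commit["date"] <= issue["resolved"]
    return {"text_sim": text_sim, "in_window": in_window}

def precision_recall(predicted, actual):
    """Precision and recall of predicted links against the true links."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical toy data, not taken from the studied projects.
commit = {"message": "Fix null pointer in parser", "date": datetime(2017, 3, 2)}
issue = {
    "summary": "Parser throws null pointer on empty input",
    "created": datetime(2017, 3, 1),
    "resolved": datetime(2017, 3, 5),
}
features = pair_features(commit, issue)

# Three recommended links, two of which are correct and cover all true links.
predicted = {("c1", "I-1"), ("c1", "I-2"), ("c2", "I-3")}
actual = {("c1", "I-1"), ("c2", "I-3")}
precision, recall = precision_recall(predicted, actual)
```

In a setting like the one described, feature dictionaries of this kind would be computed for many candidate pairs and passed to a trained classifier. A recommender tuned for high recall, as in the reported 96% recall and 33% precision, accepts many false candidates so that few true links are missed.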
