Enhancing software artefact traceability recovery processes with link count information

Context: The intensive human effort needed to manually manage traceability information has increased interest in semi-automated traceability recovery techniques. In particular, Information Retrieval (IR) techniques have been widely employed over the last ten years to partially automate the traceability recovery process. Aim: Previous studies mainly focused on analysing the performance of IR-based traceability recovery methods, and several enhancement strategies have been proposed to improve their accuracy. Very few papers investigate how developers (i) use IR-based traceability recovery tools and (ii) analyse the list of suggested links to validate correct links or discard false positives. We focus on this issue and propose exploiting link count information in IR-based traceability recovery tools to improve developers' performance during a traceability recovery process. Method: Two empirical studies were conducted to evaluate the usefulness of link count information. The studies involved 135 university students who performed traceability recovery tasks, with and without link count information, on two software project repositories. We then evaluated the quality of the recovered traceability links in terms of the links correctly and erroneously traced by the students. Results: The results indicate that the use of link count information significantly increases the number of correct links identified by the participants. Conclusions: The results can be used to derive guidelines on how to effectively use the traceability recovery approaches and tools proposed in the literature.
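IR-based traceability tools rank candidate links between artefacts by textual similarity, and the abstract proposes showing link count information alongside that ranked list. The Python sketch below illustrates the general idea under stated assumptions. Artefacts are compared with a plain TF-IDF vector space model and cosine similarity, which is one of several IR methods used in this literature and not necessarily the one used in the studies. The "link count" attached to each candidate is assumed, for illustration only, to be the number of retrieved candidate links that share the same source artefact. The function names, the artefact names, and the threshold value are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphabetic tokens (no stemming, no stop-word removal)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build a TF-IDF weighted term vector for every artefact in the corpus."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    df = Counter(term for c in counts.values() for term in c)
    # idf = ln(N / df) + 1 keeps terms that occur in every document from dropping to zero.
    return {
        name: {t: tf * (math.log(n_docs / df[t]) + 1.0) for t, tf in c.items()}
        for name, c in counts.items()
    }


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def candidate_links(sources: dict[str, str], targets: dict[str, str], threshold: float = 0.1):
    """Return (source, target, similarity, link_count) tuples ranked by similarity.

    Hypothetical: link_count is the number of retrieved candidates with the same source.
    Source and target names are assumed to be distinct keys.
    """
    vectors = tfidf_vectors({**sources, **targets})
    links = [(s, t, cosine(vectors[s], vectors[t])) for s in sources for t in targets]
    links = [link for link in links if link[2] >= threshold]
    links.sort(key=lambda link: link[2], reverse=True)
    per_source = Counter(s for s, _, _ in links)
    return [(s, t, sim, per_source[s]) for s, t, sim in links]


if __name__ == "__main__":
    use_cases = {
        "UC1": "the user logs in with a password",
        "UC2": "the user prints the monthly report",
    }
    classes = {
        "LoginController": "check password for user login",
        "ReportPrinter": "print monthly report pages",
        "Utils": "string helpers",
    }
    for src, tgt, sim, count in candidate_links(use_cases, classes, threshold=0.05):
        print(f"{src} -> {tgt}  sim={sim:.3f}  link count={count}")
```

In this toy run UC2 has two candidate links above the threshold, so a developer validating the ranked list would see a count of 2 next to each UC2 link and 1 next to the UC1 link. Whether this matches the exact definition of link count used in the studies is an assumption of the sketch.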
