Trustrace: Mining Software Repositories to Improve the Accuracy of Requirement Traceability Links

Traceability is the only means to ensure that the source code of a system is consistent with its requirements and that all and only the specified requirements have been implemented by developers. During software maintenance and evolution, requirement traceability links become obsolete because developers do not or cannot devote effort to updating them. Yet recovering these links later is a daunting and costly task for developers. Consequently, the literature has proposed methods, techniques, and tools to recover traceability links semi-automatically or automatically. Among the proposed techniques, information retrieval (IR) techniques have been shown to automatically recover traceability links between free-text requirements and source code. However, IR techniques lack accuracy (precision and recall). In this paper, we show that mining software repositories and combining the mined results with IR techniques can improve this accuracy, and we propose Trustrace, a trust-based traceability recovery approach. We apply Trustrace to four medium-size open-source systems to compare the accuracy of its traceability links with those recovered using state-of-the-art IR techniques from the literature, based on the Vector Space Model and the Jensen-Shannon model. The results of Trustrace are up to 22.7 percent more precise and have 7.66 percent better recall than those of the other techniques, on average. We thus show that mining software repositories and combining the mined data with existing results from IR techniques improves the precision and recall of requirement traceability links.
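
To make the notions of IR-based link recovery and accuracy concrete, the following is a minimal sketch and not the Trustrace implementation. It assumes a simplified Vector Space Model that uses raw term frequencies (no TF-IDF weighting, stemming, or stop-word removal) and a fixed similarity threshold. All names (`cosine_similarity`, `recover_links`, `precision_recall`) and the toy data are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors of two texts."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recover_links(requirements, classes, threshold):
    """Return (requirement, class) pairs whose similarity meets the threshold."""
    links = set()
    for req_id, req_text in requirements.items():
        for cls_id, cls_text in classes.items():
            if cosine_similarity(req_text, cls_text) >= threshold:
                links.add((req_id, cls_id))
    return links


def precision_recall(recovered, oracle):
    """Precision and recall of recovered links against a manually built oracle."""
    correct = recovered & oracle
    precision = len(correct) / len(recovered) if recovered else 0.0
    recall = len(correct) / len(oracle) if oracle else 0.0
    return precision, recall


# Hypothetical toy data: requirement texts and identifier terms per class.
requirements = {
    "R1": "user login password",
    "R2": "export report pdf",
}
classes = {
    "LoginManager": "login user password check",
    "ReportExporter": "export report file",
    "Logger": "export log file",
}
oracle = {("R1", "LoginManager"), ("R2", "ReportExporter")}

recovered = recover_links(requirements, classes, threshold=0.3)
precision, recall = precision_recall(recovered, oracle)
print(sorted(recovered), precision, recall)
```

With the low threshold, the spurious link between `R2` and `Logger` is also recovered, so recall is perfect but precision drops. This is the kind of inaccuracy that the abstract attributes to IR techniques used on their own.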
