On the Equivalence of Information Retrieval Methods for Automated Traceability Link Recovery

We present an empirical study to statistically analyze the equivalence of several traceability recovery methods based on Information Retrieval (IR) techniques. The analysis is based on Principal Component Analysis and on the analysis of the overlap of the set of candidate links provided by each method. The studied techniques are the Jensen-Shannon (JS) method, Vector Space Model (VSM), Latent Semantic Indexing (LSI), and Latent Dirichlet Allocation (LDA). The results show that while JS, VSM, and LSI are almost equivalent, LDA is able to capture a dimension unique to the set of techniques which we considered.

[1]  Arie van Deursen,et al.  Can LSI help reconstructing requirements traceability in design and test? , 2006, Conference on Software Maintenance and Reengineering (CSMR'06).

[2]  Yann-Gaël Guéhéneuc,et al.  Feature Location Using Probabilistic Ranking of Methods Based on Execution Scenarios and Information Retrieval , 2007, IEEE Transactions on Software Engineering.

[3]  Richard N. Taylor,et al.  Software traceability with topic modeling , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[4]  Mordechai Nisenson,et al.  A Traceability Technique for Specifications , 2008, 2008 16th IEEE International Conference on Program Comprehension.

[5]  Andrian Marcus,et al.  Recovering documentation-to-source-code traceability links using latent semantic indexing , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[6]  Giuliano Antoniol,et al.  Identifying the starting impact set of a maintenance request: a case study , 2000, Proceedings of the Fourth European Conference on Software Maintenance and Reengineering.

[7]  Santonu Sarkar,et al.  Mining business topics in source code using latent dirichlet allocation , 2008, ISEC '08.

[8]  Jane Huffman Hayes,et al.  Advancing candidate link generation for requirements tracing: the study of methods , 2006, IEEE Transactions on Software Engineering.

[9]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[10]  Andrea De Lucia,et al.  Incremental Approach and User Feedbacks: a Silver Bullet for Traceability Recovery , 2006, 2006 22nd IEEE International Conference on Software Maintenance.

[11]  Andrea De Lucia,et al.  Traceability Recovery Using Numerical Analysis , 2009, 2009 16th Working Conference on Reverse Engineering.

[12]  Tibor Gyimóthy,et al.  Modeling class cohesion as mixtures of latent topics , 2009, 2009 IEEE International Conference on Software Maintenance.

[13]  Denys Poshyvanyk,et al.  Using Latent Dirichlet Allocation for automatic categorization of software , 2009, 2009 6th IEEE International Working Conference on Mining Software Repositories.

[14]  Giuliano Antoniol,et al.  Recovering Traceability Links between Code and Documentation , 2002, IEEE Trans. Software Eng..

[15]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[16]  Raffaella Settimi,et al.  Supporting software evolution through dynamically retrieving traces to UML artifacts , 2004, Proceedings. 7th International Workshop on Principles of Software Evolution, 2004..

[17]  Jane Cleland-Huang,et al.  Supporting software evolution through dynamically retrieving traces to UML artifacts , 2004 .

[18]  Jane Cleland-Huang,et al.  Utilizing supporting evidence to improve dynamic requirements traceability , 2005, 13th IEEE International Conference on Requirements Engineering (RE'05).

[19]  Robert A. Jacobs,et al.  Methods For Combining Experts' Probability Assessments , 1995, Neural Computation.

[20]  Michael W. Godfrey,et al.  What's hot and what's not: Windowed developer topic analysis , 2009, 2009 IEEE International Conference on Software Maintenance.

[21]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[22]  Sushil Krishna Bajracharya,et al.  A theory of aspects as latent topics , 2008, OOPSLA.

[23]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[24]  Genny Tortora,et al.  Recovering traceability links in software artifact management systems using information retrieval methods , 2007, TSEM.

[25]  Collin McMillan,et al.  Combining textual and structural analysis of software artifacts for traceability link recovery , 2009, 2009 ICSE Workshop on Traceability in Emerging Forms of Software Engineering.

[26]  R. Stephenson A and V , 1962, The British journal of ophthalmology.

[27]  J. Cullum,et al.  Lanczos algorithms for large symmetric eigenvalue computations , 1985 .

[28]  Arie van Deursen,et al.  An industrial case study in reconstructing requirements views , 2008, Empirical Software Engineering.