On integrating orthogonal information retrieval methods to improve traceability recovery

Different Information Retrieval (IR) methods have been proposed to recover traceability links among software artifacts. Until now there is no single method that sensibly outperforms the others, however, it has been empirically shown that some methods recover different, yet complementary traceability links. In this paper, we exploit this empirical finding and propose an integrated approach to combine orthogonal IR techniques, which have been statistically shown to produce dissimilar results. Our approach combines the following IR-based methods: Vector Space Model (VSM), probabilistic Jensen and Shannon (JS) model, and Relational Topic Modeling (RTM), which has not been used in the context of traceability link recovery before. The empirical case study conducted on six software systems indicates that the integrated method outperforms stand-alone IR methods as well as any other combination of non-orthogonal methods with a statistically significant margin.

[1]  David J. Groggel,et al.  Practical Nonparametric Statistics , 2000, Technometrics.

[2]  Raffaella Settimi,et al.  Supporting software evolution through dynamically retrieving traces to UML artifacts , 2004, Proceedings. 7th International Workshop on Principles of Software Evolution, 2004..

[3]  Collin McMillan,et al.  Combining textual and structural analysis of software artifacts for traceability link recovery , 2009, 2009 ICSE Workshop on Traceability in Emerging Forms of Software Engineering.

[4]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[5]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[6]  Andrian Marcus,et al.  Recovering documentation-to-source-code traceability links using latent semantic indexing , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[7]  Diane K. Michelson,et al.  Applied Statistics for Engineers and Scientists , 2001, Technometrics.

[8]  David M. Blei,et al.  Hierarchical relational models for document networks , 2009, 0909.4331.

[9]  DeursenArie,et al.  An industrial case study in reconstructing requirements views , 2008 .

[10]  Robert A. Jacobs,et al.  Methods For Combining Experts' Probability Assessments , 1995, Neural Computation.

[11]  Andrea De Lucia,et al.  On the role of the nouns in IR-based traceability recovery , 2009, 2009 IEEE 17th International Conference on Program Comprehension.

[12]  Claes Wohlin,et al.  Experimentation in software engineering: an introduction , 2000 .

[13]  Giuliano Antoniol,et al.  Traceability recovery by modeling programmer behavior , 2000, Proceedings Seventh Working Conference on Reverse Engineering.

[14]  Jane Huffman Hayes,et al.  Advancing candidate link generation for requirements tracing: the study of methods , 2006, IEEE Transactions on Software Engineering.

[15]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[16]  Andrea De Lucia,et al.  Incremental Approach and User Feedbacks: a Silver Bullet for Traceability Recovery , 2006, 2006 22nd IEEE International Conference on Software Maintenance.

[17]  Yann-Gaël Guéhéneuc,et al.  Feature Location Using Probabilistic Ranking of Methods Based on Execution Scenarios and Information Retrieval , 2007, IEEE Transactions on Software Engineering.

[18]  Richard N. Taylor,et al.  Software traceability with topic modeling , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[19]  Andrea De Lucia,et al.  On the Equivalence of Information Retrieval Methods for Automated Traceability Link Recovery , 2010, 2010 IEEE 18th International Conference on Program Comprehension.

[20]  Jane Cleland-Huang,et al.  Towards mining replacement queries for hard-to-retrieve traces , 2010, ASE.

[21]  Arie van Deursen,et al.  Can LSI help reconstructing requirements traceability in design and test? , 2006, Conference on Software Maintenance and Reengineering (CSMR'06).

[22]  Genny Tortora,et al.  The role of the coverage analysis during IR-based traceability recovery: A controlled experiment , 2009, 2009 IEEE International Conference on Software Maintenance.

[23]  Jane Cleland-Huang,et al.  Utilizing supporting evidence to improve dynamic requirements traceability , 2005, 13th IEEE International Conference on Requirements Engineering (RE'05).

[24]  Tore Dybå,et al.  A systematic review of effect size in software engineering experiments , 2007, Inf. Softw. Technol..

[25]  Jane Cleland-Huang,et al.  A machine learning approach for tracing regulatory codes to product specific requirements , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[26]  Genny Tortora,et al.  Recovering traceability links in software artifact management systems using information retrieval methods , 2007, TSEM.

[27]  Giuliano Antoniol,et al.  Recovering Traceability Links between Code and Documentation , 2002, IEEE Trans. Software Eng..

[28]  LuciaAndrea De,et al.  Recovering traceability links in software artifact management systems using information retrieval methods , 2007 .

[29]  Arie van Deursen,et al.  An industrial case study in reconstructing requirements views , 2008, Empirical Software Engineering.

[30]  Jacob Cohen Statistical Power Analysis for the Behavioral Sciences , 1969, The SAGE Encyclopedia of Research Design.

[31]  Matthias Jarke,et al.  Toward Reference Models of Requirements Traceability , 2001, IEEE Trans. Software Eng..

[32]  Mordechai Nisenson,et al.  A Traceability Technique for Specifications , 2008, 2008 16th IEEE International Conference on Program Comprehension.

[33]  Andrian Marcus,et al.  Recovery of Traceability Links between Software Documentation and Source Code , 2005, Int. J. Softw. Eng. Knowl. Eng..

[34]  Denys Poshyvanyk,et al.  Using Relational Topic Models to capture coupling among classes in object-oriented software systems , 2010, 2010 IEEE International Conference on Software Maintenance.

[35]  Stephen Clark,et al.  Best Practices for Automated Traceability , 2007, Computer.