A Traceability Technique for Specifications

Traceability in software is the discovery of links between different artifacts, and it supports many tasks across the software life cycle. We compare several Information Retrieval techniques for this task on two datasets of real-world software with its accompanying specifications and documentation. The techniques compared include dimensionality reduction methods, probabilistic and information-theoretic approaches, and the standard vector space model.
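
To illustrate the vector space model mentioned above, here is a minimal sketch of how candidate traceability links might be ranked. It is not the implementation evaluated in this work. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_links`), the smoothed TF-IDF weighting, and the artifact names in the usage example are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and keep purely alphabetic tokens (a simplifying assumption).
    return [tok for tok in text.lower().split() if tok.isalpha()]


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (dict term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed idf keeps every weight strictly positive.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_links(query_text, artifacts):
    # artifacts: dict mapping a (hypothetical) artifact name to its text.
    # Returns (name, similarity) pairs, most similar first.
    names = list(artifacts)
    vectors = tfidf_vectors([query_text] + [artifacts[n] for n in names])
    query_vec, artifact_vecs = vectors[0], vectors[1:]
    scored = [(name, cosine(query_vec, vec)) for name, vec in zip(names, artifact_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    spec = "the stack supports push and pop operations"
    code_docs = {
        "stack.h": "push pop top stack operations",
        "graph.h": "graph nodes edges shortest path",
    }
    for name, score in rank_links(spec, code_docs):
        print(f"{name}: {score:.3f}")
```

In this sketch a specification passage acts as the query and each source artifact is a document. Artifacts whose similarity exceeds a chosen threshold would be proposed as candidate links.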
