On the role of the nouns in IR-based traceability recovery

The intensive human effort needed to manually manage traceability information has increased the interest in utilising semi-automated traceability recovery techniques. This paper presents a simple way to improve the accuracy of traceability recovery methods based on Information Retrieval techniques. The proposed method acts on the artefact indexing considering only the nouns contained in the artefact content to define the semantics of an artefact. The rationale behind such a choice is that the language used in software documents can be classified as a sectorial language, where the terms that provide more indication on the semantics of a document are the nouns. The results of a reported case study demonstrate that the proposed artefact indexing significantly improves the accuracy of traceability recovery methods based on the probabilistic or vector space based IR models.

[1]  M. F. Fuller,et al.  Practical Nonparametric Statistics; Nonparametric Statistical Inference , 1973 .

[2]  Raffaella Settimi,et al.  Supporting software evolution through dynamically retrieving traces to UML artifacts , 2004, Proceedings. 7th International Workshop on Principles of Software Evolution, 2004..

[3]  Jane Cleland-Huang,et al.  Utilizing supporting evidence to improve dynamic requirements traceability , 2005, 13th IEEE International Conference on Requirements Engineering (RE'05).

[4]  Edward L. Keenan,et al.  Formal Semantics of Natural Language , 1975 .

[5]  Mordechai Nisenson,et al.  A Traceability Technique for Specifications , 2008, 2008 16th IEEE International Conference on Program Comprehension.

[6]  Giuliano Antoniol,et al.  Recovering Traceability Links between Code and Documentation , 2002, IEEE Trans. Software Eng..

[7]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[8]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[9]  Jane Huffman Hayes,et al.  Advancing candidate link generation for requirements tracing: the study of methods , 2006, IEEE Transactions on Software Engineering.

[10]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[11]  Andrea De Lucia,et al.  Incremental Approach and User Feedbacks: a Silver Bullet for Traceability Recovery , 2006, 2006 22nd IEEE International Conference on Software Maintenance.

[12]  Andrian Marcus,et al.  Recovering documentation-to-source-code traceability links using latent semantic indexing , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[13]  Genny Tortora,et al.  IR-Based Traceability Recovery Processes: An Empirical Comparison of "One-Shot" and Incremental Processes , 2008, 2008 23rd IEEE/ACM International Conference on Automated Software Engineering.

[14]  Genny Tortora,et al.  Assessing IR-based traceability recovery tools through controlled experiments , 2009, Empirical Software Engineering.

[15]  Genny Tortora,et al.  Recovering traceability links in software artifact management systems using information retrieval methods , 2007, TSEM.

[16]  Diane K. Michelson,et al.  Applied Statistics for Engineers and Scientists , 2001, Technometrics.

[17]  Claes Wohlin,et al.  Experimentation in software engineering: an introduction , 2000 .

[18]  Giuliano Antoniol,et al.  Traceability recovery by modeling programmer behavior , 2000, Proceedings Seventh Working Conference on Reverse Engineering.

[19]  Denys Poshyvanyk,et al.  When and how to visualize traceability links? , 2005, TEFSE '05.

[20]  Susan T. Dumais,et al.  Improving the retrieval of information from external sources , 1991 .

[21]  Eugene Charniak,et al.  Statistical Techniques for Natural Language Parsing , 1997, AI Mag..

[22]  Arie van Deursen,et al.  Can LSI help reconstructing requirements traceability in design and test? , 2006, Conference on Software Maintenance and Reengineering (CSMR'06).

[23]  James H. Martin,et al.  Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition , 2000, Prentice Hall series in artificial intelligence.