A semantic relatedness approach for traceability link recovery

Human analysts working with automated tracing tools need to directly vet candidate traceability links in order to determine the true traceability information. Currently, human intervention happens at the end of the traceability process, after candidate traceability links have already been generated. This often leads to a decline in the accuracy of the results. In this paper, we propose an approach, based on semantic relatedness (SR), which brings human judgment to an earlier stage of the tracing process by integrating it into the underlying retrieval mechanism. SR tries to mimic the human mental model of relevance by considering a broad range of semantic relations, and hence produces more semantically meaningful results. We evaluated our approach using three datasets from different application domains, and assessed the tracing results via six different performance measures concerning both result quality and browsability. The empirical evaluation results show that our SR approach performs significantly better at recovering true links than a standard Vector Space Model (VSM) on all three datasets. Our approach also achieves significantly better precision than Latent Semantic Indexing (LSI) on two of the datasets.
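For readers unfamiliar with the VSM baseline mentioned above, the following is a minimal sketch of how a vector space model can rank candidate traceability links: each artifact becomes a TF-IDF vector, and candidates are ordered by cosine similarity to the requirement. It is an illustration under stated assumptions, not the implementation evaluated in the paper and not the SR approach itself. The function names, the whitespace tokenizer, the smoothed IDF formula, and the sample artifacts are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed weighting: raw term frequency times a smoothed IDF.
        vectors.append({t: tf[t] * (math.log(n / df[t]) + 1.0) for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(requirement, artifacts):
    """Return (artifact name, score) pairs, most similar first."""
    names = list(artifacts)
    vectors = tfidf_vectors([requirement] + [artifacts[n] for n in names])
    query, rest = vectors[0], vectors[1:]
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, rest)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical artifacts, used only to exercise the sketch.
    artifacts = {
        "auth.py": "check user password on login",
        "report.py": "render monthly sales chart",
    }
    for name, score in rank_candidates("user login password", artifacts):
        print(name, round(score, 3))
```

Because this baseline matches only on shared terms, an artifact that describes the same concept with different vocabulary receives a score of zero. That vocabulary mismatch is the kind of gap that approaches considering broader semantic relations aim to close.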
