Improving automated documentation to code traceability by combining retrieval techniques

Documentation written in natural language and source code are two of the major artifacts of a software system. Tracking the various traceability links between software documentation and source code helps developers understand a system, develop it efficiently, and manage it effectively. A major open research challenge for automated traceability systems to date has been how to extract these links with both high precision and high recall. In this paper we introduce an approach that combines three supporting techniques (Regular Expression, Key Phrases, and Clustering) with a Vector Space Model (VSM) to improve the performance of automated traceability between documentation and source code. The combined approach exploits the strengths of the three techniques to mitigate the limitations of VSM. We evaluate it on four case studies. Experimental results indicate that, compared with VSM alone, our approach increases the precision of the retrieved links and recovers more true links.
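
To make the VSM baseline concrete, the sketch below ranks candidate documentation-to-code links using TF-IDF weights and cosine similarity, then adds one supporting signal loosely in the spirit of the Regular Expression technique: a bonus when a document names a code unit verbatim. It is a minimal illustration under stated assumptions, not the authors' implementation. The camelCase-splitting tokenizer, the smoothed IDF formula, the additive `boost` of 0.2, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_links`), and the example artifacts are all hypothetical. Key-phrase extraction and clustering are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word terms, splitting camelCase and snake_case identifiers."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", text):  # underscores, digits, punctuation separate words
        # "getUserName" -> get, User, Name; "HTTPServer" -> HTTP, Server
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word):
            terms.append(part.lower())
    return terms


def tfidf_vectors(corpus):
    """Map each token list to a sparse {term: tf * idf} dict (smoothed idf is an assumption)."""
    n = len(corpus)
    df = Counter(term for tokens in corpus for term in set(tokens))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(tokens).items()}
        for tokens in corpus
    ]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank_links(docs, code, boost=0.2):
    """Return (doc, code_unit, score) triples sorted by descending score.

    docs and code map an identifier to raw text. boost is a hypothetical,
    hand-picked bonus, not a value taken from the paper.
    """
    doc_ids, code_ids = list(docs), list(code)
    vectors = tfidf_vectors(
        [tokenize(docs[d]) for d in doc_ids] + [tokenize(code[c]) for c in code_ids]
    )
    doc_vecs = dict(zip(doc_ids, vectors[: len(doc_ids)]))
    code_vecs = dict(zip(code_ids, vectors[len(doc_ids):]))
    links = []
    for d in doc_ids:
        for c in code_ids:
            score = cosine(doc_vecs[d], code_vecs[c])  # plain VSM similarity
            # Regular-expression signal: the doc mentions the code unit's name as a whole word.
            if re.search(rf"\b{re.escape(c)}\b", docs[d]):
                score += boost
            links.append((d, c, score))
    return sorted(links, key=lambda link: link[2], reverse=True)


# Hypothetical example artifacts.
docs = {
    "login.md": "The LoginController validates the user password before creating a session.",
    "report.md": "Monthly reports are exported as PDF files by the ReportExporter.",
}
code = {
    "LoginController": "class LoginController { boolean validatePassword(User user, String password) {} "
                       "Session createSession(User user) {} }",
    "ReportExporter": "class ReportExporter { void exportPdf(Report report) {} }",
}

for doc_id, unit, score in rank_links(docs, code):
    print(f"{doc_id:10} -> {unit:16} {score:.3f}")
```

Because this plain VSM does no stemming, `reports` and `report` (or `validates` and `validate`) count as different terms; an exact-name match is one simple way to add evidence that term overlap alone can miss.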
