Clustering support for automated tracing

Automated trace tools dynamically generate links between software artifacts such as requirements, design elements, code, test cases, and other less structured supplemental documents. Trace algorithms typically use information retrieval methods to compute similarity scores between pairs of artifacts. Results are returned to the user as a ranked list of candidate links, which the user must then evaluate by searching the list from the top down. Although clustering methods have previously been shown to improve the performance of information retrieval algorithms by making results easier to understand and reducing human analysis effort, their usefulness in automated traceability tools has not yet been explored. This paper evaluates and compares the effectiveness of several existing clustering methods in supporting traceability; describes a technique for incorporating them into the automated traceability process; and proposes new techniques, based on the concepts of theme cohesion and coupling, to dynamically identify optimal clustering granularity and to detect cross-cutting concerns that would otherwise remain undetected by standard clustering algorithms. The benefits of using clustering in automated trace retrieval are then evaluated through a case study.
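As an illustration of the candidate-link step described above, the following is a minimal sketch that scores artifact pairs with TF-IDF weighting and cosine similarity and returns a ranked list. The weighting scheme, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_candidate_links`), and the sample requirements and artifacts are all assumptions made for this example; the abstract does not specify which retrieval model or data the paper uses.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1, which is an
    assumption of this sketch, not a scheme taken from the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidate_links(query_text, artifacts):
    """Return (artifact_id, score) pairs, highest similarity first."""
    ids = sorted(artifacts)
    vectors = tfidf_vectors([query_text] + [artifacts[i] for i in ids])
    query_vec, artifact_vecs = vectors[0], vectors[1:]
    scored = [(i, cosine(query_vec, vec)) for i, vec in zip(ids, artifact_vecs)]
    # Sort by descending score, breaking ties by artifact id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample data, invented for this example.
requirements = {
    "R1": "the system shall encrypt user passwords",
    "R2": "the system shall export reports as pdf",
}
artifacts = {
    "auth.py": "hash and encrypt passwords for each user",
    "report.py": "export monthly reports to pdf files",
    "ui.py": "render buttons and menus",
}

links_r1 = rank_candidate_links(requirements["R1"], artifacts)
links_r2 = rank_candidate_links(requirements["R2"], artifacts)
```

Each ranked list is what an analyst would otherwise inspect from the top down; grouping such candidates into clusters is the step the paper investigates.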
