Clustering Support for Static Concept Location in Source Code

One of the most common comprehension activities undertaken by developers is concept location in source code. In the context of software change, concept location means finding the locations in source code where changes are to be made in response to a modification request. Static techniques for concept location usually rely on searching the source code using textual information or on navigating the dependencies among software elements. In this paper we propose a novel static concept location technique that leverages both the textual information present in the code and the structural dependencies between source code elements. The technique employs a textual search in the source code, which is clustered using the Border Flow algorithm on the basis of combined structural and textual data. We evaluated the technique against a baseline approach based on text search, using data on almost 200 changes from five software systems. The results indicate that the new approach outperforms the baseline and that further improvements are possible.
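As a rough illustration of how textual and structural information might be combined, the following sketch ranks methods against a query by cosine similarity and computes a pairwise weight that mixes textual similarity with a dependency indicator. It is not the technique evaluated in the paper and it does not implement Border Flow; the tokenizer, the `alpha` mixing parameter, the 0/1 structural score, and the toy method data are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split identifiers and comments into lowercase word tokens."""
    # Insert a space at camelCase boundaries, then keep alphabetic runs only.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", spaced)]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def combined_weight(m1, m2, vectors, deps, alpha=0.5):
    """Hypothetical edge weight: alpha * textual + (1 - alpha) * structural."""
    textual = cosine(vectors[m1], vectors[m2])
    # Assumption: structural score is 1 if either method depends on the other.
    structural = 1.0 if (m1, m2) in deps or (m2, m1) in deps else 0.0
    return alpha * textual + (1 - alpha) * structural


def search(query, vectors):
    """Rank method names by textual similarity to the query (baseline-style)."""
    q = Counter(tokenize(query))
    return sorted(vectors, key=lambda m: cosine(q, vectors[m]), reverse=True)


# Toy corpus: method name -> text drawn from its identifiers and comments.
methods = {
    "saveFile": "saveFile write buffer to disk file",
    "openFile": "openFile read file from disk into buffer",
    "drawMenu": "drawMenu paint menu items on screen",
}
# Toy dependency set: (caller, callee) pairs.
deps = {("saveFile", "openFile")}
vectors = {name: Counter(tokenize(text)) for name, text in methods.items()}

ranking = search("save file to disk", vectors)
```

Weights of this kind could serve as edge weights in a graph handed to a clustering algorithm, so that search results are presented together with textually and structurally related methods rather than as a flat ranked list.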
